DNA storage: All the world's data in a teaspoon
The world's data storage requirements are expanding exponentially, and at the current rate we will soon run out of the materials that enable traditional storage technologies. DNA has the potential to provide a highly efficient solution to our storage needs, but it is held back by dated approaches to reading and writing that are many orders of magnitude slower than those achieved by nature. Solving this challenge could create a near-infinite storage mechanism operating at almost zero energy and nominal cost.
Why focus on DNA storage?
The global data archive, including everything from scientific work to YouTube videos, will hit 44 trillion gigabytes in 2020. That's a problem because if we stored everything on instant-access chips (the current trend) we would need around 100 times the current supply of silicon. Of course, not all data storage requires instant access, but even storing just one exabyte (one billion gigabytes) on tape costs $1bn over 10 years and requires hundreds of megawatts of power.

Molecular data storage has some incredible benefits over traditional storage. It's ludicrously dense: you can store one bit per base, and one base is just a few atoms, compared to 100,000 atoms per bit on a hard drive. It's also incredibly stable: whereas most new storage methods are sensitive to the slightest disturbance, DNA can survive for hundreds of thousands of years at room temperature. These attributes hold the potential to reduce the space and power required to store the world's data by over three orders of magnitude.
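
As a rough illustration of what "one bit per base" means in practice, here is a minimal Python sketch. The mapping (0 becomes A or C, 1 becomes G or T, alternating so that the same base never appears twice in a row) and the function names `encode` and `decode` are assumptions chosen for illustration, not a description of any particular group's scheme.

```python
# Toy one-bit-per-base codec. The base assignments are an illustrative
# assumption: a 0 bit is written as A or C, a 1 bit as G or T.
ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, one base per bit, most significant bit first."""
    strand = []
    prev = ""
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            first, second = ONE_BASES if bit else ZERO_BASES
            # Pick the alternative base if the preferred one would repeat
            # the previous base, so no two neighbouring bases are identical.
            base = second if prev == first else first
            strand.append(base)
            prev = base
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Recover the original bytes from a strand produced by encode()."""
    if len(strand) % 8 != 0:
        raise ValueError("strand length must be a multiple of 8 bases")
    out = bytearray()
    for start in range(0, len(strand), 8):
        byte = 0
        for base in strand[start:start + 8]:
            if base not in "ACGT":
                raise ValueError(f"unexpected base: {base!r}")
            byte = (byte << 1) | (1 if base in ONE_BASES else 0)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"DNA")
    print(strand, len(strand))  # 3 bytes become 24 bases
    print(decode(strand))
```

Having two possible bases for each bit value is what lets this sketch avoid runs of identical bases, at the price of storing one bit per base rather than the two bits a four-letter alphabet could in principle carry.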
What are the opportunities?
Ever since Ewan Birney and Nick Goldman of the EBI first came up with the idea of DNA storage to solve their genetics data storage headaches, huge amounts of funding have flooded into the field and the amount of data written to DNA has increased exponentially. The first experiments encoded around 5 Mb; the record to date is 5.5 petabytes in a single gram, set by a team of Harvard researchers in 2012; and this year Microsoft generated a lot of media attention for storing 200 Mb (keep up, Microsoft).

Writing and reading large amounts of DNA is clearly possible, but what's holding back the field is speed and cost. The experiments listed above took days and cost $1000s per Mb. That's beyond the bounds of usefulness for almost any application.

Writing DNA

The dominant method of writing strings of nucleotides is a 30-year-old organic chemistry process that takes over 400 seconds per base. To put that into context, the bacterium E. coli is able to churn out 4000 base pairs per second, and solid state drives read millions of bits per second. Perhaps traditional methods could be made viable if parallelised billions of times and implemented with heavy software-based error correction, but this is unlikely to be the most elegant way to reach scale. Several companies, including our friends at Touchlight Genetics, have explored more biologically relevant methods of synthesising DNA, and others, including Twist Bioscience, are looking at yet-to-be-revealed silicon-based platforms. However, these companies are focused on producing defined sequences at scale rather than on the ability to write single, dynamic sequences at speed.
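
To make the scale of the writing gap concrete, the figures above can be run through a short back-of-the-envelope calculation. This is a sketch under stated assumptions: one bit per base, exactly 400 seconds per base for chemical synthesis (the text says "over 400"), and a target of one million bases per second standing in for "millions". The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
CHEMICAL_SECONDS_PER_BASE = 400    # "over 400 seconds per base" (lower bound)
ECOLI_BASES_PER_SECOND = 4000      # replication rate quoted for E. coli
BITS_PER_BASE = 1                  # assumption: one bit per base
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def speed_ratio() -> int:
    """How many times faster E. coli writes DNA than chemical synthesis."""
    return ECOLI_BASES_PER_SECOND * CHEMICAL_SECONDS_PER_BASE


def parallel_channels_needed(target_bases_per_second: int) -> int:
    """Independent synthesis channels needed to hit a target write rate."""
    return target_bases_per_second * CHEMICAL_SECONDS_PER_BASE


def single_channel_years(n_bytes: int) -> float:
    """Years for one synthesis channel to write n_bytes, base after base."""
    bases = n_bytes * 8 // BITS_PER_BASE
    return bases * CHEMICAL_SECONDS_PER_BASE / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"E. coli vs chemistry: {speed_ratio():,}x")
    print(f"Channels for 1M bases/s: {parallel_channels_needed(1_000_000):,}")
    print(f"Years to write 1 MB serially: {single_channel_years(1_000_000):.1f}")
```

Under these assumptions a single channel would need roughly a century to write one megabyte, and reaching even one million bases per second would take 400 million channels running in parallel, which is why parallelisation on the order of billions comes up as soon as the target is a few million bases per second.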

Reading DNA

Reading data from DNA is equally challenging. The Human Genome Project has led to a cost reduction of three orders of magnitude in traditional primer- and fluorescence-based methods, and technologies like nanopore sequencing have enabled read speeds of up to 250 bases per second. This is still two orders of magnitude away from silicon-based memory. A big part of the problem is the need to read the entire string to find the bit that is needed (much like tape storage in the old days), and many groups are looking at ways around this. One of the most cutting-edge approaches is that of Olgica Milenkovic, who uses CRISPR-Cas9 to enable random access.
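
The difference between tape-like and random access can be sketched with a toy model. This is not the CRISPR-based method mentioned above: it simply assumes, for illustration, that data is split into strands that each carry a short address prefix, and it compares reading every strand in order with reading only the addressed one at the 250 bases per second quoted in the text. All names, the 10-base address and the 240-base payload are hypothetical.

```python
# Toy comparison of sequential versus addressed reads from a pool of strands.
READ_BASES_PER_SECOND = 250  # read speed quoted in the text
BASES = "ACGT"


def make_address(index: int, length: int = 10) -> str:
    """Write an integer as a fixed-length base-4 string over A, C, G, T."""
    if not 0 <= index < 4 ** length:
        raise ValueError("index does not fit in the address length")
    digits = []
    for _ in range(length):
        index, digit = divmod(index, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def build_pool(n_strands: int, payload_length: int = 240):
    """Build (address, payload) pairs; the payload is placeholder data."""
    return [(make_address(i), "A" * payload_length) for i in range(n_strands)]


def sequential_read_seconds(pool, wanted: str) -> float:
    """Tape-like access: read every strand in order until the wanted one."""
    bases_read = 0
    for address, payload in pool:
        bases_read += len(address) + len(payload)
        if address == wanted:
            return bases_read / READ_BASES_PER_SECOND
    raise KeyError(wanted)


def addressed_read_seconds(pool, wanted: str) -> float:
    """Idealised random access: only the matching strand is read."""
    for address, payload in pool:
        if address == wanted:
            return (len(address) + len(payload)) / READ_BASES_PER_SECOND
    raise KeyError(wanted)


if __name__ == "__main__":
    pool = build_pool(1000)
    last = make_address(999)
    print(sequential_read_seconds(pool, last))  # grows with pool size
    print(addressed_read_seconds(pool, last))   # independent of pool size
```

The model ignores how the addressed strand is physically selected, which is the hard part, but it shows why selective retrieval matters: the sequential cost grows with the size of the archive while the addressed cost does not.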

DNA storage is a rapidly moving research field. However, the approaches are in the main fairly traditional, with clever tweaks to make them more effective, and reading and writing are largely treated completely separately. We are looking for a completely novel, elegant and combined approach that has the potential to read and write oligonucleotide strings at a rate of millions of bases per second at near nominal cost.
Who are we looking for?
We think this challenge would benefit from people with backgrounds in:

  • Bioinformatics
  • Genetics
  • Synthetic biology
  • Organic chemistry
  • Computer science
  • Electronic engineering
  • Chemical biology

If you have a different STEM background but you're keen to solve problems in this challenge area, please apply. The most interesting things happen at the interface between skill sets!
Specific challenges
We're currently designing a number of specific challenges in this area.
Sign up if you'd like to work on this challenge area and find out more about the specific challenges!
How to fit all the world's data in a teaspoon? Can you solve The Frontier challenge?

Challenges have been developed in collaboration with Science Practice, a design and research company working with scientists.