Ever since Ewan Birney and Nick Goldman of the EBI first proposed DNA storage as a way to solve their genetics data storage headaches, huge amounts of funding have flooded into the field and the amount of data written to DNA has increased exponentially. The first experiments encoded around 5 Mb, the record to date is 5.5 petabytes in a single gram, set by a team of Harvard researchers in 2012, and this year Microsoft generated a lot of media attention for storing 200 Mb (keep up, Microsoft).
Writing and reading large amounts of DNA is clearly possible, but what's holding back the field is speed and cost. The experiments listed above took days and cost $1000s per Mb, which is beyond the bounds of usefulness for almost any application.
Writing DNA
The dominant method of writing strings of nucleotides is a 30-year-old organic chemistry process that takes over 400 seconds per base. To put that into context, the bacterium E. coli is able to churn out 4000 base pairs per second and solid state drives read millions of bits per second. Perhaps traditional methods could be made viable if parallelised billions of times and implemented with heavy software-based error correction, but this is unlikely to be the most elegant way to reach scale. Several companies have explored more biologically relevant methods of synthesising DNA, including our friends at Touchlight Genetics, and others, including Twist Bioscience, are looking at yet-to-be-revealed silicon-based platforms. However, these companies are focused on producing defined sequences at scale rather than on the ability to write single, dynamic sequences at speed.
Reading DNA
Reading data from DNA is equally challenging. The Human Genome Project has led to a cost reduction of three orders of magnitude in traditional primer / fluorescence based methods, and technologies like NanoPore have enabled read speeds of up to 250 bases per second. This is still two orders of magnitude away from silicon-based memory. A big part of the problem is needing to read the entire string to find the bit that is needed (much like tape storage in the old days), and many groups are looking at ways around this. One of the most cutting-edge approaches is that of Olgica Milenkovic, who uses CRISPR-Cas9 to enable random access.
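To make the tape analogy concrete, here is a minimal toy sketch of the difference between scanning a whole pool of strands and selecting one by address. Everything in it is hypothetical: the `POOL` of address/payload pairs, the `ADDR...` labels and both function names are invented for illustration, and the dictionary lookup merely stands in for whatever physical selection step a real random access scheme would use. It does not model the CRISPR-Cas9 approach itself.

```python
# Hypothetical toy pool: each stored strand is an (address, payload) pair.
# Plain strings stand in for nucleotide sequences.
POOL = [(f"ADDR{i:03d}", f"payload-{i}") for i in range(1000)]

# Assumption: a random access scheme lets us select a strand by its address.
# A dict lookup is only a stand-in for that physical selection step.
INDEX = dict(POOL)


def sequential_read(pool, wanted):
    """Tape-like access: read strands in order until the wanted one appears.

    Returns (payload, number_of_strands_read).
    """
    reads = 0
    for address, payload in pool:
        reads += 1
        if address == wanted:
            return payload, reads
    return None, reads


def addressed_read(index, wanted):
    """Random access: select by address, then read only the selected strand.

    Returns (payload, number_of_strands_read).
    """
    payload = index.get(wanted)
    return payload, (1 if payload is not None else 0)


if __name__ == "__main__":
    print(sequential_read(POOL, "ADDR999"))
    print(addressed_read(INDEX, "ADDR999"))
```

In this toy, fetching the last strand sequentially means reading every strand in the pool, while the addressed version reads only the one that was selected. The cost of the selection step itself is deliberately left out.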
DNA storage is a rapidly moving research field; however, in the main, the approaches are fairly traditional, with clever tweaks to make them more effective, and reading and writing are largely treated completely separately. We are looking for a completely novel, elegant and combined approach that has the potential to read and write oligonucleotide strings at a rate of millions of bases per second at near nominal cost.
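To put that target in perspective, the rates quoted in this piece can be turned into a rough count of how many parallel channels each existing approach would need. The sketch below is back-of-envelope arithmetic only. It assumes "millions of bases per second" means at least one million, that channels run independently, and that error correction adds no overhead; the function and constant names are made up for illustration.

```python
# Per-channel figures quoted in the text.
SYNTHESIS_SECONDS_PER_BASE = 400   # chemical synthesis: over 400 seconds per base
ECOLI_BASES_PER_SECOND = 4000      # E. coli: 4000 base pairs per second
NANOPORE_BASES_PER_SECOND = 250    # nanopore reads: up to 250 bases per second

# Assumption: take "millions of bases per second" at its lower bound.
TARGET_BASES_PER_SECOND = 1_000_000


def channels_needed(per_channel_rate, target_rate=TARGET_BASES_PER_SECOND):
    """Independent parallel channels needed to reach target_rate (bases/s)."""
    return target_rate / per_channel_rate


if __name__ == "__main__":
    rates = {
        "chemical synthesis": 1 / SYNTHESIS_SECONDS_PER_BASE,
        "E. coli replication": ECOLI_BASES_PER_SECOND,
        "nanopore reading": NANOPORE_BASES_PER_SECOND,
    }
    for name, rate in rates.items():
        print(f"{name}: {rate:g} bases/s per channel, "
              f"about {channels_needed(rate):,.0f} channels to reach the target")
```

At 400 seconds per base, a single synthesis channel manages 0.0025 bases per second, so even the one-million lower bound needs roughly 400 million channels, and a target of several million pushes that into the billions mentioned above. The same arithmetic gives about 4000 nanopore channels on the read side.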