Paul van Gerven
20 October

Several tech companies are looking into using DNA for archival storage purposes.

DNA as a data storage medium – the concept is almost intuitive. If nature entrusts the biopolymer with the information needed to construct an organism, why couldn’t it store our digital data? DNA uses four different molecules to encode information, which is easily adapted to a binary system (see inset). Perhaps unsurprisingly, therefore, the idea was already put forward decades ago. Physicist Richard Feynman proposed it in his famous 1959 lecture “There’s plenty of room at the bottom,” only six years after the helical structure of DNA was discovered.
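To make the binary adaptation concrete, here is a minimal sketch that assigns two bits to each of the four bases. The mapping table and the function names are assumptions made purely for illustration, not a scheme taken from any of the projects mentioned here. Practical encodings typically add redundancy and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping: two bits per nucleotide (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With this mapping, every byte becomes four bases, and `decode(encode(data))` returns the original bytes.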

Storing data in DNA would have some major advantages. As small as a flash memory cell is, a single molecule as a basic storage unit is about as small as it gets. With a theoretical capacity of 455 exabytes per gram, a soda can’s worth of DNA could store all of the world’s data. At the current growth rate of data generation, such high-density solutions will become attractive soon enough – if not inescapable, proponents say.
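To get a feel for that density, the sketch below converts an amount of data into the mass of DNA it would need at the theoretical 455 exabytes per gram. It assumes decimal units (1 exabyte = 10^18 bytes) and ignores any encoding or redundancy overhead, so it gives a lower bound rather than a practical estimate. The function name is made up for this example.

```python
EXABYTE = 10**18  # bytes, decimal units assumed
THEORETICAL_BYTES_PER_GRAM = 455 * EXABYTE  # figure quoted in the text


def grams_of_dna(num_bytes: int) -> float:
    """Mass of DNA needed at the theoretical density, with no overhead."""
    return num_bytes / THEORETICAL_BYTES_PER_GRAM
```

At that density, `grams_of_dna(10**12)` puts a terabyte at roughly 2.2 nanograms of DNA.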

Another strong suit of DNA is its high stability when properly conserved. Like most biomolecules, DNA is fragile and prone to degradation when left unprotected. But the fact that fully intact DNA has been extracted from ancient fossils shows that there are ways to preserve it for thousands of years, if not indefinitely.

By comparison, flash devices typically have data retention times of about 10 years, while magnetic tape – still the go-to solution for long-term digital storage – has a lifespan of 10-20 years. The robustness of DNA storage solutions is a particularly attractive feature for organizations that need to store large amounts of data that’s accessed only rarely, such as movie studios and national archives.

Add to this a low environmental impact in terms of energy and raw material needs, and you can see why companies would be interested in using DNA for digital data storage. And they are.

Slow and costly

Once automated technology to read (ie sequence) and write (ie synthesize) DNA became available, DNA storage turned into a viable option. Probably the first convincing demonstration dates back to 2012, when Harvard researchers encoded an entire biology book in DNA, including images, formatting information and some JavaScript code. In this study, 5.27 megabits of data were stored – far beyond the data size seen in previous efforts.

Such demonstrations are obviously still a far cry from practical applications; even today, DNA sequencing and synthesis techniques are too slow and costly. Nonetheless, data-intensive companies are taking the concept very seriously now. Members of the DNA Data Storage Alliance (DDSA), founded in 2020, include software maker Microsoft, hard disk manufacturers Western Digital and Seagate, and ICT companies Dell, IBM and Lenovo.

200 megabytes

Anticipating further reductions in the cost of synthesis and sequencing, the DDSA maintains that DNA data storage will become a cost-competitive solution. Part of the savings, in the case of archival storage, would come from eliminating the need to routinely transfer data to a new storage medium to prevent data loss due to material deterioration. Another potential cost-saving feature would be the ability to make massive numbers of copies in parallel once the data has been encoded in DNA strands.

The Intelligence Advanced Research Projects Activity (IARPA), the R&D arm of US intelligence agencies, is running a program aiming to write 1 terabyte and read 10 terabytes within 24 hours for 1,000 dollars. Just to get a feel for where the field is now, the Georgia Tech Research Institute holds the current record for writing at 200 megabytes per day, though the researchers have recently claimed they can speed that up 100-fold.
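The distance between the record and the target can be worked out from the figures above. The sketch below assumes decimal units (1 terabyte = 1,000,000 megabytes) and takes the claimed 100-fold speedup at face value. The function name is made up for this example.

```python
MB_PER_TB = 1_000_000  # decimal units assumed


def write_gap(current_mb_per_day: float, target_tb_per_day: float,
              claimed_speedup: float = 1.0) -> float:
    """How many times faster writing must get to reach the target rate."""
    target_mb_per_day = target_tb_per_day * MB_PER_TB
    return target_mb_per_day / (current_mb_per_day * claimed_speedup)


gap_today = write_gap(200, 1)            # record rate vs. 1 TB per day
gap_after_speedup = write_gap(200, 1, 100)  # if the 100-fold claim holds
```

By this reckoning, the record writing speed is a factor of 5,000 short of the IARPA target, and the claimed 100-fold improvement would still leave a factor of 50 to go.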