Consider the amount of data we generate every second while commuting from home to the office. Data flows constantly from the trackers in our smart watches, from our smartphones, from the taxi ride, from social networks and from our workplaces. According to some estimates, the world generated close to 44 zettabytes (1 zettabyte = 1 trillion gigabytes) of data in 2020. No wonder companies want to set up new data centres at various locations.

A data centre is the engine of the internet. In simpler terms, it is a huge facility (for example, Facebook last year secured a perimeter of close to 487,000 square feet for its data centre in Prineville, US) that requires a lot of power, a continuous water supply and a lot of computers. The story does not end there: maintaining a data centre requires a holistic approach. The facility should be away from vulnerable zones; a continuous water supply for cooling and environmental health and safety standards have to be maintained; and the sustainability of the system and carbon-offset management are other key factors that have to be balanced. Moreover, a data centre archives data on magnetic tape, which has a life of 10 to 30 years, and such tapes are replaced continuously to ensure smooth access to stored data, creating a problem of e-waste management.

So to preserve the world's data we need a significant advance in storage technology: one with high archival density that requires less space and is environmentally sustainable and durable. Nature solved this problem millions of years ago. DNA (deoxyribonucleic acid), as we now know, has a high data density (10^9 bits per cubic cm), high durability (a half-life of more than 100 years) and is also energy efficient (power usage of 10^-10 watts per gigabyte).
For decades scientists have been looking to replace silicon for data archiving, and DNA is a probable candidate. But using DNA to store data has its own limitations, from encoding data into DNA to retrieving exactly the information the user requires, in a process similar to random access. Encoding data into a string of nucleotides (the basic structural units of DNA) is extremely slow; if we want DNA to store our data, we need to improve on this significantly. Reading, or retrieving, the encoded information is done by sequencing the DNA strands. Sequencing is itself a slow process, and it is difficult to retrieve one piece of information without sequencing the complete strand. Even if we reach a level where we can speed up both encoding and sequencing, it is highly probable that such a technique would not be free for industrial use, and the holder of its patent would have a monopoly over the market, raising the cost of storing data in DNA significantly. There is also a risk of a data breach during the process itself, because the data has to be handed over to the scientists who encode it and verify that it has been encoded properly before archiving. That verification step will further increase the cost of the process, and all of this needs to happen in real time, because users may want to retrieve and store data at the same time.
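To make the encoding step concrete, here is a minimal sketch of the simplest possible binary-to-DNA mapping: two bits per nucleotide. This toy scheme is purely illustrative and is an assumption for this example, not the method used in any actual experiment; real encoding schemes add error-correcting codes and avoid long runs of the same base, which cause synthesis and sequencing errors.

```python
# Toy binary-to-DNA mapping: 2 bits per base (00->A, 01->C, 10->G, 11->T).
# Real DNA-storage codecs layer error correction and homopolymer
# constraints on top of this; those are omitted here for clarity.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Map each byte to four nucleotides (2 bits per base)."""
    bitstring = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bitstring[i:i + 2]]
                   for i in range(0, len(bitstring), 2))

def decode(strand: str) -> bytes:
    """Invert the mapping: read bases back into bytes."""
    bitstring = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bitstring[i:i + 8], 2)
                 for i in range(0, len(bitstring), 8))

strand = encode(b"Hi")
print(strand)                  # CAGACGGC  (0x48 0x69 as bases)
assert decode(strand) == b"Hi"
```

Note that decoding here requires the entire strand, which mirrors the random-access problem described above: without extra indexing (in practice, address sequences attached to each short strand), retrieving one record means sequencing everything.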
The good news is that proof-of-concept experiments have already been conducted successfully. According to some scientists, the current technology needs to improve about 100,000-fold to become viable, which in genomics is not a big deal. The way forward could be to start by storing data that is not retrieved on a daily basis but is extremely crucial, such as the coordinates of locations where nuclear waste is stored. Archiving historical records such as the Vedas, ancient philosophical texts, scientific works and paintings is also a good idea. Reaching a level where DNA data storage technology is accessible to all will require extensive research and a huge amount of money; realistically, developing such technology will take at least 25 years before we can store data in DNA at will.