Life’s Code: Blockchain and the Future of Genomics

In an era of hotly contested debates surrounding data ownership, privacy and monetization, one particular piece of data could be said to be the most personal of all: the human genome.

While we are 99.9 percent identical in our genetic makeup across the species, the remaining 0.1 percent contains unique variations in code that are thought to influence our predisposition toward certain diseases and even our temperamental biases — a blueprint for how susceptible we are to everything from heart disease and Alzheimer’s to jealousy, recklessness and anxiety.

2018 offered ample examples of how bad actors can wreak havoc with nefarious use of even relatively trivial data. For those concerned with protecting this most critical form of identity, blockchain has piqued considerable interest as a powerful alternative to the closed architectures and proprietary exploits of the existing genomics data market, promising in their stead a secure and open protocol for life’s code.

Encrypted chains

Sequencing the human genome down to the molecular level of the four ‘letters’ that bind into the double-stranded helices of our DNA was first completed in 2003. The project cost $3.7 billion and 13 years of computing power. Today, it costs $1,000 per unique genome and takes a matter of days. Estimates are that it will soon cost as little as $100.

As genomic data-driven drug design and targeted therapies evolve, pharmaceutical and biotech companies’ interest is expected to catapult the genomics data market in the coming years, with a forecast to hit $27.6 billion by 2025.

If the dataset of your Facebook likes and news feed stupefactions has already been recognized as a major, monetizable asset, the value locked up in your genetic code is increasing exponentially as the revolution in precision medicine and gene editing gathers pace.

Within the past year, unprecedented approvals have been given to new gene therapies in the U.S. One edits cells from a patient’s immune system to cure non-Hodgkin lymphoma; another treats a rare, inherited retinal disease that can lead to blindness.

Yet, here’s the rub.

Genomics’ unparalleled potential to trigger a paradigm shift in modern medicine relies on leveraging vast datasets to establish correlations between genetic variants and traits.

Generating the explosion of big genomic data that is still needed to decode the four-letter code of the living organism faces hurdles that are not only scientific, but ethical, social and technological.

For many at the edge of this frontier, this is exactly where Nakamoto’s fabled 2008 white paper — and the technology that would come to be known as blockchain — comes in.

Who owns your genome? Resurrecting the wooly mammoth… and blockchain

For Professor George Church, the world-famous maverick geneticist at Harvard, the boundaries between technologies in and out of the lab are porous. He co-pioneered direct genome sequencing back in 1984, and a short digest of his recent ambitions includes attempts to resurrect the long-extinct mammoth, to create virus-proof cells and even to reverse aging.

He has now placed another bleeding-edge technology at the center of the genomics revolution: blockchain.

Last year, Church — alongside Harvard colleagues Dennis Grishin and Kamal Obbad — co-founded the blockchain startup Nebula Genomics. Church had been trying for years to accelerate and drive genomic data generation at scale. He had appealed to volunteers to contribute to his nonprofit Personal Genome Project (PGP) — a ‘Wikipedia’ of open-access human genomic data that has aggregated around 10,000 samples so far.

PGP relied on people forfeiting both privacy and ownership in pursuit of advancing science. As Church said in a recent interview, mostly they were either the “particularly altruistic,” or people concerned with accelerating research for a particular disease because of family experiences.

In other cases, as Dror Sam Brama, a cybersecurity expert with DNABits, told Cointelegraph, it is the patients themselves who generate the data and are “sick enough to throw away any ownership and privacy concerns”:

“The very sick come to the health care system and say, ‘We’ll give you anything you want, take it, we’ll sign any paper, consent. Just heal us, find a cure.’”

The challenge is getting everybody else. While no one knows exactly how many people have had their genomes sequenced to date, some estimates suggest it is around one million.

Startups like Nebula and DNABits propose that a tokenized, blockchain-enabled ecosystem could be the technological tipping point for onboarding the masses.

By allowing people to monetize their genomes and sell access directly to data buyers, Nebula thinks its platform could help drive sequencing costs down “to zero or even offer [people] a net profit.”

While Nebula won’t subsidize whole genome sequencing directly, a blockchain model would allow interested buyers — say, two pharmaceutical companies — to pitch in the cash for someone’s sequence in return for access to their data.

Tokenization opens up the flexibility and granular consent for enabling different scenarios. As Brama suggested, a data owner could be entitled to shares in whichever drug might be developed based on the research that they have enabled or be reimbursed for their medical prescription in crypto tokens. Contracts would be published and hashed, and reference to the individual’s consent recorded on the blockchain.
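To make the idea of a published and hashed contract concrete, here is a minimal sketch in Python. It assumes that only a SHA-256 digest of the consent contract, plus a few descriptive fields, would be written to the ledger. The record layout, field names and identifiers are hypothetical and are not taken from Nebula or DNABits.

```python
import hashlib
import json


def contract_digest(contract_text: str) -> str:
    """Return the SHA-256 hex digest of a consent contract's text."""
    return hashlib.sha256(contract_text.encode("utf-8")).hexdigest()


def consent_record(owner_id: str, buyer: str, purpose: str, contract_text: str) -> dict:
    """Build the small record that would be written to the ledger.

    Only the digest of the contract is stored, never the contract text
    or any genomic data. All field names are hypothetical.
    """
    return {
        "owner": owner_id,      # pseudonymous identifier, not a real name
        "buyer": buyer,
        "purpose": purpose,
        "contract_sha256": contract_digest(contract_text),
    }


def matches(record: dict, contract_text: str) -> bool:
    """Check that a contract presented later is the one that was consented to."""
    return record["contract_sha256"] == contract_digest(contract_text)


contract = "Owner grants Buyer read access to variant data for oncology research."
record = consent_record("owner-7f3a", "Example Pharma", "oncology research", contract)

print(json.dumps(record, indent=2))
print(matches(record, contract))                    # True
print(matches(record, contract + " (amended)"))     # False
```

Because any change to the contract text changes the digest, either party could later show whether the terms on file are the terms that were agreed.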

Genomic dystopias

Driving and accelerating data generation is just one part of the equation.

Nebula ran a survey that found that privacy and ethical concerns, rather than affordability alone, eclipsed all other factors when people were asked whether they would consider having their genome sequenced. In another study of 13,000 people, 86 percent said they worried about misuse of their genetic data; over half echoed fears about privacy.

These concerns are not simply founded in the dystopian 90s sci-fi of Hollywood — think Gattaca’s biopunk imaginary of a future society in the grips of a neo-eugenics fever.

As Ofer Lidsky — co-founder, CEO and CTO of blockchain genomics startup DNAtix — put it:

“Once your DNA has been compromised, you cannot change it. It’s not like a credit card that you can cancel and receive a new one. Your genetic code is with you for all your life […] Once it’s been compromised, there’s no way back.”

Data is increasingly intercepted, marketized and even weaponized. Sequencing — let alone sharing — your genome is perhaps a step further than many are willing to take, given its singularity, irrevocability and longevity.

DNABits’ Brama gave his cybersecurity take, saying that:

“The consequences are very difficult to imagine, but in a world [in which] people are building carriers like viruses that will spread to cells in the body and edit them — it’s frightening, but in fact, all the building blocks are already there: genome sequencing, breaches of data, gene editing. People are now working to fix major health conditions using gene editing in vivo. But we should assume that every tool out there will eventually also get into the wrong hands.”

He added, “We’re not talking about taking advantage of someone just for one night with GHB or some other drug” — this would impact the rest of an individual’s life.

This April, on the heels of the Cambridge Analytica scandal, news broke that police detectives had mined a hobbyist genealogy database for fragments of individuals’ DNA that they hoped would help solve a murder case that had gone cold for over thirty years.

Law enforcement faced no resistance in accessing a centralized store of genetic material that had been uploaded by an unwitting public. And while many hailed the arrest of the Golden State Killer through a tangle of DNA, others voiced considerable unease.

This obscurity of access has implications beyond forensics. While Brama’s dystopia may be some way off, today there are concerns about genetic discrimination by employers and insurance firms — the latter of which is currently only legally proscribed in a partial way. Grishin echoed this, noting that in the U.S., “you can be denied life insurance because of your DNA.”

This May, the U.S. Federal Trade Commission opened a probe into popular consumer genetic testing firms — including 23andMe and Ancestry.com — over their policies for handling personal and genetic information, and how they share that data with third parties.

23andMe and Ancestry.com represent a recent phenomenon of so-called direct-to-consumer genetic testing, the popularity of which is estimated to have more than doubled last year.

These firms use a narrower technique called genotyping, which identifies 600,000 positions spaced at approximately regular intervals across the 6.4 billion letters of an entire genome. While limited, it still reveals inherently personal genetic information.

The highly popular 23andMe home genotyping kit — sunnily packaged as “Welcome to You” — promises to tell people everything from their ancestral makeup to how likely they are to spend their nights in the fretful clutches of insomnia. The kit comes with a price tag as low as $99.

This July, the world’s sixth largest pharmaceutical company, GlaxoSmithKline (GSK), invested $300 million in a four-year deal to gain access to 23andMe’s database, and the testing firm is estimated to have earned $130 million from selling access to around a million human genotypes, working out at an average price of around $130. By comparison, Facebook reportedly generates around $82 in gross revenue from the data of a single active user.

Battle-proof, anonymized blockchain systems for the genomics revolution

In this increasingly opaque genomics data landscape, private firms monetize the genotypic data spawned by their consumers, and sequence data is fragmented across proprietary, centralized silos — whether in the unwieldy legacy systems of health care and research institutions or in the privately-owned troves of biotech firms.

Bringing genomics onto the blockchain would allow for the circulation that is needed to accelerate research, while protecting this uniquely personal information by keeping anonymized identities separate from cryptographic identifiers. Users remain in control of their data and decide exactly who it gets shared with and for which purposes. That access, in turn, would be tracked on an auditable and immutable ledger.

Grishin outlined Nebula’s version, which would place asymmetric requirements on different members of the ecosystem. Users would have the option to remain anonymous, but a permissioned blockchain system with verified validator nodes would require data buyers who use the network to be fully transparent about their identity:

“If someone reaches out to you, it shouldn’t be just a cryptographic network ID, but it should say this is John Smith from Johnson & Johnson, who works, say, in oncology.”

Grishin added that Nebula has experimented with both Blockstack and the Ethereum (ETH) blockchain but has since decided to move to an in-house prototype, considering the 15 transactions-per-second capacity of Ethereum to be insufficient for its ecosystem.

DNABits’ Brama, also committed to using a permissioned system, proposed using “the simplest and most robust form of blockchain — i.e., a Bitcoin-type network.”

“The more powerful and the more capable engine that you use, the larger the surface attack.”

Lie-proofing the blockchain

23andMe is said to store around five million genotype customer profiles, and rival firm Ancestry.com around 10 million. For each profile, they collect around 300 phenotypic data points — creating surveys that aim to find out how many cigarettes you (think) you’ve smoked during your lifetime or whether yoga or Prozac was more effective in managing your depression.

A phenotype is the set of observable characteristics of an individual that results from the interaction of their genotype with their environment. Generating and sharing access to this data is crucial for decoding the genome through a correlation of variants and traits. But as Grishin notes, being largely self-reported, the quality of much of the existing data is uncertain, and a tokenized genomics faces one hurdle in this respect:

“If people will be able to monetize their personal genomic data, then you can imagine that some people might think, ‘If I claim to have a rare condition, many pharma companies will be interested in buying access to my genome’ — which is just not necessarily true. The value of a genome is kind of difficult to predict and it’s not correct to say that if you have something rare, then your genome will be more valuable. In fact many studies need a lot of control samples that are kind of just normal.”

Education can help make people aware that they won’t be making any more money by lying and that a middle-of-the-road genome might be just as interesting for a buyer as an unusual one. But Grishin also noted that a blockchain system can offer unique mechanisms that deter deception, even where education fails:

“Blockchain can help to create phenotype surveys that detect incorrect responses or identify where an individual participant has tried to lie. And this can be combined with blockchain-enabled escrow systems, where, for example, before you participate in a survey, you have to deposit a small amount of cryptocurrency in an escrow wallet.”

If conflicting responses indicate that someone has tried to lie about their medical condition, then their deposit could be withheld in a way that is much easier to implement within a blockchain system than in one using fiat currencies.
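The escrow mechanism Grishin describes can be sketched in a few lines of Python. The version below is illustrative only: it assumes that deception is detected by asking the same question twice under different wording and comparing the answers, which is one possible check among many. The class, question labels and token amounts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyEscrow:
    """Holds participants' deposits until their survey answers are checked."""

    deposit_required: int                       # hypothetical token units
    deposits: dict = field(default_factory=dict)

    def join(self, participant: str, amount: int) -> None:
        """Accept a deposit before the participant may take the survey."""
        if amount < self.deposit_required:
            raise ValueError("deposit too small")
        self.deposits[participant] = amount

    def settle(self, participant: str, answers: dict, paired_questions: list) -> int:
        """Return the refund: the full deposit if every pair of repeated
        questions received the same answer, otherwise 0 (deposit withheld)."""
        deposit = self.deposits.pop(participant)
        consistent = all(answers[a] == answers[b] for a, b in paired_questions)
        return deposit if consistent else 0


# Assumed consistency check: the same condition is asked about twice.
pairs = [("q2_diabetes", "q9_diabetes_reworded")]

escrow = SurveyEscrow(deposit_required=10)
escrow.join("alice", 10)
escrow.join("bob", 10)

honest = {"q2_diabetes": "no", "q9_diabetes_reworded": "no"}
conflicting = {"q2_diabetes": "yes", "q9_diabetes_reworded": "no"}

print(escrow.settle("alice", honest, pairs))       # 10
print(escrow.settle("bob", conflicting, pairs))    # 0
```

On a real network this logic would live in a smart contract rather than a Python object, so that neither the survey author nor the participant could alter the outcome after the fact.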

2018: Viruses and chromosomes hit the blockchain

Even with just a fraction of the population on board, given the data intensity of the body’s code, a tsunami of sequence data is already flooding the existing centralized stores.

The complex, raw dataset of a single genome runs to 200 gigabytes. In June 2017, the U.S. National Institutes of Health’s GenBank reportedly contained over two trillion bases of sequence. One of the world’s largest biotech firms, China’s BGI Genomics, announced that same month that it planned to produce five petabases of new DNA in 2017, increasing each year to hit 100 petabases by 2020.

In his interview with Cointelegraph, Lidsky proposed that the raw 200 gigabyte dataset is unnecessary for analysts, emphasizing that during initial sequencing a genome is read multiple times, “say 30 or 100 times,” to mitigate inaccuracies. Once the reads are combined, he explained, “the size of the sequence is reduced to 1.5 gigabytes.” This still requires significant compression to bring it to the blockchain. As a reference, the average size of a transaction on the Bitcoin (BTC) blockchain was 423 bytes, as of mid-June 2018.

In June, DNAtix announced the first transfer of a complete chromosome using blockchain technology, specifically IBM’s Hyperledger Fabric. Lidsky told Cointelegraph the firm had succeeded in achieving a 99 percent compression rate for DNA this August.
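As a rough illustration of why sequence data has to shrink before it can travel over a blockchain, the Python sketch below packs each of the four DNA letters into two bits, so that four letters fit into one byte. This is a generic textbook encoding, not the compression method DNAtix uses, and the function names are invented for the example. At two bits per letter, the 6.4 billion letters of an entire genome mentioned earlier would occupy about 1.6 gigabytes.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four letters (2 bits each) per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; `length` trims the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(LETTERS[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "letters ->", len(packed), "bytes")   # 7 letters -> 2 bytes
print(unpack(packed, len(seq)))                       # GATTACA

# Back-of-the-envelope size of a whole genome at 2 bits per letter.
genome_letters = 6_400_000_000
print(genome_letters * 2 // 8, "bytes")               # 1600000000 bytes
```

Real genomic compressors go much further, for example by storing only the differences from a reference genome, which is how far higher compression rates become possible.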

Nebula, for its part, envisions that even on a blockchain, data transfer is unnecessary and ill-advised, given the unique sensitivity of genomics. It proposes sharing data access instead. The solution would combine blockchain with advanced encryption techniques and distributed computing methods. As Grishin outlined:

“Your data can be analyzed locally on your computer by you just running an app on your data yourself […] with additional security measures in place — for example, by using homomorphic encryption to share data in an encrypted form.”

Grishin explained that homomorphic techniques encrypt data but ensure that it is not “nonsensical” — creating “transformations that morph the data without disturbing it”:

“The data buyer doesn’t get the underlying data itself but computes on its encrypted form to derive results from it. Code is therefore being moved to the data rather than data being moved to researchers.”
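A toy example may help show what computing on encrypted data means in practice. The Python sketch below implements a miniature version of the Paillier cryptosystem, a well-known scheme in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts. It is an assumption made for illustration: the article does not say which scheme Nebula uses, the primes are far too small to be secure, and the "risk allele" counts are made-up data.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)        # valid shortcut because the generator is n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 simplifies to 1 + m*n
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Decrypt ciphertext c with the private key (lam, mu)."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Combine two ciphertexts so that the result decrypts to the sum."""
    return c1 * c2 % (n * n)


# Data owner: encrypts per-position risk-allele counts (hypothetical data).
n, private_key = keygen(1009, 1013)
risk_alleles = [0, 1, 2, 1, 0, 2]
ciphertexts = [encrypt(n, v) for v in risk_alleles]

# Data buyer: sums the values without ever seeing them in the clear.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)

# Data owner: decrypts only the aggregate result.
print(decrypt(n, private_key, total))   # 6
```

The buyer handles nothing but ciphertexts, yet the owner can decrypt a meaningful aggregate. Production systems would rely on vetted cryptographic libraries and much larger keys.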

Encrypted data can be made available to developers of so-called genomic apps — something that Nebula, DNAtix and many other emerging startups in the field all propose as one means of providing users with an interpretation of their data. They could also provide a further source of monetization for researchers and other third-party developers.

But is ‘outsourcing’ genomic interpretation to an app that simple? The decades-old health care model referred patients to genetic counselors to go over risks and talk through expectations, helping to translate what can be bewildering and often scary results.

Consumer genetic testing firms have already been accused of leaving their clients “with lots of data and few answers.” Beyond satisfying genealogical curiosity and interpreting a range of ‘wellness’ genes, 23andMe can reveal whether you carry a genetic variant that could impact your child’s future health and has — as of 2017 — even been authorized to disclose genetic health risks, including for breast cancer and Parkinson’s.

Blockchain may not fare much better when it comes to leaving individuals in the dark, faced with the blue glow of their computer screens. Nebula and DNAtix are both considering how to integrate genetic counselors into their ecosystems, and Grishin also proposed that users would be able to “opt in” to whether they really want to “know everything,” or only want “actionable” insights — i.e., things that modern medicine can address.

Blockchain and big pharma

Prescription drug sales globally are forecast to hit $1.2 trillion by 2024. But closing the feedback loop between pharmaceuticals and the millions of people who take their pills each and every day still faces significant hurdles.

Drug development relies on correlating and tracking the life-cycle of medical trials, genetic testing, prescription side effects and longer-term effects relating to lifestyle; tokenization can incentivize individuals and enterprises to share data that is generated across multiple streams. As Brama outlined:

“Lifestyle data comes from wearables, smartphones, smart homes, smart cities, purchasing, commercial interactions, social media, etc. Another is carried by everyone, and that’s our genome. The third is clinical and health-condition data generated in the health care system.”

Brama used the analogy of a deck of cards to explain how blockchain could be the key to starting to bring this data into connection, all the while protecting data owners’ anonymity.

An individual can hold an unlimited number of unique addresses in their digital wallet. Going into a pharmacy to purchase a particular drug — say, vitamin C, stamped with a QR code — would generate a transaction for one of these addresses. A visit to a family doctor might generate a further hash for a diagnosis on your electronic medical record (EMR) — say, a runny nose. This transaction goes between the caregiver and another wallet address.

A user might choose to put the correlation between transactions for their different wallets on the blockchain and make it public for people to bid on the underlying data. Or, they might keep the correlation off-chain and send proof only when, say, an insurance firm or research institute advertises to users who have a particular set of transactions:

“You hold the deck. You look at the cards, you decide if you say, if you don’t say. And you can put them on the table and let everyone see, or you can indicate privately that you actually have these. It really leaves the choice and the implementation up to you.”
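Brama’s deck-of-cards analogy can be loosely sketched in Python. The example assumes a simplified wallet in which each address is the hash of a secret derived from one master secret, so the holder can turn over a single card by revealing the secret behind that one address. Real wallets use public-key pairs and digital signatures rather than bare hashes, and every name below is hypothetical.

```python
import hashlib
import hmac


def card_secret(master_secret: bytes, label: str) -> bytes:
    """Derive a per-address secret from the wallet's master secret."""
    return hmac.new(master_secret, label.encode("utf-8"), hashlib.sha256).digest()


def address(master_secret: bytes, label: str) -> str:
    """The public address is the hash of the per-address secret."""
    return hashlib.sha256(card_secret(master_secret, label)).hexdigest()


def verify_reveal(addr: str, revealed_secret: bytes) -> bool:
    """A verifier checks that the revealed secret really hashes to the address."""
    return hashlib.sha256(revealed_secret).hexdigest() == addr


master = b"demo master secret - never reuse"

# Two unlinkable-looking addresses appear on the ledger.
pharmacy_addr = address(master, "pharmacy-purchase")
doctor_addr = address(master, "family-doctor-visit")

# The holder chooses to show only the pharmacy card to an interested buyer.
shown = card_secret(master, "pharmacy-purchase")
print(verify_reveal(pharmacy_addr, shown))   # True
print(verify_reveal(doctor_addr, shown))     # False
```

The holder can prove control of any chosen subset of addresses, while the remaining cards stay face down and cannot be linked by an outside observer.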

Biotechnological frontiers

Professor Church has made an analogy that likely rings bells for anyone plugged into the crypto and blockchain space, saying that “right now, genome sequencing is like the internet back in the late 1980s. It was there, but no one was using it.”

Blockchain and the vanguard of genomic research have perhaps come closer to each other than ever before. Now that the DNA in our cells is understood as a life-long store of information, a new and disruptive technology is needed to securely and flexibly manage the interlocking networks of the body’s code.

The advent of genomics raises questions that cannot be settled by science alone. For all of our interviewees, blockchain could be just the key to creating the equitable and transparent means of ownership and circulation that would ensure these helices of raw biomaterial don’t spiral out of control.