Veritas Genetics Scoops Up an AI Company to Sort Out Its DNA
Genes carry the information that makes you you. So it's fitting that, when sequenced and stored in a computer, your genome takes up gobs of memory: up to 150 gigabytes. Multiply that across all the people who have gotten sequenced, and you're looking at some serious storage issues. As if that weren't enough, mining those genomes for useful insight means comparing them all to each other, to medical histories, and to the millions of scientific papers about genetics.
Sorting all that out is a perfect task for artificial intelligence. And plenty of AI startups have bent their efforts in that direction. On August 3, sequencing company Veritas Genetics bought one of the most influential: seven-year-old Curoverse. Veritas thinks AI will help interpret the genetic risk of certain diseases and scour the ever-growing databases of genomic, medical, and scientific research. Going a step further, the company also hopes to use things like natural language processing and deep learning to help customers query their genetic data on demand.
It's not totally surprising that Veritas bought up Curoverse. Both companies spun out of George Church's prolific Harvard lab. Several years ago, Church started the Personal Genome Project, with the goal of sequencing 100,000 human genomes and linking each one to participants' health information. Veritas' founders helped lead the sequencing part (the company started as a prenatal testing service and launched a $1,000 full-genome product in 2015), while Curoverse worked on academic strategies to store and sort through all the data.
But more broadly, genomics and AI practically call out for one another. As a raw data format, a single person's genome takes up about 150 gigabytes. How?! OK, so yes, storing a single base pair takes only around two bits. Multiply that by roughly 3 billion, the number of base pairs in one set of your 23 chromosomes, and you wind up with around 750 megabytes. But genetic sequencing isn't perfect. Mirza Cifric, Veritas Genetics’ cofounder and CEO, says his company reads each part of the genome at least 30 times to make sure its results are statistically significant. "And you gotta keep all that data, so you can refer back to it over time," says Cifric.
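The arithmetic in this paragraph can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming about 3 billion base pairs, 2 bits per base when the letters are packed tightly, and 30x coverage. The last estimate additionally assumes a simplified FASTQ-style text layout (a common format for raw sequencer reads) that spends one byte on each base letter and one byte on its quality score, ignoring read headers, line breaks, and compression. Under those assumptions, 30 tightly packed reads come to about 22.5 GB, while the text layout lands around 180 GB, the same ballpark as the 150 gigabytes quoted above.

```python
# Back-of-the-envelope genome storage estimates (decimal units: 1 MB = 10**6 bytes).
# Assumptions: ~3 billion base pairs per haploid genome, 30x read coverage, and a
# simplified FASTQ-style text layout using one byte per base letter plus one byte
# per quality score (headers, newlines, and compression are ignored).

BASE_PAIRS = 3_000_000_000
BITS_PER_BASE_PACKED = 2   # A, C, G, T fit in 2 bits
COVERAGE = 30              # each position is read about 30 times
BYTES_PER_BASE_TEXT = 2    # 1 byte for the base letter + 1 byte for its quality score


def packed_bytes(base_pairs: int, coverage: int = 1) -> int:
    """Bytes needed to store bases at 2 bits each, with no quality data."""
    return base_pairs * coverage * BITS_PER_BASE_PACKED // 8


def text_read_bytes(base_pairs: int, coverage: int) -> int:
    """Bytes for raw reads stored as text: base letter plus quality score."""
    return base_pairs * coverage * BYTES_PER_BASE_TEXT


if __name__ == "__main__":
    print(f"one packed copy:      {packed_bytes(BASE_PAIRS) / 1e6:.0f} MB")
    print(f"30x coverage, packed: {packed_bytes(BASE_PAIRS, COVERAGE) / 1e9:.1f} GB")
    print(f"30x raw reads (text): {text_read_bytes(BASE_PAIRS, COVERAGE) / 1e9:.0f} GB")
```

The gap between the packed and text figures is the point: raw reads carry a confidence score for every base, and that per-base metadata, repeated 30 times over, is much of what pushes a single genome into the hundreds of gigabytes.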
That's just storage. "Everything after that is going to specific areas and asking questions: There’s a variant at this location, a substitution of this base, a deletion here, or multiple copies of this same gene here, here, and here," says Cifric. Now, interpret all that. Oh, and do it across a thousand, a hundred thousand, or a million genomes. Querying all those genetic variations is how scientists get leads to find new drugs, or figure out how existing drugs work differently on different people.
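To make the kind of question Cifric describes a little more concrete, here is a toy Python sketch. It assumes a hypothetical record layout (chromosome, position, variant type, detail) and made-up positions; it is not Veritas' or Curoverse's data model. It counts how many genomes in a small cohort carry each variant, and lists which genomes have any variant at a given location.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One hypothetical variant call: where it sits and what kind it is."""
    chrom: str   # chromosome name, e.g. "chr1"
    pos: int     # position on that chromosome
    kind: str    # "substitution", "deletion", or "copy_number"
    detail: str  # e.g. "A>G", "3 bases deleted", "3 copies"


def carrier_counts(cohort: dict[str, set[Variant]]) -> Counter:
    """For each variant, count how many genomes in the cohort carry it."""
    counts: Counter = Counter()
    for variants in cohort.values():
        counts.update(variants)  # each genome's set counts a variant at most once
    return counts


def genomes_with_variant_at(cohort: dict[str, set[Variant]], chrom: str, pos: int) -> list[str]:
    """IDs of genomes that have any variant at the given location."""
    return sorted(
        genome_id
        for genome_id, variants in cohort.items()
        if any(v.chrom == chrom and v.pos == pos for v in variants)
    )


if __name__ == "__main__":
    # Toy cohort with made-up positions, for illustration only.
    snp = Variant("chr1", 1_000, "substitution", "A>G")
    deletion = Variant("chr2", 5_000, "deletion", "3 bases deleted")
    extra_copies = Variant("chr7", 20_000, "copy_number", "3 copies")
    cohort = {
        "genome_a": {snp, deletion},
        "genome_b": {snp},
        "genome_c": {snp, extra_copies},
    }
    for variant, n in carrier_counts(cohort).items():
        print(f"{variant.chrom}:{variant.pos} {variant.kind} ({variant.detail}): "
              f"{n} of {len(cohort)} genomes")
    print(genomes_with_variant_at(cohort, "chr1", 1_000))
```

A loop like this is fine for three genomes; at a million genomes the same question would need an indexed store keyed by location, which is exactly where the computing burden the article describes comes from.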
But cross-referencing all those genomes is just the beginning. Curoverse, which was focusing on projects to store and sort genomic data, also has its work cut out for it in searching through the 6 million—and counting—jargon-filled academic papers detailing gene behavior, including visual information found in charts, graphs, and illustrations.
That's pretty ambitious. Natural language processing is one of the stickiest problems in AI. "Look, I am a computer scientist, I love AI and machine learning, and no amount of coding makes sense to solve this," says Atul Butte, the director of UCSF's Institute of Computational Health Sciences. At his former job at Stanford University, Butte actually tried to do the same thing—use AI to dig through genetics research. He says in the end, it was way cheaper to hire people to read the papers and input the findings into his database manually.
But hey, never say never, right? However it accomplishes it, Veritas wants to move past what companies like 23andMe and Color offer: genetic risk based on single-variant diseases. Some of America's biggest dangers come from diseases like diabetes and heart disease, which are driven by interactions among multiple genes, in addition to environmental factors like diet and exercise. With AI, Cifric believes Veritas will be able not only to dig up these various genetic contributors but also to assign each a statistical score showing how much it contributes to the overall risk.
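Scoring each genetic contributor and adding them up resembles what researchers call a polygenic risk score, the measure Butte brings up below. A common, simple form is additive: each variant gets a weight (its estimated effect size), and a person's score is the sum of each weight times the number of risk-allele copies they carry (0, 1, or 2). The Python sketch below assumes that additive model and uses invented variant IDs and weights; it is not Veritas' method, and it leaves out environmental factors like diet and exercise entirely.

```python
# A minimal additive polygenic risk score (PRS) sketch.
# Assumption: risk adds up linearly across variants, score = sum(weight_i * copies_i).
# Variant IDs and weights below are hypothetical, not drawn from any real study.


def allele_contributions(dosages: dict[str, int], weights: dict[str, float]) -> dict[str, float]:
    """Per-variant contribution: weight times risk-allele count (0, 1, or 2).

    Variants missing from `dosages` are treated as 0 copies.
    """
    for vid, copies in dosages.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{vid}: expected 0, 1, or 2 copies, got {copies}")
    return {vid: w * dosages.get(vid, 0) for vid, w in weights.items()}


def polygenic_risk_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Overall score: the sum of every variant's contribution."""
    return sum(allele_contributions(dosages, weights).values())


if __name__ == "__main__":
    weights = {"variant_1": 0.12, "variant_2": -0.05, "variant_3": 0.30}  # hypothetical effect sizes
    person = {"variant_1": 2, "variant_2": 1}  # carries no copies of variant_3
    parts = allele_contributions(person, weights)
    # List contributors from largest to smallest effect on this person's score.
    for vid, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
        print(f"{vid}: {value:+.2f}")
    print(f"total score: {polygenic_risk_score(person, weights):.2f}")
```

Keeping the per-variant contributions separate from the total is what would let a report say "this gene accounts for most of your risk", which is the kind of breakdown Cifric describes.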
Again, Butte hates to be a spoilsport, but … there are all sorts of problems with doing predictive diagnostics with genetic data. He points to a 2013 study that used polygenic testing to predict heart disease using data from the Framingham Heart Study, which is about as good as you can get when it comes to health data and heart disease. "The authors showed that yes, given polygenic risk score, and blood levels, and lipid levels, and family history, you can predict within 10 years if someone will develop heart disease," says Butte. "But doctors could do the same thing without using the genome!"
He says the problems come down to just how messy it is to square up all the different research on each gene with the environmental risks and all the other compounding factors that come up when you try to peer into the future. "It’s been the holy grail for a long time, structured genome reporting," says Butte. Even attempts to get researchers to write and report data in a standard, machine-readable way have fallen flat. "You get into questions that never go away. One researcher defines autism different from another one, or high blood pressure, or any number of things," he says.
Butte isn't a total naysayer. He says partnerships like the one between Veritas and Curoverse are becoming more common—like the data processing deal between genetic sequencing giant Illumina and IBM Watson—because there's a clear need for new computing methods in this area. "You want to get to a point where you are developing stuff that improves clinical care," he says.
Or how about delivering those improvements directly to the owners of the genomes? Cifric hopes the merger will improve the consumer experience of using genetic data, even seamlessly integrating it into daily life. For instance, your genome and health records could be linked to your digital assistant. Alexa, should I eat this last piece of pizza? Maybe you should skip it, depending on your baseline genetic risk for cholesterol and your latest blood test results. Diet isn't the only area where genomics could help improve your day-to-day life. Some people are more or less sensitive to over-the-counter drugs. A quick query might tell you whether you should take a little less Tylenol than is recommended.
Cifric thinks this acquisition could position Veritas as a global powerhouse of genomic data. "Apple recently announced that they had shipped 41 million iPhones in a quarter, right? I think in not too distant future, we’ll be doing 41 million genomes in a quarter," he says. That might seem ambitious, given that the cost to consumers is nearly $1,000. But that cost is bound to come down. And artificial intelligence will make paying for the genome a matter of common sense.
This story has been updated to reflect that the company is named Veritas Genetics, not Veritas Genomics.