Balderton joins $30M Series D for big data biotech platform play, Sophia Genetics

Natasha Lomas

Switzerland-based SaaS startup Sophia Genetics is hoping to give IBM Watson a run for its money in the healthcare diagnostics space. It's built a big data analytics platform that harnesses clinicians' medical expertise to enhance genomic diagnostics via AI algorithms -- leading, it says, to better and faster diagnoses for patients with diseases such as cancer.

Hospitals that use the platform are intended to jointly benefit from expert-fed, algorithmic DNA sequencing diagnostic insights exactly because they are shared across the platform. So as the user-base scales -- it says it's adding 10 new hospitals each month -- Sophia Genetics' AIs get smarter and more accurate, and patients anywhere can benefit from the pooled knowledge.

The company is announcing a $30 million Series D funding round today, adding UK-based VC firm Balderton Capital to its investor roster, along with 360 Capital Partners. Previous investors include UK tech entrepreneur Mike Lynch’s Invoke Capital and Alychlo, started by Marc Coucke, a Belgian pharmaceutical entrepreneur.

According to Crunchbase the biotech business had previously raised $28.75M since being founded back in 2011, so with this round it has pulled in around $58.75M thus far -- capital that's been used to develop its platform proposition to a tipping point of utility, as co-founder and CEO Dr Jurgi Camblong explains.

As the cost of genome sequencing has come down, he says, the challenge for healthcare providers has been quickly and accurately reading and analyzing the DNA sequencing data that is now more readily available. This is where Sophia Genetics' analytics platform aims to assist -- currently targeting oncology, hereditary cancer, metabolic disorders, pediatrics and cardiology.

"With the decreasing costs of these technologies that [are] basically digitalizing patients' DNA information, we did see an opportunity to engage with hospitals to help them be part of a community and share experience and knowledge to continuously better diagnose and treat patients through the use of such type of digital technologies," he tells TechCrunch.

"Since our dream was to impact on better diagnosing of the maximum number of patients we thought that in the end the best way was helping every hospital to leverage on this genomic technology. Rather than build a company that would end up competing with the hospitals. And so that's why we built a software as a service platform."

However, for the platform play to work Camblong says the company needed to be able to attract hospitals to sign up even before it had algorithms that could offer accelerated diagnostic insights -- so it needed to be able to offer them something of value right away to get them involved.

And while Camblong says the team's initial thought was that processing and storage would likely be the major challenges for hospitals handling what are extremely large genomic data-sets, along with issues such as data integrity, privacy and visualization, they actually found the main problem hospitals were grappling with was data accuracy. So they set out to help with that, to offer early utility and win longer-term buy-in from clinicians.

"All of them [were] purchasing those technologies to basically better diagnose patients but the data they would produce, although they would be larger, would not be as accurate as what they would have with legacy technology -- and this is where we were somehow forced as a startup...  to develop algorithms that would correct the data so that clinicians would be able to rely on this data. And use this data to better diagnose patients," he says.

"This is really how we started, from 2011 where we had nothing, to launching our platform in 2014 where we were 20 employees and we were working with I think 50 hospitals by the end of 2014. To today where we are working with over 350 hospitals that are all connected through our SaaS platform, who are all pulling patients' genome data, sharing knowledge to continuously get a better outcome of our algorithms that by the time [i.e. now] have become an artificial intelligence."

On the data accuracy issue, Camblong says the startup worked with hospitals to benchmark DNA samples analyzed via their sequencing systems, with the aim of "getting the signal out of the noise", as he puts it, and then training algorithms of its own to be able to perform that de-noising process automatically, and to recognize the salient/relevant patterns in the genome data. And thus, ultimately, to speed up diagnoses in the targeted health areas.

Sophia Genetics refers to its business as sitting within the "fast-emerging field of data-driven medicine" -- and is specifically applying AI to enhance relatively modern, so-called "Next Generation DNA Sequencing" (NGS) methods, which may be faster than but aren't as accurate as older-gen legacy systems, according to Camblong.

"All the AI technology that we've developed is based on statistical inference, pattern recognition, and some of it as well on machine learning," he says of Sophia Genetics' core tech.

"Data are not valuable any more once you have them," he adds, fleshing out the startup's relationship with its hospital customers/partners. "In any AI industry what is interesting is seeing the capacity to be exposed to the problem and teach an algorithm on how to recognize and solve the problem. But once you have taught this AI [to do] that you don't need any more the data you've been computing. So it's not so much the fact that we get access to this data -- it's because, unlike any other actor in the industry, we took this challenge of taking the pain.

"Unlike no other company we understood that the problem was accuracy and we took the challenge of aggregating the problem of accuracy."

Commenting on why Sophia Genetics stood out for Balderton, partner James Wise told us: "On top of their easy to use workflow tool to annotate and use sequenced data (compared with unsupported open source software) and their active clinician community, Sophia's real technological advantage comes out of its machine learning technology that analyses the genomic data and minimizes the noise from the use of multiple different combinations of sequencers and diagnostic kits to identify variants (DNA alterations) with a clinical-grade accuracy."

"As the market for diagnostic kits continues to expand, and as new sequencers come to market, there will continue to be a plethora of different ways that clinicians can use genomic data to make a diagnosis. But this requires a sophisticated third party platform to handle these many different inputs and to optimize their outcomes -- in Sophia Genetics’ case by using machine learning techniques across the huge datasets and through testing with their clinician network," he added.

"While there are competing solutions for tertiary analysis that may work well with a certain type of sequencer, it is Sophia's independent position and its technical ability to incorporate any combination of diagnostic and sequencer that makes its technology universal and unique."

Camblong says Sophia Genetics has benchmarked DNA sequencing data for more than 10,000 patients, and for over 500,000 unique variants at this stage -- and currently has three "core" diagnostic technologies trained off of this data.

It says the process it uses has been validated with more than 340 different DNA sequencers, while its algorithms were built bottom-up from raw FASTQ data (aka the most common file format used in DNA sequencing) -- and claims its tech is universally applicable.
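For readers unfamiliar with FASTQ, the following is a minimal sketch of reading the format, assuming the common four-line record layout (an `@` header, the base sequence, a `+` separator, and a quality string) and Phred+33 quality encoding. It is not Sophia Genetics' code; the names `FastqRecord`, `parse_fastq` and `mean_quality` are hypothetical and exist only for illustration.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred quality score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        # Assumption: Phred+33 encoding, so score = ASCII code minus 33.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


# Two made-up reads, purely for illustration.
example = StringIO("@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!!!\n")
records = list(parse_fastq(example))
```

Each base call in the raw data carries its own quality score, which is the kind of low-level signal an accuracy-focused pipeline would have to start from.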

"You cannot use deep learning techniques in this industry," says Camblong, elaborating on why the business took several years to train algorithms manually, with human experts benchmarking and analyzing data. "You need to have the prior knowledge. Deep learning requires you to have millions of millions of millions of data. And then you can expect that because of that eventually the neurons you will build are going to be able to find the way by their own. In many industries you need to have prior knowledge.

"First for the accuracy phase, Sophia has been learning by our data scientists because they have been exposed to the patterns [i.e. by analyzing the DNA sequencing data]... and then at the second stage, once you have a platform... the platform can evolve and learn with machine learning techniques."

At this stage he says the business is in its second phase -- utilizing the network of hospitals and clinicians it has signed up and linked via its platform, and drawing on the access to thousands of cases it's been afforded, coupled with the continued elbow grease of clinicians feeding their diagnostic knowledge on the pathogenicity of variants into the platform on an ongoing basis -- to be in a position to now apply machine learning techniques to accelerate utility and scale the business. Hence taking in more funding.

Camblong refers to what the platform does as a "democratization" of DNA sequencing expertise, asserting: "So that the next hospital that starts using your technology will enter at a level where it will require less competencies, less experience to be able to diagnose patients through the use of genomic information."

It charges hospitals for use of the platform on an on-demand basis -- so they pay per analysis performed, rather than having to shell out for a fixed monthly fee.

The workflow for using the platform involves a patient with one of the suspected conditions arriving at the hospital and having a sample taken. Their DNA is extracted and enriched with molecular biology principles, and the selected genes are then read by the hospital's NGS machine.

The digitization of that data takes two days, after which users log in to Sophia Genetics' platform and load in the raw data, which is transferred to the company's datacenters ("in an anonymized way", according to Camblong; he also confirms that the platform prompts hospitals to confirm they have patients' consent for transferring their data to be processed by a third party) -- and then the startup's AI algorithms get to work to pull out unique genetic variants.

"These data are going to be annotated... it means that you add additional information that is out there in public databases, or as well in the databases of the users of Sophia DDM, and then the data are being ranked according to pathogenicity predictions," he continues, noting that the data processing undertaken by its AI takes two hours.

"Two hours later the user logs in and given the genetic variants that are being detected the user is going to take action -- so Sophia can learn as well from these actions. The expert is going to classify those variants as being pathogenic or benign."
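The loop Camblong describes (rank detected variants by predicted pathogenicity, let an expert classify them, feed those decisions back) can be sketched roughly as follows. This is an illustration only, not the Sophia DDM implementation: the `Variant` class, the score field and every function name here are hypothetical, and the scores are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variant:
    name: str                          # hypothetical variant label
    pathogenicity_score: float         # hypothetical prediction in [0, 1]
    expert_label: Optional[str] = None  # set once a clinician classifies it


def rank_by_pathogenicity(variants: List[Variant]) -> List[Variant]:
    """Return variants ordered from most to least likely pathogenic."""
    return sorted(variants, key=lambda v: v.pathogenicity_score, reverse=True)


def classify(variant: Variant, label: str) -> None:
    """Record an expert's decision on a variant."""
    if label not in ("pathogenic", "benign"):
        raise ValueError("label must be 'pathogenic' or 'benign'")
    variant.expert_label = label


def labelled_examples(variants: List[Variant]) -> List[Tuple[float, str]]:
    """Collect (score, expert label) pairs that a learner could train on."""
    return [
        (v.pathogenicity_score, v.expert_label)
        for v in variants
        if v.expert_label is not None
    ]


# Made-up variants with made-up scores.
detected = [Variant("A", 0.2), Variant("B", 0.9), Variant("C", 0.5)]
ranked = rank_by_pathogenicity(detected)
classify(ranked[0], "pathogenic")   # the expert reviews the top-ranked variant
training_pairs = labelled_examples(ranked)
```

The point of the sketch is the feedback path: each expert classification becomes a labelled example, which is how a shared platform could improve as more clinicians use it.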

Camblong says the platform has moved from having a precision rate of 85% for classification of variants for the first 10,000 patients, to 95% with the following 10,000, and 98% with the 10,000 after that.

"We are always between 99.9% and 100% for sensitivity, and between 99% and 100% specificity," he adds of the platform's current average accuracy range.
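For reference, sensitivity, specificity and precision have standard definitions in terms of true/false positives and negatives, sketched below. The counts used are invented purely to show the arithmetic and are not Sophia Genetics' data; it is also an assumption that the "precision rate" quoted above is meant in this strict statistical sense.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real variants that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-variants correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of reported variants that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only (not from the article).
tp, fn, tn, fp = 999, 1, 990, 10
sens = sensitivity(tp, fn)   # 999 / 1000
spec = specificity(tn, fp)   # 990 / 1000
prec = precision(tp, fp)     # 999 / 1009
```

High sensitivity means few real variants are missed, while high specificity means few spurious variants are reported; a diagnostic tool needs both.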

As it evolves, he says the wider vision is to add more layers to expand its capabilities -- so it could, for example, compute imaging data from medical scans together with molecular genomics data to support more powerful predictive analyses.

"If you combine two sequence images and molecular information about [a cancer] tumor you can predict how the tumor is going to evolve in the following months," he suggests, saying surgeons could then make decisions about whether they need to operate immediately or whether they could wait. So the big push is towards the opportunity of an ever more personalized form of healthcare -- enabled by AI being able to shrink the time-scales and costs of performing robust genomic analysis.

He says the new funding will be used to "fully deploy" Sophia's SaaS platform globally, and to ramp up commercial activity -- moving beyond its current focus on Europe to Latin America, AsiaPac, Canada and the U.S.

"We believe that the number of hospitals that will adopt our technology will dramatically ramp up over the next year," he says.

The investment will also go into oncology, specifically -- towards developing what he calls "full management of a cancer case", explaining this as encompassing: "From the first image that has been taken with a scan, up to the monitoring of the efficiency of the treatment and eventually adaptation of the treatment."

It also intends to add additional capacity generally, so it can associate molecular information with metadata, such as imaging data -- to start to push towards expanding the platform's analytical capabilities by supporting the co-processing of multiple types of healthcare data pertaining to its targeted conditions.

Though Camblong concedes that the privacy challenges will step up as more highly sensitive medical data gets processed in concert.

"We took [privacy] very serious. There are companies in the industry that have made very bad moves in the past. And we have never wanted to go to a DTC [direct to consumer] approach. For us it was very clear that if you wanted to impact on better diagnosing the maximum number of patients, trust by the institutions would be very important," he says.

"You cannot roll out an AI unless you build it bottom up. So everything you've been challenging me about on how we've been able to build this AI to make it accurate is really what distinguishes Sophia from any other actor that may want to be important in this space. We have been the only one who made the effort of digging into this complexity of making those data accurate -- and of making everything bottom up, because that's the only way you can build smart intelligence, or artificial intelligence," he adds.

"To take a parallel, self-driving cars are not going to learn from speech recognition systems -- they will learn from you, from me, from people that are going to drive cars, make mistakes, take right decisions and by knowing whether we have taken the right decision or whether we've made mistakes we are going to be able to teach the cars how to drive themselves."