AI Start-up Successfully Applies LLMs in Bioengineering

2024-03-13

When NVIDIA CEO Jensen Huang talked about using ChatGPT to understand how generative AI can solve real-world problems such as plastic degradation and carbon emissions, few people knew that a European AI startup was applying large language models to this kind of problem, working with DNA and protein sequences, and that practical use cases had already emerged.


"About 60% of what we consume today, whether it's drugs, food, or chemicals, can be manufactured through biological means. This feels more impactful than some of the other applications people are studying," said Stef van Grieken, co-founder and CEO of Cradle.


Using LLMs for Bioengineering


Cradle is a European biotech startup that uses AI to help scientists design and manufacture proteins faster and more economically. The company focuses on using generative AI to engineer proteins such as enzymes, vaccines, peptides, and antibodies.


The workflow resembles ChatGPT, where you can give the model an equation and get an answer, or give it a prompt and get an image. At Cradle, you input a DNA sequence or a description of what a molecule looks like, then specify what you need the molecule to do, for example bind to specific substances on cells, remain stable, or dissolve in water.


"Its role is to generate another set of sequences that you can take to the lab, which have a higher likelihood of performing the desired operations," Grieken said. "This is different from diffusing images; you're diffusing a molecule."


Language models such as BERT are trained through infilling: words are removed from sentences and the model is asked to fill them in. Cradle's models are trained in the same way, but on DNA and protein sequences rather than text.
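
The infilling idea can be illustrated with a minimal Python sketch. It is not Cradle's code or training pipeline: the function names mask_sequence and fill_in, the "?" placeholder, the 15% masking rate, and the example sequence are all assumptions chosen for illustration. The sketch only hides random positions in a protein sequence and records the hidden residues that a model would be asked to predict.

```python
import random

MASK = "?"  # hypothetical placeholder marking a hidden residue


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked string and a dict mapping each hidden
    position to the residue that was removed (the training target).
    """
    rng = random.Random(seed)
    # Mask at least one position, but never more than the sequence holds.
    n_masked = min(len(seq), max(1, round(len(seq) * mask_rate)))
    positions = sorted(rng.sample(range(len(seq)), n_masked))
    masked = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = seq[pos]
        masked[pos] = MASK
    return "".join(masked), targets


def fill_in(masked, predictions):
    """Rebuild a sequence from a masked string and per-position guesses."""
    out = list(masked)
    for pos, residue in predictions.items():
        out[pos] = residue
    return "".join(out)


if __name__ == "__main__":
    # Made-up amino-acid string, used only as an example input.
    example = "MKTAYIAKQRQISFVKSHFSRQ"
    masked, targets = mask_sequence(example)
    print(masked)   # the sequence with some positions hidden
    print(targets)  # what a model would be trained to predict
```

In a real training setup, a model would produce the predictions and be scored on how often they match the targets; here, passing the targets themselves to fill_in simply restores the original sequence.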


Measured by how far they surpass previous benchmarks and how well they scale, these models deliver about twice the progress of earlier methods. "This means that the speed at which you achieve goals in the entire R&D process is twice as fast as before," Grieken said.


"Google, Facebook, and other companies have done a lot of work more in terms of machine learning research and development. They haven't tried to build tools that help biologists use these methods in a simple way," he said.


Cradle builds proprietary models inspired by open-source models such as BERT, which is based on the Transformer architecture. "In terms of biotech capabilities like molecular biology, we're still like GPT 0.5," he said.


Data and Feedback Loops Remain Challenging


The scarcity of protein data slows the development of these models, especially compared with GPT models, which are trained on all the information available on the internet. "Training these models on public data is really difficult. That's why we have our own in-house lab to build effective training sets for these machine learning models to learn faster," Grieken said.


The feedback loop for these models is also slow, which hinders progress. Grieken contrasted this with GPT models, where feedback on whether a generated result is wrong, poor, or correct is available immediately and can be used to train the model right away. "In our case, it takes three months from generating things to getting results back," he said. Generating results is also expensive, costing between $30 and $1,000 per data point.


Making the World a Better Place


Cradle addresses many real-world problems in medical research, particularly those involving time, cost, and logistics. Many vaccines, for example, are difficult to distribute worldwide because of problems with refrigeration and distribution networks.


"If you can develop certain drugs that can be stored at room temperature, you can take them to more places in the world, which is helpful, so you end up with a better product," Grieken said.


Grieken also believes that if less time and funding were needed to develop cures for diseases or to shift from petrochemical products toward bio-based ones, such products would flood the market.


Drawing on his experience at large tech companies such as Google, Grieken suggests that everyone should work at a large tech company for a while and then build something of their own once they have learned enough.


"I'm very grateful to Google. First, they teach you how to do engineering. Second, I was lucky to work at Google when language models were starting to emerge," Grieken said, considering himself fortunate to be involved in the early stages.


Cradle has raised a total of $29.7 million in funding and has offices in Amsterdam, Netherlands, and Zurich, Switzerland.