Oracle Corporation

05/02/2024 | Press release | Distributed by Public on 05/02/2024 08:53

Valence Labs uses OCI to help build largest GNN in drug discovery


Valence Labs is a research engine, powered by Recursion, committed to advancing the frontier of AI in drug discovery. Valence has been using Oracle Cloud Infrastructure (OCI) as an AI platform to help develop powerful new foundation models for drug discovery. This blog post explores just one of them, which presents the first work in scaling molecular graph neural networks (GNNs) to the billion-parameter regime, with consistent performance gains on downstream tasks as model scale increases.

In the dynamic realm of digital chemistry, advancements are accelerating at an unprecedented pace, revolutionizing drug discovery, material science, and beyond. One groundbreaking innovation in this domain is the introduction of MolGPS, a foundational GNN tailored specifically for molecular property prediction. Using the power of GNNs and the computational capabilities of a substantial GPU cluster on OCI, MolGPS can use a chemical compound's structure to predict how that compound will be absorbed, distributed, metabolized, and excreted by the human body.

Unveiling MolGPS

MolGPS represents a significant leap in molecular property prediction methodologies. Typically, molecular property prediction models are designed to perform a single task, such as predicting liver toxicity. In contrast, MolGPS has learned to recognize the overall patterns in how molecules behave and interact. As a result, MolGPS has achieved state-of-the-art performance on 12 out of 22 ADMET tasks from the Therapeutics Data Commons.

Built on the principles of GNNs, MolGPS excels at capturing intricate relationships and structural features within molecular graphs, enabling more accurate and efficient property predictions. Its architecture combines a transformer, like the one behind ChatGPT, with message passing to handle the complexity of molecular data. It gives researchers and scientists a potent tool for accelerating drug discovery pipelines, material design processes, and other critical endeavors. The figure accompanying this post shows how larger MolGPS models perform better when they're finetuned on small downstream datasets. It also demonstrates that MolGPS reaches state-of-the-art performance on the Therapeutics Data Commons benchmark tests, as noted by the orange line.
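
For readers unfamiliar with message passing, the idea can be sketched in a few lines of Python. This is a toy illustration only, not the MolGPS implementation: the function name `message_passing_step`, the sum aggregation, and the one-hot atom features are all assumptions made for the example. Each atom gathers the feature vectors of the atoms it is bonded to and combines them with its own.

```python
# Illustrative only: one round of sum-aggregation message passing on a toy
# molecular graph. The names and the update rule are assumptions for this
# sketch and do not describe how MolGPS is implemented.

def message_passing_step(features, edges):
    """Return updated node features after one round of neighbor aggregation.

    features: dict mapping atom index -> list of floats (hypothetical features)
    edges: list of undirected (u, v) pairs representing bonds
    """
    # Build an adjacency list from the undirected bond list.
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        # Message: element-wise sum of the neighbors' feature vectors.
        message = [0.0] * len(own)
        for nb in neighbors[node]:
            message = [m + x for m, x in zip(message, features[nb])]
        # Update: add the aggregated message to the node's own features.
        updated[node] = [o + m for o, m in zip(own, message)]
    return updated


# Toy example: the heavy atoms of ethanol (C-C-O), with one-hot features
# [is_carbon, is_oxygen].
atoms = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]
step1 = message_passing_step(atoms, bonds)
```

After one step, each atom's vector reflects its immediate neighborhood. Repeating the step spreads information further across the molecule. Real models replace the plain sums with learned transformations.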

The role of AI infrastructure

MolGPS wouldn't have been possible without OCI's fast, responsive, and reliable compute offering. Roughly 300,000 jobs were submitted to the GPU cluster, covering scaling experiments along six different axes and finetuning against 34 datasets to validate Valence's scaling hypotheses.

Compute clusters with parallel processing capabilities are tailor-made for accelerating deep learning tasks like those encountered in molecular property prediction. By harnessing this computational power, MolGPS achieves remarkable speedups in model training, expediting the exploration of vast chemical spaces and empowering researchers to tackle complex challenges with agility and precision.

Going beyond hardware with OCI

OCI emerged as a strategic partner in this journey, offering a robust and scalable platform optimized with bare metal compute and industry-leading internode bandwidth for GPU-accelerated workloads. However, Valence's experience has gone beyond the AI platform itself. Principal Software Developer Julien St-Laurent recently commented in the OCI and Valence support channel, "It's been very stable, and the support here has been top-notch…we're very happy." The intersection of people and technology from OCI was fundamental to the success of not just building one model, but also understanding how to build the ideal model.

Conclusion

In the pursuit of unraveling the mysteries of molecular properties, MolGPS emerges as a beacon of innovation, guided by the principles of GNNs and propelled by the computational might of GPUs. With Oracle Cloud Infrastructure as a steadfast partner, Valence Labs' journey towards transformative discovery became not just a possibility, but a tangible reality. As researchers continue to push the boundaries of scientific exploration, the fusion of scaled data generation, cutting-edge technologies, and scalable infrastructure paves the way for a future where molecular insights drive profound advancements across diverse domains.

To learn more, see the following resources: