Using AI to Optimize CRISPR Gene Editing

25 Jul 2024

By Dr. Mustapha Aouida and Dr. Nady El Hajj

We are currently witnessing the rapid rise of generative AI and its impact on fields ranging from the arts to healthcare. Recent advances allow AI to write novels, develop code, and create images and videos from simple text prompts. The technology is now making an impact in gene editing, where it promises to improve the efficiency and accuracy of CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) gene editing.

Since Jennifer Doudna and Emmanuelle Charpentier discovered the technology in 2012, CRISPR has revolutionized biomedical research and offered new avenues to treat genetic disorders. For this groundbreaking work, Charpentier and Doudna were awarded the Nobel Prize in Chemistry in 2020. Last year, Casgevy became the first CRISPR-Cas9-based gene therapy to be approved for the treatment of sickle cell disease.

CRISPR-Cas9 has been adapted from the antiviral mechanism of bacteria and repurposed for use as a gene editing toolbox. It has been identified as one of the simplest, most versatile and precise tools in genetic engineering to enable both fundamental research and wide-ranging applications in plants, animals, and humans. 

CRISPR-Cas9 is composed of an endonuclease Cas protein that drives DNA cleavage activity and a single guide RNA molecule (sgRNA) that determines the system’s specificity by binding to the targeted gene. The most widely adopted Cas9 nuclease originates from the bacterium Streptococcus pyogenes and is referred to as spCas9. The sgRNA required to activate the CRISPR-Cas9 system is composed of a 17-20 nucleotide CRISPR RNA (crRNA) targeting sequence that is complementary to the target site and an 85-nucleotide transactivating CRISPR RNA (tracrRNA) processing sequence, which provides structural stability to the ribonucleoprotein complex (RNPC). Upon protospacer adjacent motif (PAM) recognition, sgRNA binding, and formation of the RNPC, the two catalytic nuclease domains induce a double-strand break 3 to 4 nucleotides upstream of the PAM site, which activates the DNA repair machinery. This enables editing by removing, adding, or replacing bases where the DNA was cut.
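
To make the targeting rules above concrete, here is a minimal Python sketch that scans a DNA string for candidate spCas9 target sites. It rests on several assumptions that are not stated in this article: the PAM is taken to be the commonly cited NGG motif for spCas9, the targeting sequence is fixed at 20 nucleotides (the upper end of the 17-20 range), only the forward strand is scanned, and the cut is placed 3 nucleotides upstream of the PAM. The function name and the example sequence are hypothetical and serve only as an illustration.

```python
import re


def find_spcas9_sites(dna, protospacer_len=20):
    """Return candidate forward-strand target sites in `dna`.

    Each hit is a tuple (start, protospacer, pam, cut_index), where
    cut_index is the position of the first base after the predicted break.
    Assumption: spCas9 recognizes an NGG PAM immediately 3' of the target.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(dna):
        start = match.start()
        protospacer, pam = match.group(1), match.group(2)
        pam_start = start + protospacer_len
        # Assumption: the break falls 3 nucleotides upstream of the PAM.
        cut_index = pam_start - 3
        sites.append((start, protospacer, pam, cut_index))
    return sites


# Hypothetical example sequence: a 20-nt target followed by a TGG PAM.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for site in find_spcas9_sites(example):
    print(site)
```

A real guide-design tool would also scan the reverse strand and score each candidate for efficiency and specificity, which this sketch does not attempt.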

One of the main challenges of the technology is to specifically and accurately target the DNA sequence to be corrected. Off-target effects might occur when the CRISPR-Cas9 system makes unintended cuts in the genome. This can potentially lead to undesired mutations and might pose a challenge for gene editing, affecting the safety and accuracy of CRISPR-based therapies. 
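
One simple way to picture the off-target problem is to look for places in a sequence that closely resemble the intended target. The Python sketch below slides a guide sequence along a DNA string and reports every window that differs from the guide by at most a chosen number of mismatches. This is an illustrative simplification rather than how off-target prediction tools actually work: it ignores the PAM, the reverse strand, insertions and deletions, and the fact that mismatches at different positions are not equally tolerated. The function names, the threshold, and the toy sequences are all hypothetical.

```python
def count_mismatches(guide, site):
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def near_matches(guide, dna, max_mismatches=3):
    """Return (position, window, mismatches) for each near match.

    Only the forward strand is scanned, and every window of the same
    length as the guide is compared base by base.
    """
    guide, dna = guide.upper(), dna.upper()
    n = len(guide)
    hits = []
    for i in range(len(dna) - n + 1):
        window = dna[i:i + n]
        mismatches = count_mismatches(guide, window)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy example: an exact match at position 0 and a one-mismatch site later on.
guide = "ACGTACGT"
dna = "ACGTACGT" + "TTTT" + "ACGTACCT"
for hit in near_matches(guide, dna, max_mismatches=1):
    print(hit)
```

In this toy case the second hit stands in for a potential off-target site: it is not the intended target, yet it is similar enough that an unintended cut is conceivable.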

Many methods have been developed to improve the efficiency and specificity of CRISPR-Cas9 genome editing. These include using natural or engineered orthologous Cas9s, engineering the Cas9 protein or the guide RNA, modulating the kinetics and regulation of the CRISPR components in the cell, and changing how the CRISPR-Cas9 components are delivered. Delivery options include (i) a DNA plasmid encoding both the Cas9 protein and the guide RNA, (ii) mRNA for Cas9 translation alongside a separate guide RNA, and (iii) Cas9 protein with guide RNA (ribonucleoprotein complex).

A New Frontier in Genetic Editing

In a recent preprint, Profluent, a California-based startup, unveiled its innovative approach (https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1.full.pdf), which uses generative AI to generate CRISPR-Cas proteins across a wide array of families. This innovation has the potential to improve CRISPR gene therapy by enhancing specificity and minimizing off-target effects.

At the core of this technology are large language models (LLMs) trained on large biological datasets including CRISPR operons, CRISPR-associated proteins, tracrRNAs, and PAMs. Using these models, the researchers generated novel gene editors that were reported to show comparable or improved activity and specificity relative to spCas9. These editors were 400 mutations away in sequence from spCas9, which indicates significant innovation in their design.

The AI-generated Cas9-like proteins were experimentally tested in human cells to validate their accuracy and efficiency in genome editing. The results indicate that these proteins could be a viable alternative to spCas9 for use in gene editing technologies.

One of these AI-generated proteins, known as OpenCRISPR-1, was made open source by Profluent for use in a wide variety of applications, both research and commercial. This has the potential to democratize access to cutting-edge gene editing technologies, thus accelerating scientific discovery.

Although it open-sourced OpenCRISPR-1, Profluent did not open-source the AI models used to generate the editor.

This AI-driven approach to designing novel gene editors opens new avenues for increasing the accuracy and precision of current gene editing technologies and could have important implications for advancing gene-based therapies for genetic disorders. The work underscores the potential of AI to enhance the functionality of genome editors, paving the way for future innovations in biotechnology and medicine. A particularly interesting aspect of this technology is that it can develop novel functional systems that have never existed before.

The Promise of this AI-based Gene Editing Technology

Despite the novelty of this approach, we believe it will not have any direct impact on clinical care and the treatment of genetic disorders for the time being. The gene editors need to undergo lengthy clinical trials and regulatory approval before they can be used on patients. The developed editors have been tested in vitro; however, pre-clinical and clinical studies are still needed to assess their accuracy, efficiency, and safety and to determine whether they can perform better than current CRISPR-Cas systems in vivo.

Nevertheless, the approach introduced by Profluent highlights the power of AI to learn from existing biological data what has worked in nature over a billion years of evolution. AI can also keep learning from new data, which promises to improve the technology over time. Overall, we believe that this technology is an important step forward and hope it will accelerate the use of CRISPR-Cas systems to correct the mutations underlying several human genetic disorders.

Dr. Mustapha Aouida is a Research Scientist and Lab Manager at Hamad Bin Khalifa University’s (HBKU) College of Health and Life Sciences, Doha, Qatar.

Dr. Nady El Hajj is an Assistant Professor at HBKU’s College of Health and Life Sciences, Doha, Qatar.

This piece has been submitted by HBKU’s Communications Directorate on behalf of its authors. The thoughts and views expressed are the authors’ own and do not necessarily reflect an official University stance.