Powered by logic and large language models, DeepGO-SE predicts the biological role of proteins and could be used to aid drug discovery.
A new artificial intelligence (AI) tool that draws logical inferences about the function of unknown proteins promises to help scientists unravel the inner workings of the cell.
Developed by KAUST bioinformatics researcher Maxat Kulmanov and colleagues, the tool outperforms existing analytical methods for forecasting protein functions and is even able to analyze proteins with no clear matches in existing datasets.[1]
The model, termed DeepGO-SE, takes advantage of large language models similar to those used by generative AI tools such as ChatGPT. It then employs logical entailment to draw meaningful conclusions about molecular functions based on general biological principles about the way proteins work.
It essentially empowers computers to logically process outcomes by constructing models of part of the world — in this case, protein function — and inferring the most plausible scenario based on common sense and reasoning about what should happen in these world models.
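To make the idea of logical entailment concrete, here is a minimal sketch in Python. It is not the DeepGO-SE implementation: the function names `ancestors` and `entailed_scores`, the toy hierarchy and the scores are all assumptions made for illustration. It shows just one simple rule of the kind an ontology supplies: if a protein has a specific function, it must also have every more general function that the specific one is a kind of.

```python
from typing import Dict, List, Set

# Toy "is-a" hierarchy (child -> parents). The labels are illustrative only
# and are not taken from the article or from the DeepGO-SE code.
IS_A: Dict[str, List[str]] = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return every more general term reachable from `term`."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def entailed_scores(
    raw_scores: Dict[str, float], is_a: Dict[str, List[str]]
) -> Dict[str, float]:
    """Propagate each score upward: a general function is at least as
    plausible as any of its more specific kinds."""
    scores = dict(raw_scores)
    for term, score in raw_scores.items():
        for anc in ancestors(term, is_a):
            scores[anc] = max(scores.get(anc, 0.0), score)
    return scores


# Hypothetical raw scores, standing in for the output of a neural network.
raw = {"protein kinase activity": 0.9, "catalytic activity": 0.4}
result = entailed_scores(raw, IS_A)
```

In this toy case the confident prediction of the specific function lifts all of its more general ancestors to the same score, so the final set of predictions is consistent with the hierarchy even though the raw scores were not.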
“This method has many applications,” says Robert Hoehndorf, head of the KAUST Bio-Ontology Research Group, who supervised this research, “especially when it is necessary to reason over data and hypotheses generated by a neural network or another machine learning model.”
Kulmanov and Hoehndorf collaborated with KAUST’s Stefan Arold, as well as researchers at the Swiss Institute of Bioinformatics, to assess the model’s ability to decipher the functions of proteins whose roles in the body are unknown.
Using the amino acid sequence of a poorly understood protein, together with its known interactions with other proteins, the tool accurately predicted the protein’s molecular functions. DeepGO-SE proved accurate enough to rank in the top twenty of more than 1,600 algorithms in an international competition of function prediction tools.
The KAUST team is now using the tool to investigate the functions of enigmatic proteins discovered in plants that thrive in the extreme environment of the Saudi Arabian desert. They hope that the findings will be useful for identifying novel proteins for biotechnological applications and would like other researchers to embrace the tool.
As Kulmanov explains: “DeepGO-SE’s ability to analyse uncharacterized proteins can facilitate tasks such as drug discovery, metabolic pathway analysis, disease associations, protein engineering, screening for specific proteins of interest and more.”