The Molecules of Life

Basic biology textbooks will tell you: all life on Earth is built from four types of molecules. The four molecules of life are proteins, carbohydrates, lipids, and nucleic acids, with each of the four groups vital for every single living organism.

Can We Create the Molecules of Life?

Experiment harnessed the full power of the HiPerGator supercomputer over the 2023 Winter Break.

Until a decade ago, the evolution and interactions of large collections of atoms and molecules could be modeled only with very simple computer simulations, because the computing power needed to handle the large data sets involved in understanding life at its most basic level just wasn’t available.

During UF’s 2023 Winter Break, Jinze Xue, a Ph.D. student in the Roitberg Computational Chemistry Group, conducted a large-scale early Earth chemistry experiment using more than 1,000 A100 GPUs on HiPerGator. The molecular dynamics simulation of 22 million atoms identified 12 amino acids, three nucleobases, one fatty acid, and two dipeptides. The discovery of larger molecules, which would not have been possible on smaller computing systems, is a significant achievement.

Dr. Adrian Roitberg, V.T. and Louise Jackson Professor in UF’s Department of Chemistry, and his research group have been exploring the use of Machine Learning (ML) to study chemical reactions for the past six years.

“Our previous success enabled us to use ML/AI to calculate energies and forces on molecular systems, with results that are identical to those of high-level quantum chemistry--but around one million times faster!

“The accomplishments empowered us to ask the following question: Can one create ‘molecules of life’, such as amino acids and DNA bases, starting from a simple mix of gases and simulating their reactions?”

Roitberg continues:

“These questions have been asked before, but due to computational limitations, previous calculations used small numbers of atoms and could not explore the range of time needed to obtain results.  But with HiPerGator, we can do it!”

Erik Deumens, senior director in UFIT for Research Computing, explains how this full takeover of HiPerGator was possible:

“HiPerGator is heavily used throughout the year for research and education. But it has the unique capability to run very large ‘hero’ calculations that use the entire machine with the potential to lead to breakthroughs in science and scholarship. Such calculations require careful scheduling to not impact the normal business of supporting research and teaching, which is why we do them around quieter times, like Christmas and over the July 4th holidays. When we found out about the work Dr. Roitberg’s group was doing, we approached him to try a ‘hero’ run with the code he developed.”

The emergence of AI and very powerful GPUs makes it possible to carry out highly compute- and data-intensive scientific simulations--calculations scientists could only imagine just a few years ago. AI models that speed up the dynamics of very large collections of atoms and molecules, combined with the power of the University of Florida’s supercomputer (the fastest in U.S. higher education), enabled the experiment to be done.

Roitberg continues:

“Using Machine Learning methods, we created a simulation using the complete HiPerGator set of GPUs. We were able to see, in real time, the formations of almost every amino acid (e.g., alanine, glycine, etc.) and a number of very complex molecules.  This was very exciting to experience.”

This project is part of the effort to discover how complex molecules can form from basic building blocks, and to make the process automatic through large computer simulations. The molecules of life are one example of this general idea.

Roitberg notes that he and his research group spent many hours working with members of UFIT. Ms. Ying Zhang, UFIT’s AI Support Manager, ran point for this experiment.

“Ying put together a team comprised of Research Computing staff and staff from NVIDIA to help scale compute runs, provide invaluable advice and help, and accelerate analysis of the data to the point where the analyses were done in just seven hours, instead of the three days we initially expected it to take.  We met every week from initial conception to the final results, in a very fruitful collaboration.”

The results, and the short time in which HiPerGator was able to deliver them, are inspiring new lines of inquiry from Roitberg and others, who are getting closer every day to answering questions about how complex molecules, such as those seen in life, are formed.

“This is a great opportunity for UF faculty,” says Roitberg.  “Having HiPerGator in-house, with the incredible staff willing to go above and beyond to help researchers produce ground-breaking science like this, is something that makes my non-UF colleagues very jealous.”

HiPerGator, the University of Florida supercomputer, went online in 2013 with 16,000 cores. It has been expanded and upgraded every few years since its initial go-live. HiPerGator’s last big expansion, with the latest technologies, took place in 2020 and was put into production in 2021. HiPerGator is housed in the UF Data Center, a 25,000+ sq. ft. Leadership in Energy and Environmental Design (LEED®) certified building. Learn more about HiPerGator: https://rc.ufl.edu/get-started/hipergator/.

Below is the list of molecules found in the 22-million-atom system:

Amino Acids

Name               Count
Glycine        5,326,504
Alanine          456,201
Serine             2,206
Asparagine           992
Valine               637
Aspartic Acid        452
Threonine            210
Leucine               48
Glutamine             42
Glutamic Acid         37
Isoleucine            30
Lysine                10

Nucleobases

Name               Count
Uracil               664
Cytosine             422
Thymine              379

Fatty Acids

Name               Count
Caprylic Acid        145

Dipeptides

Name               Count
Glycylglycine         59
Alanylglycine          8
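
For readers who want to work with these numbers, the short Python sketch below transcribes the counts listed above and tallies them by category. It is only an illustration: the `MOLECULES` dictionary and the `summarize` helper are names made up for this example and are not part of the Roitberg group's code or analysis pipeline. The tally matches the figures quoted in the article: 12 amino acids, three nucleobases, one fatty acid, and two dipeptides.

```python
# Counts transcribed from the lists above (molecule name -> number found).
# MOLECULES and summarize() are hypothetical names used only for this sketch.
MOLECULES = {
    "Amino Acids": {
        "Glycine": 5_326_504,
        "Alanine": 456_201,
        "Serine": 2_206,
        "Asparagine": 992,
        "Valine": 637,
        "Aspartic Acid": 452,
        "Threonine": 210,
        "Leucine": 48,
        "Glutamine": 42,
        "Glutamic Acid": 37,
        "Isoleucine": 30,
        "Lysine": 10,
    },
    "Nucleobases": {"Uracil": 664, "Cytosine": 422, "Thymine": 379},
    "Fatty Acids": {"Caprylic Acid": 145},
    "Dipeptides": {"Glycylglycine": 59, "Alanylglycine": 8},
}


def summarize(tables):
    """Return {category: (distinct_molecules, total_count)} for each table."""
    return {
        category: (len(counts), sum(counts.values()))
        for category, counts in tables.items()
    }


if __name__ == "__main__":
    for category, (distinct, total) in summarize(MOLECULES).items():
        print(f"{category}: {distinct} distinct molecules, {total:,} found in total")
```

Run as a script, this prints one summary line per category. Glycine and alanine together account for nearly all of the amino acids counted.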