
Research: Clustering Infinite Molecules

Institution: University of Florida

Department: Chemistry

Principal Investigator: Ramon Alain Miranda Quintana

Period of Affiliation: January 2024 - Present

Research Focus

In today’s data-rich world, many analysis algorithms struggle to keep up, particularly in drug design, where large molecular libraries must be processed. Although hardware and software advances have improved molecule processing, post-processing analysis of simulations often lags behind, especially with methods such as the Taylor-Butina algorithm, which scales poorly as datasets grow. Our goal is to develop a novel clustering algorithm that scales linearly with dataset size, overcoming the time and memory bottlenecks of competing approaches. Early results indicate that our approach can process large datasets, such as the ChEMBL library, in under 30 minutes. This makes it practical to cluster large and dynamic datasets, which is crucial for exploring chemical space and optimizing molecular properties.
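To illustrate why Taylor-Butina-style clustering becomes a bottleneck, here is a minimal C++ sketch of the general technique. It is not the project's algorithm or code. The `Fingerprint` type, the function names, and the use of Tanimoto similarity on binary fingerprints are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical representation: a fingerprint is a fixed-length vector of 0/1 bits.
using Fingerprint = std::vector<int>;

// Tanimoto similarity: (bits set in both) / (bits set in either).
double tanimoto(const Fingerprint& a, const Fingerprint& b) {
    assert(a.size() == b.size());
    int both = 0, either = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        both += (a[i] && b[i]);
        either += (a[i] || b[i]);
    }
    // Two all-zero fingerprints are treated as identical.
    return either == 0 ? 1.0 : static_cast<double>(both) / either;
}

// Taylor-Butina-style clustering; returns one cluster label per molecule.
std::vector<int> butina_cluster(const std::vector<Fingerprint>& fps,
                                double threshold) {
    const std::size_t n = fps.size();

    // Step 1: build neighbour lists. This needs n * (n - 1) / 2 comparisons,
    // which is the quadratic cost that limits the method on large libraries.
    std::vector<std::vector<std::size_t>> neighbours(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (tanimoto(fps[i], fps[j]) >= threshold) {
                neighbours[i].push_back(j);
                neighbours[j].push_back(i);
            }
        }
    }

    // Step 2: visit molecules from most to fewest neighbours.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return neighbours[a].size() > neighbours[b].size();
                     });

    // Step 3: each unassigned molecule becomes a centroid and claims
    // its still-unassigned neighbours.
    std::vector<int> label(n, -1);
    int next_label = 0;
    for (std::size_t idx : order) {
        if (label[idx] != -1) continue;
        label[idx] = next_label;
        for (std::size_t nb : neighbours[idx]) {
            if (label[nb] == -1) label[nb] = next_label;
        }
        ++next_label;
    }
    return label;
}
```

Calling `butina_cluster(fps, 0.5)` on a handful of fingerprints returns a label vector of the same length. The pairwise loop in step 1 and the neighbour lists it stores are what make time and memory grow quadratically with the number of molecules.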


Project Responsibilities

- Translate the optimized Python code into C++ to further speed up clustering operations

- Become familiar with NumPy and apply a comparable array library in C++

- Technologies used: C++, Python, NumPy, VSCode
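As a rough illustration of what porting NumPy-style code to C++ can involve, the sketch below mirrors the NumPy call `np.sum(m, axis=0)` using only the standard library. The `Matrix` struct and `column_sums` function are hypothetical names invented for this example and are not taken from the project. In practice an array library such as Eigen or xtensor could fill this role instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal 2D array stored row-major in one contiguous buffer,
// loosely mirroring a NumPy ndarray of shape (rows, cols).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::uint8_t> data;  // rows * cols entries, zero-initialised

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}

    std::uint8_t& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
    std::uint8_t at(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Counterpart of np.sum(m, axis=0): one total per column.
std::vector<std::uint64_t> column_sums(const Matrix& m) {
    std::vector<std::uint64_t> sums(m.cols, 0);
    // Iterating row by row walks the buffer contiguously, which is cache-friendly.
    for (std::size_t i = 0; i < m.rows; ++i) {
        for (std::size_t j = 0; j < m.cols; ++j) {
            sums[j] += m.at(i, j);
        }
    }
    return sums;
}
```

For a matrix with one molecule per row and one fingerprint bit per column, `column_sums(m)` counts how many molecules have each bit set, in a single pass over the data.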
