Research: Clustering Infinite Molecules
Institution: University of Florida
Department: Chemistry
Principal Investigator: Ramon Alain Miranda Quintana
Period of Affiliation: January 2024 - Present
Research Focus
Many analysis algorithms struggle to keep pace with today’s data volumes, particularly in drug design, where large molecular libraries must be processed. Hardware and software advances have sped up molecular simulation, but post-processing analysis often lags behind, especially with methods like the Taylor-Butina algorithm, whose pairwise similarity comparisons scale poorly as datasets grow. Our goal is to develop a novel clustering algorithm that scales linearly with dataset size, overcoming the time and memory bottlenecks of competing approaches. Early results indicate that our approach can cluster large datasets, such as the ChEMBL library, in under 30 minutes. This makes it practical to cluster large and dynamic datasets, which is crucial for exploring chemical space and optimizing molecular properties.
Project Responsibilities
- Translate the optimized Python code into C++ to further speed up clustering operations
- Become familiar with NumPy and apply a comparable array library in C++
- Technologies used: C++, Python, NumPy, VSCode