Faster and more reliable crystal structure prediction of organic molecules

30-Oct-2025

Prediction of crystal structures of organic molecules is a critical task in many industries, especially in pharmaceuticals and design of functional materials. In pharmaceuticals, crystal structures directly influence a drug’s solubility and stability. In functional materials, like organic semiconductors, controlling crystal structures is crucial for achieving desired electronic properties. However, crystal structure prediction (CSP) is an inherently challenging task due to the weak and diverse intra- and inter-molecular interactions unique to organic crystals. Even minor variations can result in entirely different packing arrangements.

CSP is typically conducted in two stages: structure exploration and structure relaxation. In the first stage, a large number of candidate structures are generated, often at random, using one of the various search algorithms developed for this purpose. During structure relaxation, these structures are refined by energy minimization to identify the most stable configurations. However, random structure generation produces many low-density, unstable structures, while conventional density functional theory (DFT)-based relaxation is computationally expensive and time-consuming.
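To make "low-density" concrete, the short sketch below computes the density of a trial crystal from its unit-cell parameters, using the standard triclinic cell-volume formula and the relation density = Z × M / (N_A × V). It is only an illustration and is not taken from the study; the function names and the example numbers are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
A3_TO_CM3 = 1e-24             # one cubic angstrom in cubic centimetres


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (in cubic angstroms) of a general triclinic cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def crystal_density(a, b, c, alpha, beta, gamma, z, molar_mass):
    """Density in g/cm^3 for z molecules of the given molar mass (g/mol) per cell."""
    volume_cm3 = cell_volume(a, b, c, alpha, beta, gamma) * A3_TO_CM3
    return z * molar_mass / (AVOGADRO * volume_cm3)


# Hypothetical example: the same four molecules in a compact and an oversized cell.
compact = crystal_density(10.0, 10.0, 10.0, 90.0, 90.0, 90.0, z=4, molar_mass=150.0)
oversized = crystal_density(20.0, 10.0, 10.0, 90.0, 90.0, 90.0, z=4, molar_mass=150.0)
print(f"compact: {compact:.3f} g/cm^3, oversized: {oversized:.3f} g/cm^3")
```

Because density falls in proportion to cell volume, randomly drawn lattice parameters that overshoot the true cell size yield loosely packed, low-density candidates.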

To address these challenges, Associate Professor Takuya Taniguchi from the Center for Data Science and Ryo Fukasawa from the Graduate School of Advanced Science and Engineering at Waseda University, Japan, developed a breakthrough machine learning (ML)-based CSP workflow called SPaDe-CSP that leverages space group (SP) and packing density (PD) predictors. “Our workflow employs a unique strategy where machine learning models first predict the most probable space groups and crystal densities, filtering out unstable, low-density candidates before computationally intensive relaxation steps,” explains Taniguchi. “Together with an efficient neural network potential for structure relaxation, this method enables a more direct and reliable path to identifying experimentally observed crystal arrangements.” Their study was published in the journal Digital Discovery on 13 October 2025.

SPaDe-CSP narrows the search space for organic crystals by first predicting probable space group candidates and crystal densities using ML models. For training and testing, the researchers extracted a dataset from the Cambridge Structural Database (CSD) comprising 169,656 entries across 32 space group candidates. Both prediction models used MACCSKeys as the molecular fingerprint and LightGBM as the prediction function. The researchers also interpreted the trained models using Shapley additive explanations (SHAP) analysis to identify the structural characteristics that matter most for effective predictions.

After lattice sampling, the generated unrelaxed structures are relaxed using an efficient neural network potential (NNP) pretrained on DFT data, ultimately producing the energy density diagram of the target molecule. Two hyperparameters control the SPaDe-CSP process: the probability threshold for filtering space groups and the tolerance window for the crystal density.
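The role of the two hyperparameters can be illustrated with a minimal sketch. It assumes that the space group model returns a probability per space group and that the tolerance window is an absolute range around the predicted density. Both are assumptions made here for illustration; the class, function names and numbers are hypothetical and do not reproduce the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical unrelaxed trial structure."""
    space_group: str
    density: float  # g/cm^3


def allowed_space_groups(probabilities, threshold):
    """Space groups whose predicted probability is at least the threshold."""
    return {sg for sg, p in probabilities.items() if p >= threshold}


def within_density_window(density, predicted_density, tolerance):
    """True if the density lies within +/- tolerance of the predicted value."""
    return abs(density - predicted_density) <= tolerance


def filter_candidates(candidates, probabilities, threshold,
                      predicted_density, tolerance):
    """Keep candidates that pass both the space group and the density filter."""
    groups = allowed_space_groups(probabilities, threshold)
    return [
        c for c in candidates
        if c.space_group in groups
        and within_density_window(c.density, predicted_density, tolerance)
    ]


# Hypothetical model outputs and trial structures.
probabilities = {"P2_1/c": 0.45, "P-1": 0.25, "P1": 0.02}
candidates = [
    Candidate("P2_1/c", 1.32),  # passes both filters
    Candidate("P2_1/c", 0.90),  # rejected: density too low
    Candidate("P1", 1.30),      # rejected: improbable space group
    Candidate("P-1", 1.28),     # passes both filters
]
kept = filter_candidates(candidates, probabilities, threshold=0.1,
                         predicted_density=1.30, tolerance=0.1)
print([(c.space_group, c.density) for c in kept])
```

In this sketch, raising the threshold or shrinking the tolerance passes fewer structures on to the costly relaxation step, and risks discarding the correct structure if either prediction is off.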

The researchers tested the workflow first on a model molecule from the CSD dataset to investigate the dependence of success rate on the hyperparameters, and then on 20 different organic molecules, including the model molecule, to test generalizability. The results were successfully validated against the known experimental crystal structures of the molecules, and also compared against the results obtained from conventional random-CSP.

The results revealed that the probability of success increases with a higher space group threshold and a smaller density tolerance window. For 80% of the tested compounds, SPaDe-CSP successfully predicted the experimental crystal structures, achieving twice the success rate of random-CSP. Notably, the researchers also identified a key structural descriptor correlating linearly with success rate, indicating both crystal- and molecule-level structural influences.

“Our strategy can significantly accelerate the design and discovery pipeline for new molecules within the pharmaceutical and materials science industries,” says Taniguchi. “This will enable faster, more reliable identification of the most stable, effective physical form of a new drug, important for maintaining solubility, shelf life, and overall efficacy, and allow computational screening of novel functional materials with optimal electronic properties.”

By making CSP faster and more reliable, this research marks an important step towards accelerating discovery of life-saving medication and next-generation technologies.
