CSPBench: a benchmark and critical evaluation of Crystal Structure Prediction (2024)

Lai Wei, Sadman Sadeed Omee
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
Rongzhi Dong, Nihang Fu
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
Yuqi Song
Department of Computer Science
University of Southern Maine
Portland, Maine, 04101
Edirisuriya M. D. Siriwardane
Department of Physics
University of Colombo
Sri Lanka
Meiling Xu
School of Physics and Electronic Engineering
Jiangsu Normal University
Xuzhou, China
Chris Wolverton
Department of Materials Science and Engineering
Northwestern University
Chicago, USA
Jianjun Hu*
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
jianjunh@cse.sc.edu

Abstract

Crystal structure prediction (CSP) is increasingly used to discover novel materials with applications in diverse industries. However, despite decades of development and significant progress in this area, the field still lacks well-defined benchmark datasets, quantitative performance metrics, and studies that evaluate its status. We aim to fill this gap by introducing a CSP benchmark suite with 180 test structures along with our recently implemented CSP performance metric set. We benchmark a collection of 13 state-of-the-art (SOTA) CSP algorithms, including template-based CSP algorithms, conventional CSP algorithms based on DFT calculations and global search such as CALYPSO, CSP algorithms based on machine learning (ML) potentials and global search, and distance matrix based CSP algorithms. Our results demonstrate that the performance of current CSP algorithms is far from satisfactory. Most algorithms cannot even identify structures with the correct space groups; the exception is the template-based algorithms when applied to test structures with similar templates. We also find that the ML potential based CSP algorithms now achieve performance competitive with the DFT-based algorithms. The performance of these CSP algorithms is strongly determined by the quality of the neural potentials as well as the global optimization algorithms. Our benchmark suite comes with a comprehensive open-source codebase and 180 well-selected benchmark crystal structures, making it convenient to evaluate the advantages and disadvantages of CSP algorithms from future studies. All the code and benchmark data are available at https://github.com/usccolumbia/cspbenchmark

Keywords crystal structure prediction $\cdot$ materials discovery $\cdot$ benchmark $\cdot$ neural network potential $\cdot$ deep learning

1 Introduction

The critical assessment of protein structure prediction (CASP) and advancements like AlphaFold have significantly propelled research in predicting protein structures [1, 2]. Similarly, crystal structure prediction (CSP) methods have gained attention for organic molecules [3]. The focus on crystal structure prediction within the domain of inorganic materials is also steadily growing and proving to be vital for discovering new materials across diverse industries. Understanding the crystal structure of a material is immensely important because it significantly influences the material's physical, chemical, and mechanical properties. Traditionally, crystal structures have been determined either computationally, using Density Functional Theory (DFT) calculations coupled with global search algorithms, or through tailored experiments. While successful for many materials, these methods are often time-consuming, costly, and particularly challenging when dealing with novel or intricate compounds. Therefore, the applications and quantitative metrics within CSP are becoming increasingly indispensable for advancing research in inorganic materials and guiding their practical utilization.

Nowadays, a plethora of approaches for crystal structure prediction exists, including evolutionary algorithms [4, 5, 6], data mining [7, 8], and machine learning [9, 10]. Current methods of evaluating predicted structures are mainly based on manual structural inspection, comparison with experimentally observed structures, comparison of energy or enthalpy values, success rate analysis, and computation of distances between structures. Nevertheless, the absence of a standardized quantitative approach for evaluating predicted structures remains a challenge, hindering the ability to confidently ascertain their reliability and guide experimental validation. As the field progresses, developing robust quantitative evaluation methods is essential to unlock new frontiers in materials research and development.

In this paper, we conduct a large-scale benchmark study of the main CSP algorithms selected from Table 1: ab initio CSP [11, 12]; GN-OA (a graph network combined with an optimization algorithm) with ML potentials [9]; AGOX (Atomistic Global Optimization X) with the M3GNet potential [13], covering random structure search, basin-hopping [14], parallel tempering, local GPR basin-hopping [15], an evolutionary algorithm, and the Bayesian optimization method GOFEE [16]; template-based CSP [17, 18]; and DL-based CSP algorithms. Ab initio methods calculate the electronic structure and total energy of a crystal system from quantum mechanical principles. CrySPY and XtalOpt are two widely used ab initio CSP codes that employ optimization algorithms to search for stable crystal structures with low energies. In this work, AGOX is paired with the machine-learned potential M3GNet, which is designed to accurately describe the energy landscape of materials, so that the search can explore the vast configuration space of crystal structures and identify stable candidates. The basin-hopping algorithm performs a stochastic exploration of the potential energy landscape, searching for the global minimum by jumping between basins (local energy minima). DL-based crystal structure prediction algorithms utilize neural networks and deep learning techniques to learn representations of atomic structures and predict their energies or stability. GN-OA utilizes machine learning potentials to predict crystal structures: it employs graph networks, which can represent atomic structures efficiently, and atomistic potentials learned from data to optimize the crystal structures. We also compare the performance of the non-DFT-based CSP algorithms with the leading DFT-based CSP algorithms, including CALYPSO [19] and USPEX [4].

We use these algorithms to predict the target structures and then calculate a set of quantitative metrics for CSP benchmarking, which serve as objective measures of the algorithms' accuracy, efficiency, and reliability in predicting crystal structures. Through this evaluation process, we aim to provide a clear picture of the capabilities of each algorithm and how they compare with one another. The benchmarking results shed light on their performance in predicting known crystal structures and identifying novel structures not present in the training dataset. Our quantitative evaluation metrics ensure that the analysis is conducted in a fair and unbiased manner, allowing for meaningful comparisons. Overall, this thorough analysis and evaluation are fundamental to advancing the state-of-the-art in CSP and accelerating the discovery of new materials with tailored properties for diverse applications.

Another category of modern CSP algorithms combines global search with machine learning potential functions for structure search. Early attempts mainly used specialized ML potential functions that covered only one or a few element types: in [20], ML potentials for four systems (Al, C, He, and Xe) were trained for CSP using the USPEX algorithm. A follow-up study [21] used active learning to develop an ML potential and applied it to the CSP of carbon allotropes, sodium structures under pressure, and boron allotropes. The CALYPSO algorithm has also been combined with machine learning potentials for structure prediction of boron (B) clusters [22], 24-atom cubic boron phases [23], and gallium nitride (GaN) phase simulation with 4096 atoms [24]. However, none of these ML potentials is universal enough to cover all or even a majority of the elements in the periodic table.

Recent ML potentials have been developed to cover a large portion of the periodic table. Takamoto et al. [25] developed TeaNet, a 16-layer graph convolution network with a residual network (ResNet) architecture and recurrent GCN weights initialization, for the simulation of metals and amorphous SiO₂ structures. Their universal model initially covered 18 elements and was later extended to 45 elements [26]. Their neural network potential has been shown to speed up the simulation of lithium diffusion in LiFeSO₄F, molecular adsorption in metal-organic frameworks, an order–disorder transition of Cu-Au alloys, and material discovery for a Fischer-Tropsch catalyst. Choudhary et al. developed a graph neural network ML potential and combined it with a genetic algorithm for crystal structure prediction of alloys [27].

2 Method

2.1 Summary of main categories of CSP algorithms

Algorithm	Year	Category	Open-source	URL link	Program Lang
USPEX [4]	2006	De novo (DFT)	No	link	Matlab
CALYPSO [19]	2010	De novo (DFT)	No	link	Python
ParetoCSP [28]	2024	MOGA+MLP	Yes	link	Python
GNOA [9]	2022	BO/PSO + MLP	Yes	link	Python
TCSP [17]	2022	Template	Yes	link	Python
CSPML [29]	2022	Template	Yes	link	Python
GATor [30]	2018	GA + FHI potential	Yes	link	Python
AIRSS [31, 32]	2011	Random + DFT or pair Potential	Yes	link	Fortran
GOFEE [33]	2020	ActiveLearning + Gaussian Pot.	Yes	link	Python
AGOX [13]	2022	Search + Gaussian Potential	Yes	link	Python
GASP [34]	2007	GA + DFT	Yes	link	Java
M3GNet [35]	2022	Relax with MLP	Yes	link	Python
ASLA [36]	2020	NN + RL	No	link	N/A
CrySPY [37]	2023	GA/BO + DFT	Yes	link	Python
XtalOpt [38]	2011	GA + DFT	Yes	link	C++
AlphaCrystal [39, 40]	2023	GA + DL	Yes	link	Python

2.1.1 ab initio CSP

There are several open-source CSP codes that combine search algorithms with DFT energy calculations, including CrySPY [37], XtalOpt [38], GASP [34], and AIRSS [31, 32]. However, the most widely used and well-established software packages for de novo CSP are the GA-based USPEX and the particle swarm optimization (PSO) based CALYPSO [19]. Despite their closed-source code, their binary programs can be easily obtained, and both come with several advanced search techniques such as symmetry handling and crowding niche. Due to the computational costs associated with DFT-based CALYPSO, we selected a subset of 23 structures from the test dataset for prediction. In CALYPSO, structures within the first population are randomly generated while adhering to proper physical constraints, such as the interatomic distances and crystal symmetry. Similar crystal structures are then removed using structure characterization techniques, including bond characterization metrics and a coordination characterization function, to streamline the search. Once all structures of each population are established, local optimizations are performed using DFT-based methods to locate the local minima. Structural evolution is then carried out using swarm intelligence algorithms such as particle swarm optimization or artificial bee colony, so that new structures are generated based on the information gathered from the previous generation. By combining random structure generation, local optimizations, and swarm intelligence algorithms, the CALYPSO method efficiently explores the potential energy surface (PES), increasing the chances of locating the global energy minimum. In summary, the general idea is to iteratively generate and optimize structures to navigate the complex energy landscape.

Due to the demanding computational costs of DFT calculations, we allocated 3,000 DFT energy calculations in all experimental runs for the different benchmark test samples. The structural relaxations are performed using the Vienna Ab initio Simulation Package (VASP) [41] with the Perdew-Burke-Ernzerhof generalized gradient approximation [42] for the exchange-correlation functional and the projector-augmented-wave potentials [43] for the electron-ion interactions. VASP allows for geometry optimization using different optimization algorithms, such as the conjugate gradient (CG) method and the quasi-Newton RMM-DIIS algorithm. The VASP running parameters mainly involve the plane-wave cutoff energy (the maximum kinetic energy for the electronic wavefunctions), the Monkhorst–Pack k meshes (sampling in the Brillouin zone), and the energy and force convergence precisions. By optimizing the structure gradually, tightening these parameters in stages, the optimization process can be accelerated while still obtaining reliable and accurate results. This approach can help save overall time in structure prediction by efficiently exploring the configuration space and converging toward the optimal structure. The relevant parameter settings for the DFT calculations of CALYPSO during the optimization process are listed in detail in Table S7 of the supplementary file.

2.1.2 GNOA with ML-potentials

We adopt the GN-OA algorithm [9], a machine-learning method for crystal structure prediction. In this framework, a graph network (GN) is employed to establish a correlation model between crystal structures and their formation enthalpies, and an optimization algorithm (OA) is utilized to accelerate the search for the crystal structure with the lowest formation enthalpy. In this work, we evaluate the CSP algorithms based on ML potentials. Two graph neural network potentials have been tested here, MEGNet [44] and M3GNet [35], which have been combined with random search (RAS), Bayesian optimization (BO), and particle swarm optimization (PSO) for crystal structure prediction as implemented in the GNOA package [45].

2.1.3 AGOX with M3GNet potential

We adopt Atomistic Global Optimization X (AGOX) [13], a customizable and efficient global structural optimization framework that has six global search algorithms implemented. By default, AGOX uses the effective medium theory (EMT) potential [46] to optimize and relax the generated candidate structures. For better comparison with other algorithms, we replace the simple EMT potential with the more powerful M3GNet [35] interatomic potential. M3GNet is based on graph neural networks, explicitly incorporates many-body interactions, and is much faster than DFT-based energy calculations [47, 48]. After the structure search with each algorithm is completed, the final optimized structure is further relaxed using M3GNet. An overview of the AGOX framework is shown in Figure 1. In our work, we use three different search algorithms: basin-hopping (BH), parallel tempering (PT), and random structure search (RSS). The global search algorithms are described below:

Figure 1: Overview of the AGOX framework.

Random structure search:

Random structure search (RSS) is the simplest algorithm in AGOX. In each iteration, it generates a candidate at random and optimizes it locally. The relaxed candidate is then stored in the database.

Basin-hopping:

Basin-hopping (BH) [14] explores the configuration space by performing a series of jumps from one potential energy surface (PES) minimum to another, which effectively turns the potential energy surface into a network of interpenetrating staircases. A sampler keeps track of a previously evaluated candidate and provides the information needed to create a new candidate. The process begins by rattling the prior candidate to produce a new candidate. Next, the generated candidate is relaxed locally. The Metropolis criterion is then evaluated to decide whether the new candidate is accepted as the starting point for the next iteration. The acceptance probability of the Metropolis criterion is defined by the following equation:

$$A=\min\{1,\exp[\beta(E_{k-1}-E_{k})]\}\qquad(1)$$

where $\beta=1/k_{B}T$, $k_{B}$ is the Boltzmann constant, $T$ is the temperature, and $E_{k}$ is the energy of the structure found in iteration $k$.
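As an illustration of Equation (1), the following minimal Python sketch evaluates the Metropolis acceptance probability and draws the accept/reject decision. The function names, the use of eV and kelvin as units, and the example energies are assumptions made for this sketch; they are not taken from the AGOX implementation.

```python
import math
import random

# Boltzmann constant in eV/K (units are an assumption of this sketch).
K_B_EV_PER_K = 8.617333262e-5


def metropolis_acceptance(e_prev, e_new, temperature):
    """Return A = min{1, exp[beta * (E_{k-1} - E_k)]} from Equation (1)."""
    beta = 1.0 / (K_B_EV_PER_K * temperature)
    delta = e_prev - e_new
    if delta >= 0:
        # Downhill (or equal-energy) moves are always accepted.
        return 1.0
    return math.exp(beta * delta)


def accept_candidate(e_prev, e_new, temperature, rng=random):
    """Accept the relaxed candidate with probability A."""
    return rng.random() < metropolis_acceptance(e_prev, e_new, temperature)


if __name__ == "__main__":
    # Hypothetical energies (eV): an uphill move of 0.1 eV at 1000 K.
    print(metropolis_acceptance(0.0, 0.1, 1000.0))
```

Lower-energy candidates are always accepted, while uphill moves are accepted with a probability that shrinks as the energy penalty grows or the temperature falls.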

Parallel tempering

In the parallel tempering (PT) method, as described by Kofke et al. [15], basin-hopping searches are conducted simultaneously at different temperatures. This approach promotes exploration at elevated temperatures and exploitation at lower ones, and structures are swapped between temperatures to prevent stagnation. In this setup, multiple workers running on different processors each perform a basin-hopping search at a specific temperature while sharing a single database. Every $N_{t}$ episodes, a structure swap between workers with adjacent temperatures is attempted with the following probability:

$$A=\min\{1,\exp[(\beta_{i}-\beta_{j})(E_{i}-E_{j})]\}\qquad(2)$$

where $\beta_{i}=1/k_{B}T_{i}$, $k_{B}$ is the Boltzmann constant, and $E_{i}$ is the energy of the current structure held by the worker at temperature $T_{i}$.
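The swap rule in Equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the eV/kelvin units, and the example temperatures and energies are hypothetical and do not come from the AGOX code.

```python
import math

# Boltzmann constant in eV/K (units are an assumption of this sketch).
K_B_EV_PER_K = 8.617333262e-5


def swap_probability(e_i, t_i, e_j, t_j):
    """Return A = min{1, exp[(beta_i - beta_j) * (E_i - E_j)]} from Equation (2)."""
    beta_i = 1.0 / (K_B_EV_PER_K * t_i)
    beta_j = 1.0 / (K_B_EV_PER_K * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0:
        return 1.0
    return math.exp(exponent)


if __name__ == "__main__":
    # Cold worker (300 K) holds the higher-energy structure: swap is certain.
    print(swap_probability(e_i=-1.0, t_i=300.0, e_j=-1.2, t_j=600.0))
    # Cold worker already holds the lower-energy structure: swap is unlikely.
    print(swap_probability(e_i=-1.2, t_i=300.0, e_j=-1.0, t_j=600.0))
```

With this rule, a low-energy structure found by a hot worker always migrates to the colder worker, which is what lets the cold replicas exploit what the hot replicas discover.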

Local GPR basin-hopping

The basin-hopping search is enhanced by the use of a local Gaussian process regression (GPR) model[49] implemented inside the AGOX framework. The Local GPR model uses a radial basis function (RBF) kernel[50] and the smooth overlap of atomic positions (SOAP)[51] descriptor to perform the basin-hopping search.

Evolutionary algorithm

Biological evolution theories serve as the foundation for evolutionary algorithms (EAs). The first step is to generate a population of potential solutions, each of which is evaluated using a fitness function to determine how effective it is. Over time, the population evolves and finds better solutions. In each iteration of the EA, a population of candidates is maintained and used as input to generate a new candidate, which is then relaxed. The sampler keeps a population of structurally different candidates so that they may serve as the parents of future candidates. The algorithm presented in [52] is used to select the population.

GOFEE: Bayesian Optimization

GOFEE (global optimization with first-principles energy expressions) is a Bayesian search algorithm developed by Bisbo and Hammer [16] as an effective technique for locating low-energy structures in computationally expensive energy landscapes. GOFEE combines an evolutionary search strategy with an actively learned surrogate model of the energy landscape. This allows far more structures to be queried than the target potential alone would permit, because only a much smaller number of evaluations are performed with the target potential, on the structures that the surrogate model deems most promising. These evaluations serve as training data to further refine the surrogate model. In GOFEE, a set of candidate structures is first locally optimized using a computationally inexpensive surrogate potential. Subsequently, a lower-confidence-bound acquisition function selects candidates for evaluation with the true potential. Each episode of GOFEE generates $N$ candidates, which are locally optimized in the GPR potential, or more precisely in the so-called lower-confidence-bound expression, defined by the following equation:

$$E(x)=\hat{E}(x)-k\sigma(x)\qquad(3)$$

where $\hat{E}$ and $\sigma$ are the predicted energy and uncertainty of the GPR model for the structure represented by $x$, and $k$ is a parameter that sets the weight given to the uncertainty.
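To make the role of the uncertainty term in Equation (3) concrete, the sketch below ranks a handful of candidates by their lower confidence bound. The candidate names, the predicted energies and uncertainties, and the default value of `kappa` (standing in for $k$) are hypothetical values chosen for illustration; no GPR model is fitted here.

```python
def lower_confidence_bound(pred_energy, sigma, kappa=2.0):
    """Equation (3): E(x) = E_hat(x) - k * sigma(x); kappa plays the role of k."""
    return pred_energy - kappa * sigma


def select_candidate(candidates, kappa=2.0):
    """Pick the candidate with the lowest lower-confidence-bound value.

    `candidates` is a list of (name, predicted_energy, uncertainty) tuples,
    where the last two entries would come from a surrogate model.
    """
    return min(candidates, key=lambda c: lower_confidence_bound(c[1], c[2], kappa))


if __name__ == "__main__":
    # Hypothetical surrogate predictions (eV) for three relaxed candidates.
    pool = [("a", -1.00, 0.00), ("b", -0.80, 0.20), ("c", -0.50, 0.05)]
    print(select_candidate(pool, kappa=2.0)[0])  # favours the uncertain candidate
    print(select_candidate(pool, kappa=0.0)[0])  # pure exploitation
```

A larger `kappa` favours candidates whose energies the surrogate is unsure about, whereas `kappa = 0` reduces the selection to picking the lowest predicted energy.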

2.1.4 ParetoCSP

ParetoCSP [28] is based on the idea of the GN-OA algorithm [9], with two major upgrades: a multi-objective GA search algorithm and the use of the M3GNet potential for energy calculation. Previous research on GN-OA has shown that incorporating symmetry constraints expedites CSP. Similar to the GN-OA approach, our method also considers crystal structure prediction with symmetry constraints. We incorporate two additional structural features, namely the crystal symmetry $S$ and the occupancy of Wyckoff position $W_{i}$ for each atom $i$. These features are selected from a collection of $229$ space groups and the associated $1506$ Wyckoff positions. The method begins by selecting a symmetry $S$ from the range of $P2$ to $P230$, followed by generating lattice parameters $L$ within the chosen symmetry. Next, a combination of Wyckoff positions $\{W_{i}\}$ is selected to fulfill the specified number of atoms in the cell. The atomic coordinates $\{R_{i}\}$ are then determined based on the chosen Wyckoff positions $\{W_{i}\}$ and lattice parameters $L$. To generate crystal structures, we need to tune the $S$, $\{W_{i}\}$, $L$, and $\{R_{i}\}$ variables.

By selecting different combinations of $S$, $\{W_{i}\}$, $L$, and $\{R_{i}\}$, one can generate a comprehensive array of possible crystal structures for the given chemical composition $\{c_{i}\}$. In theory, computing the energy of these structures and selecting the one with the lowest energy should yield the optimal crystal arrangement. However, exhaustively enumerating all these structures is practically infeasible due to the staggering number of potential combinations. A more practical approach is to iteratively sample candidate structures from the design space, under the assumption that one of the sampled structures will emerge as the most stable and optimal solution. Consequently, an optimization strategy is adopted to guide this search process towards the structure with the lowest energy. In particular, a genetic algorithm, NSGA-III [53], improved by incorporating age-fitness Pareto optimization (AFPO) [54] to enhance its performance and robustness, is utilized.

It starts by generating $n$ random crystals and assigning them an age of 1, where $n$ denotes the population size. One complete generation then goes through the following steps: calculating the energy and fitness of the structures, selecting parents, performing genetic operations, and updating the age. After a certain threshold of $G$ generations, the lowest-energy structure from the multidimensional Pareto front is chosen and further relaxed and symmetrized to obtain the final optimal structure. The genetic encoding is shown in the lower right corner of the flowchart. It contains the lattice parameters $a$, $b$, $c$, $\alpha$, $\beta$, and $\gamma$, the space group $S$, the Wyckoff position combination $W_{i}$, and the atomic coordinates $R_{i}$ of the atom indexed by $i$.
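The genetic encoding and the final selection step can be sketched as follows. This is a simplified illustration and not the ParetoCSP implementation: the `Genotype` class and its field names are hypothetical, the two objectives are assumed to be energy and age (both minimized), and a plain non-dominated filter is used in place of the NSGA-III machinery.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Genotype:
    """Hypothetical encoding of one candidate crystal."""
    lattice: Tuple[float, float, float, float, float, float]  # a, b, c, alpha, beta, gamma
    space_group: int                                           # S
    wyckoff: List[str]                                         # W_i per atom
    coords: List[Tuple[float, float, float]]                   # R_i per atom
    energy: float = float("inf")                               # filled in by the potential
    age: int = 1                                               # new crystals start at age 1


def dominates(p: Genotype, q: Genotype) -> bool:
    """p dominates q if it is no worse in both objectives and better in at least one."""
    no_worse = p.energy <= q.energy and p.age <= q.age
    better = p.energy < q.energy or p.age < q.age
    return no_worse and better


def pareto_front(population: List[Genotype]) -> List[Genotype]:
    """Return the non-dominated individuals (assumed objectives: energy, age)."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


def best_structure(population: List[Genotype]) -> Genotype:
    """Pick the lowest-energy member of the Pareto front."""
    return min(pareto_front(population), key=lambda g: g.energy)


if __name__ == "__main__":
    cubic = (4.0, 4.0, 4.0, 90.0, 90.0, 90.0)
    pop = [
        Genotype(cubic, 221, ["1a", "1b"], [(0, 0, 0), (0.5, 0.5, 0.5)], energy=-3.0, age=5),
        Genotype(cubic, 225, ["4a", "4b"], [(0, 0, 0), (0.5, 0.5, 0.5)], energy=-2.0, age=1),
        Genotype(cubic, 216, ["4a", "4c"], [(0, 0, 0), (0.25, 0.25, 0.25)], energy=-1.0, age=3),
    ]
    print(best_structure(pop).space_group)
```

Keeping age as a second objective protects young, not yet well-optimized individuals from being eliminated immediately by older low-energy ones, which is the motivation usually given for age-fitness schemes.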

2.1.5 template-based CSP

CSPML [29] is a machine learning-based crystal structure prediction algorithm that uses metric learning [55] to automate the selection of template structures, drawn from a database of stable structures, that have high chemical replaceability with the probable structure for a given chemical composition. For a given formula, CSPML first restricts the candidates to structures with the same compositional ratio and then uses XenonPy [56] to calculate the compositional descriptors of the query formula and the templates; only the top five ranked templates are considered as candidate structures. For the 38 query compositions selected from the Materials Project database [57], 35 of them have candidates with probabilities greater than 0.5, and for 18 of them the template most similar to the true structure is ranked in the top five.

TCSP [17] is a template-based crystal structure prediction algorithm. For a given formula, TCSP first narrows down the candidates to structures with the same prototype and then uses the Element Mover's Distance (ElMD) [58] to measure the compositional similarity between the query formula and the compositions of all possible template structures. We integrate BERTOS [59], which achieves over 96.82% accuracy for all-element oxidation state prediction on the Inorganic Crystal Structure Database (ICSD), into TCSP to improve the accuracy of oxidation state prediction during the template element search. Templates with identical oxidation states are then added to the final template list. If no such templates are found, the top five candidate structures are taken as the final templates. We apply the M3GNet potential to optimize the generated structures in this work.

2.1.6 DL-based CSP

AlphaCrystal-II [40] is a deep learning based crystal structure prediction algorithm based on the prediction of atomic pairwise distances and distance matrix based coordinate reconstruction [60]. This data-driven CSP algorithm exploits the implicit chemical and geometric rules embedded in existing crystal structures as deposited in material databases such as Materials Project or ICSD: for example, most cations are surrounded by anions. A deep neural network is trained to predict the distance matrix given only the composition, which is then used as the objective target for a gradient free optimization (Nevergrad[61])-based crystal structure reconstruction algorithm to search for the atomic coordinates of the structures. The resulting candidate structures are then fed to the M3GNet-based fast structure relaxer to fine-tune the structures.
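The idea of reconstructing coordinates from a target distance matrix with a gradient-free optimizer can be sketched as below. This is a toy illustration under strong assumptions: a simple random hill climber is swapped in for Nevergrad, periodic boundary conditions and lattice parameters are ignored, and the target matrix is computed from known points rather than predicted by a neural network. All function names are hypothetical.

```python
import math
import random


def pairwise_distances(coords):
    """Full Euclidean distance matrix for a list of 3D points (no periodicity)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def mismatch(coords, target):
    """Sum of squared differences between candidate and target distances."""
    d = pairwise_distances(coords)
    n = len(coords)
    return sum((d[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def reconstruct(target, steps=2000, step_size=0.2, seed=0):
    """Gradient-free hill climbing: perturb one atom, keep the move if it helps."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(n)]
    best = mismatch(coords, target)
    for _ in range(steps):
        i = rng.randrange(n)
        trial = [row[:] for row in coords]
        trial[i] = [x + rng.gauss(0.0, step_size) for x in trial[i]]
        loss = mismatch(trial, target)
        if loss < best:
            coords, best = trial, loss
    return coords, best


if __name__ == "__main__":
    # Stand-in for a predicted distance matrix: distances of three known points.
    target = pairwise_distances([(0, 0, 0), (1, 0, 0), (0, 1, 0)])
    _, start_loss = reconstruct(target, steps=0)
    _, final_loss = reconstruct(target, steps=2000)
    print(start_loss, final_loss)
```

Because a distance matrix is invariant to rotation, translation, and reflection, the recovered coordinates match the original points only up to such a rigid transformation.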

2.2 CSPBenchmark test set design

To construct a balanced and effective benchmark dataset for crystal structure prediction, we considered several key factors contributing to the complexity of this challenge. These factors include the total number of atoms and distinct elements within the compositions, the degree of symmetry as indicated by space groups, the prototype characterized by specific atomic ratios, and the shape and dimensions of the unit cell. Additionally, we accounted for the prevalence of similar compositions within established crystal structure databases to ensure comprehensive representation. We selected a total of 180 crystal structures, named CSP180, from the Materials Project database [57]. These structures are evenly distributed among binary, ternary, and quaternary compounds, ensuring a diverse and representative sample. The selected structures exhibit a wide variety of space groups, with the most prevalent being space group 225, which appears 27 times. Other common space groups include 139, 216, 221, and 194. Regarding the crystal system distribution, most structures belong to the cubic system, followed by the tetragonal and hexagonal systems. There are fewer occurrences of the orthorhombic, trigonal, and monoclinic systems, and a single instance of the triclinic system. Our selection process aimed to include structures with varying levels of prediction difficulty. Table 2 presents detailed information on the 36 selected binary test crystals, categorized into three difficulty levels: binary_easy, binary_medium, and binary_hard, with each category containing 12 structures. The criteria for difficulty classification include factors such as space group classification, template-based categorization, and the prototype ratios defining the crystal structures. The 180 crystal structures were chosen to cover a broad range of complexities and to provide a comprehensive benchmark for testing. For example, binary compounds like DyCu and GaCo, which belong to space group 221 and exhibit cubic symmetry, were categorized as binary_easy due to their simpler and more predictable structures. In contrast, more complex structures, such as those with trigonal symmetry or multiple elements with varying oxidation states, were placed in higher difficulty categories. This careful selection ensures that the dataset includes not only easily predictable structures but also those that present significant challenges, thereby testing the robustness and accuracy of prediction algorithms. Additional test structures and their corresponding data are provided in the supplementary file for further reference.

Material id	Pretty formula	Space group	Crystal system	Category
mp-2334	DyCu	221	Cubic	binary_easy
mp-2226	DyPd	221	Cubic	binary_easy
mp-1121	GaCo	221	Cubic	binary_easy
mp-2735	PaO	225	Cubic	binary_easy
mp-1169	ScCu	221	Cubic	binary_easy
mp-30746	YIr	221	Cubic	binary_easy
mp-24658	SmH₂	225	Cubic	binary_easy
mp-20225	CePb₃	221	Cubic	binary_easy
mp-788	Co₂Te₂	194	Hexagonal	binary_easy
mp-20176	DyPb₃	221	Cubic	binary_easy
mp-1231	Cr₆Ga₂	223	Cubic	binary_easy
mp-12570	ThB₁₂	225	Cubic	binary_easy
mp-20132	InHg	166	Trigonal	binary_medium
mp-2209	CeGa₂	191	Hexagonal	binary_medium
mp-30497	TbCd₂	191	Hexagonal	binary_medium
mp-30725	YHg₂	191	Hexagonal	binary_medium
mp-2731	TiGa₃	139	Tetragonal	binary_medium
mp-2510	ZrHg	123	Tetragonal	binary_medium
mp-2740	ErCo₅	191	Hexagonal	binary_medium
mp-570875	Ga₄Os₂	70	Orthorhombic	binary_medium
mp-861	Hf₄Ni₂	140	Tetragonal	binary_medium
mp-1566	SmFe₅	191	Hexagonal	binary_medium
mp-2387	Th₄Zn₂	140	Tetragonal	binary_medium
mp-1607	YbCu₅	191	Hexagonal	binary_medium
mp-13452	BePd₂	139	Tetragonal	binary_hard
mp-11359	Ga₂Cu	123	Tetragonal	binary_hard
mp-1995	PrC₂	139	Tetragonal	binary_hard
mp-30501	Ti₂Cd	139	Tetragonal	binary_hard
mp-30789	U₂Mo	139	Tetragonal	binary_hard
mp-454	NaGa₄	139	Tetragonal	binary_hard
mp-1827	SrGa₄	139	Tetragonal	binary_hard
mp-2129	Nd₂Ge₄	141	Tetragonal	binary_hard
mp-30682	ZrGa	141	Tetragonal	binary_hard
mp-2128	Sn₈Pd₂	68	Orthorhombic	binary_hard
mp-1208467	Tb₈Al₂	227	Cubic	binary_hard
mp-640079	Mn₉Au₃	123	Tetragonal	binary_hard

2.3 Evaluation procedure and running parameters for different algorithms

We substituted DFT calculations with the M3GNet potential, a graph neural network-based surrogate potential model [35], to relax both the ground truth structure and the predicted structure. Subsequently, we utilized the same model to determine the energy distance between these relaxed structures. The running parameters and configuration for all CSP algorithms are shown in Table S6 in the supplementary file.

2.4 Evaluation metrics

Evaluation metrics are essential in materials science research as they quantitatively assess the performance and effectiveness of different methods. Numerous evaluation metrics exist in molecular research, such as those provided by RDKit [62] and MOSES [63]. However, in the field of materials informatics, there is no unified standard for evaluating new structures. Recently, we introduced a set of distance metrics for CSP performance comparisons in benchmark studies [64], including the M3GNet energy distance, minimal RMSE distance, minimal MAE distance, RMS distance, RMS anonymous distance, Sinkhorn distance, Chamfer distance, Hausdorff distance, superpose RMSD distance, edit graph distance, and fingerprint distance, to standardize the training and comparison of material structure generation models. For test structures in the polymorph category, we employ a more detailed evaluation approach. We compare the predicted structure with multiple ground truth structures, each representing a different polymorph. As each sample corresponds to multiple ground truth polymorphs, this results in several sets of evaluation metrics for each sample. To identify the most accurate prediction, we select the evaluation metrics associated with the ground truth structure that has the minimum M3GNet energy distance. This ensures that the selected metrics reflect the closest match to the predicted structure, providing a reliable measure of prediction accuracy. The distance metrics are listed below. Table 3 shows selected distance scores for various test samples generated by the AGOX-pt algorithm.

• Wyckoff position fraction coordinate RMSE distance
• Wyckoff Minimal MAE distance
• M3GNet Energy distance
• Pymatgen RMS distance
• Sinkhorn distance
• Chamfer distance
• Hausdorff distance
• Superpose RMS distance
• CrystalNN Fingerprint distance
• Edit Graph distance
• XRD distance
• OFM distance

In addition to the above quantitative distance metrics, we also used Pymatgen’s StructureMatcher to calculate the success rate of crystal structure prediction by identifying whether similar structures exist in the MP database, with the following default parameters: ltol=0.2, stol=0.3, angle_tol=5, as used in [65]. It is important to note that, unlike previous research, we found that StructureMatcher can incorrectly declare two structures identical when they are similar but have different space groups (see Discussion section). Therefore, we interpret the success rate together with the space group match rate in our results. These additional evaluations provide a more comprehensive assessment of the structures generated by the different algorithms.

Ranking scores of algorithms:

To evaluate how different performance metrics reflect the actual closeness of the predicted structures to the ground truth structure, we employed the quantitative distance metrics of CSP [64] to assess the quality of all structures generated by the algorithms. We adopted a ranking scheme to evaluate candidate CSP algorithms based on the quality of their predicted structures relative to the ground truth structure. For each test structure, all algorithms are first ranked by the quality of their predicted structures, i.e., their distances to the ground truth structure. Ranking scores on a 0-100 scale are then assigned using a standardized scoring method to ensure fairness in ranking. For example, if five algorithms are compared, five evenly spaced scores from 100 to 0 are assigned to the algorithms sorted from best to worst. Specifically, the algorithm in first place (the smallest distance) receives a score of 100, the second-placed algorithm receives 75, the third 50, the fourth 25, and the fifth 0. When multiple algorithms produce structures with identical quality/distances, they are assigned the same rank, and the scores of the positions they occupy are averaged. For instance, if the first- and second-placed algorithms tie, each receives the average of 100 and 75 [(100 + 75)/2]. Similarly, if all five algorithms have the same performance, each receives the average of the five scores [(100 + 75 + 50 + 25 + 0)/5]. Figure 5 shows the ranking scores based on the overall average distances for each algorithm.
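The ranking scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the benchmark's actual implementation: the function name `ranking_scores` and the input layout (a mapping from algorithm name to distance for one test structure) are assumptions, and the sketch does not cover how algorithms that fail to produce a structure are scored.

```python
from typing import Dict, List


def ranking_scores(distances: Dict[str, float]) -> Dict[str, float]:
    """Assign 0-100 ranking scores for one test structure.

    `distances` maps a (hypothetical) algorithm name to its distance from
    the ground truth; a smaller distance is better. Tied algorithms share
    the average of the scores of the positions they occupy.
    """
    n = len(distances)
    if n == 0:
        return {}
    if n == 1:
        return {name: 100.0 for name in distances}

    # Sort algorithm names from best (smallest distance) to worst.
    ordered: List[str] = sorted(distances, key=lambda name: distances[name])
    # Evenly spaced scores from 100 down to 0, one per position.
    slot_scores = [100.0 * (n - 1 - i) / (n - 1) for i in range(n)]

    scores: Dict[str, float] = {}
    i = 0
    while i < n:
        # Extend j to cover the whole group tied with position i.
        j = i
        while j + 1 < n and distances[ordered[j + 1]] == distances[ordered[i]]:
            j += 1
        shared = sum(slot_scores[i:j + 1]) / (j - i + 1)
        for name in ordered[i:j + 1]:
            scores[name] = shared
        i = j + 1
    return scores
```

With five algorithms and distinct distances this reproduces the 100, 75, 50, 25, 0 assignment from the text; a tie for first place gives both algorithms 87.5, and a five-way tie gives every algorithm 50.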

Table 3: Selected distance scores for test samples generated by the AGOX-pt algorithm.

| Primitive formula | M3GNet energy distance | Wyckoff RMSE | Sinkhorn distance | Chamfer distance | Superpose RMSD | Fingerprint distance | XRD distance |
|---|---|---|---|---|---|---|---|
| YHg₂ | 1.02 | 0.27 | 38.60 | 23.01 | 13.20 | 1.87 | 1.44 |
| ScCu | 2.69 | 0.43 | 22.04 | 19.80 | 10.94 | 2.35 | 2.85 |
| K₄Na₂Ga₂P₄ | 0.22 | 0.31 | 92.57 | 11.88 | 1.98 | 1.75 | 1.13 |
| Re₂O₆ | 0.21 | 0.31 | 71.15 | 13.76 | 1.00 | 2.89 | 1.73 |
| CrFeCoSi | 2.25 | 0.32 | 59.11 | 23.89 | 14.55 | N/A | 2.92 |
| Ba₂YRuO₆ | 1.07 | 0.31 | 154.98 | 23.12 | 16.07 | 2.54 | 1.95 |
| Li₂NiO₂ | 0.91 | 0.31 | 51.95 | 18.29 | 10.82 | N/A | 1.42 |
| PrC₂ | 1.76 | 0.35 | 23.67 | 14.92 | 8.06 | 1.42 | 1.43 |
| Ge₁₂Rh₃ | 0.03 | 0.35 | 207.02 | 24.41 | 15.53 | 1.84 | 1.61 |
| MgV₄SnO₁₂ | 0.39 | 0.30 | 181.78 | 15.46 | 1.28 | 1.93 | 1.35 |
| DyPd | 2.46 | 0.35 | 16.97 | 15.52 | 8.65 | 2.33 | 2.37 |
| CeCr₂Si₂C | 1.29 | 0.32 | 48.70 | 12.08 | 1.44 | 1.69 | 1.39 |
| Nb₂P₂Se₂ | 0.64 | 0.26 | 38.29 | 9.25 | 1.49 | 1.80 | 1.34 |
| BePd₂ | 2.23 | 0.29 | 24.23 | 15.05 | 8.42 | 2.47 | 1.61 |
| SrGa₄ | 0.56 | 0.27 | 44.16 | 16.13 | 9.16 | N/A | 1.38 |
| KLi₆IrO₆ | 0.55 | 0.30 | 162.23 | 15.92 | 1.40 | 2.64 | 1.23 |
| Hf₄Mn₈ | 1.66 | 0.33 | 185.07 | 23.21 | 15.24 | 2.32 | 1.58 |
| Fe₂Cu₆SnS₈ | 0.28 | 0.34 | 81.88 | 7.01 | 1.82 | 2.20 | 2.06 |
| KAs₄IO₆ | 0.27 | 0.35 | 145.19 | 20.16 | 13.39 | 2.17 | 1.33 |
| Ti₂Cd | 2.50 | 0.39 | 26.25 | 14.15 | 8.65 | 2.36 | 1.71 |

3 Results

3.1 Performance comparison of CSP algorithms over all test structures

We evaluated the performance of 13 CSP algorithms in predicting the structures of 180 test samples: TCSP, CSPML, ParetoCSP, AlphaCrystal-II, GNOA-M3GNet-RAS, GNOA-M3GNet-PSO, GNOA-M3GNet-BO, GNOA-MEGNet-RAS, GNOA-MEGNet-PSO, GNOA-MEGNet-BO, AGOX-rss, AGOX-pt, and AGOX-bh. The number of test samples for which each algorithm produced a structure varied considerably. TCSP generated structures for all 180 samples, while each of the AGOX algorithms (AGOX-rss, AGOX-pt, and AGOX-bh) generated structures for 175 of the 180 samples. ParetoCSP and CSPML generated 173 and 158 structures, respectively. AlphaCrystal-II was limited to 121 structures because it cannot handle structures with more than 12 atoms. The GNOA algorithms showed weaknesses in handling complex binary, ternary, and quaternary structures, producing structures for only 30 to 39 of the 180 samples.

To analyze and compare the performance of all CSP algorithms across the 180 test structures, we first calculated the StructureMatcher success rate using StructureMatcher from Pymatgen with the following default parameters: ltol=0.2, stol=0.3, angle_tol=5, to determine whether similar materials already exist in the Materials Project database. As shown in Figure 2, the two template-based CSP algorithms, TCSP and CSPML, stand out for their high performance in generating structures with similar space groups to the ground truth structures. CSPML and TCSP achieve the best performance, with success rates of 46.111% and 42.778%, respectively. In contrast, AlphaCrystal-II and ParetoCSP show significantly lower performance, with StructureMatcher success rates of 13.333% and 11.111%, respectively. The GNOA algorithms generally underperform, with success rates ranging from 0.556% to 4.444%. Among the three M3GNet-based GNOA algorithms, GNOA-M3GNet-PSO (particle swarm optimization) is better than the BO (Bayesian optimization) and RAS (random search) variants, reflecting the importance of the search capability of a CSP algorithm. The three MEGNet-based GNOA algorithms all perform poorly here because of the lower accuracy of the MEGNet energy potential. We also evaluated three AGOX algorithms using different optimization strategies: basin hopping (BH), parallel tempering (PT), and random structure search (RSS). The AGOX algorithms failed to predict any structures matching those in the MP database.

Symmetry prediction performance also plays an important role. We computed the space group match rate for each algorithm, which indicates whether the predicted structure has the same space group number as the ground truth structure. TCSP achieves the best performance with 57.778%, followed by CSPML with a space group match rate of 45.556%. The proficiency of template-based CSP algorithms in predicting crystal structures with identical symmetries may stem from two factors. The first is their ability to recognize highly similar structure templates using oxidation state and composition-based fingerprint matching. The second is the widespread existence of similar crystal structures with identical space groups and crystal systems, which makes it easier to find a template and use simple elemental substitution to determine the structure. This structural distribution pattern was exploited by the DeepMind team to help discover more than 380,000 new hypothetical stable materials in their 2023 Nature report [66]. However, both template-based algorithms, CSPML and TCSP, fail to find structures with the correct symmetry for at least 98 materials (54%) and 76 materials (42%), respectively, reflecting the pressing need for de novo CSP algorithms.

Next, we found that the space group match rate and the StructureMatcher success rate are consistent across the remaining algorithms. Of the 11 de novo CSP algorithms, ParetoCSP and AlphaCrystal-II outperform all others, with space group match rates of 12.222% and 12.778%, respectively. These are 214.27% and 228.57% better than GNOA-M3GNet-PSO, the best of the remaining 9 de novo algorithms. ParetoCSP's success can be attributed to its strong global search capability based on the age-fitness multi-objective genetic algorithm together with its use of the M3GNet deep learning potential model [35], while the contact-map based deep learning algorithm AlphaCrystal-II exploits inter-atomic interaction patterns found in known crystal structures. The GNOA algorithms exhibit low space group match rates, ranging from 0.556% to 3.889% (the GNOA-M3GNet-PSO value implied by the relative improvements above). All AGOX algorithms have a space group match rate of 0.556%, indicating their inability to find structures with the same space group number as the ground truth structures.

Overall, we find that current de novo algorithms based on machine learning potentials achieve only moderate crystal structure prediction performance in terms of success rate and space group prediction accuracy, indicating significant room for further development in this research area. More details on how many of the structures predicted by each algorithm have the same symmetry as the ground truth structures, in terms of space group and crystal system, are given in Table S1 of the supplementary file.

To conduct a comprehensive analysis and comparison of each algorithm’s performance, we further evaluated the algorithms using a set of quantitative metrics proposed in our work [64]. First, we used the formation energy distances of the predicted structures compared to the ground truth as the performance metric for algorithm comparison, a method widely used in previous CSP work [4, 67, 22]. Using formation energy as a performance metric has a unique value as it can serve as a critical indicator of a crystal’s stability in nature. To efficiently analyze and validate the performance of structures generated by various algorithms, we computed the ranking score for each algorithm based on the M3GNet formation energy distance of each structure.

As shown in Figure 3, TCSP achieves the best performance in terms of the average M3GNet energy distance, with a ranking score of 81.99. Close behind, ParetoCSP and CSPML achieve ranking scores of 79.87 and 78.91, respectively. The high ranking scores for TCSP and CSPML based on the M3GNet energy distance are consistent with the symmetry prediction performance of these two template-based CSP algorithms. ParetoCSP’s ranking score of 79.87 is slightly below that of TCSP yet considerably higher than the scores of all other de novo CSP algorithms. The three AGOX algorithms achieve ranking scores of 62.22, 64.15 and 63.59, respectively. AlphaCrystal-II achieves a ranking score of 58.46, slightly lower than those of the AGOX algorithms. Although the AGOX algorithms have the lowest StructureMatcher success rates and space group match rates, their higher M3GNet energy distance ranking scores compared to AlphaCrystal-II can be attributed to the larger number of structures they successfully generated. In turn, AlphaCrystal-II generates more predicted structures than any of the GNOA algorithms, and its ranking score surpasses that of GNOA-MEGNet-PSO by 830.89% and that of GNOA-M3GNet-PSO by at least 297.69%. The significantly lower ranking scores of all GNOA algorithms are primarily due to their limited ability to generate structures, as reflected by the small portion of successfully generated structures, and their weakness in predicting structures similar to the ground truth.

We further used the average Chamfer distance as a metric. It is calculated as the mean of the squared distances between each atomic site in one structure and its nearest neighbor in the other, and serves as a robust measure of structural congruence. Because it captures the spatial correlation between atomic sites, the Chamfer distance is sensitive to the details of crystallographic arrangements and allows a fine-grained evaluation of model performance across a diverse range of structures. The ranking scores based on the average Chamfer distances are shown in Figure 4.

Among the algorithms evaluated, TCSP is the top performer with the highest ranking score of 89.04, followed by CSPML with a score of 77.05. These template-based CSP algorithms excel in generating structures that are congruent with the ground truth structures. ParetoCSP, with a ranking score of 77.99, also performs well thanks to its age-fitness Pareto genetic algorithm. The AGOX family of algorithms (AGOX-bh, AGOX-pt, and AGOX-rss) shows relatively consistent performance, with scores ranging from 56.58 to 58.18. AlphaCrystal-II achieves a comparable ranking score of 53.89.

ParetoCSP, the AGOX family, and AlphaCrystal-II significantly outperform the GNOA family of algorithms. AGOX-rss outperforms GNOA-M3GNet-BO by a substantial 293.90%. While the GNOA algorithms perform better on simpler binary structures, their predictive accuracy diminishes for more complex structures. This is evident in their lower average ranking scores on the Chamfer distance metric, ranging from 10.90 to 15.45, highlighting the challenge of accurately predicting complex crystal structures.

Common limitations of current methods include their dependency on template-based approaches and difficulties in predicting structures with complex symmetries and compositions. These constraints call for the development of more advanced de novo CSP algorithms. Overall, the ranking scores based on the average Chamfer distances provide a meaningful evaluation of the algorithms’ ability to predict crystal structures that closely match the ground truth, with template-based approaches generally outperforming other methods on the given set of structures.

Finally, to assess the performance of each algorithm thoroughly, we computed overall ranking scores based on all 12 distance metrics. As shown in Figure 5, CSPML achieves the highest overall score of 71.27, followed by TCSP at 66.83. CSPML outperforms TCSP due to its use of both chemical composition descriptors and crystal structure descriptors. The AGOX family achieved scores of 55.25, 54.84, and 54.59, respectively, while ParetoCSP had a ranking score of 57.40. AlphaCrystal-II received a lower score of 48.34, and the GNOA family ranged from 8.27 to 12.41. These results underscore the varied capabilities of the algorithms, affirming the importance of integrating diverse descriptors to enhance predictive accuracy and highlighting the challenges and potential areas for improvement in crystal structure prediction algorithms.
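The Chamfer distance described earlier in this section (the mean squared distance from each atomic site to its nearest neighbor in the other structure) can be illustrated with a short sketch. This is not the benchmark's implementation: the function names are hypothetical, the two directed terms are summed (one common convention, assumed here), and sites are treated as plain Cartesian points without periodic boundary conditions or species matching.

```python
from typing import Sequence

Point = Sequence[float]


def _squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _directed_chamfer(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean squared distance from each site in src to its nearest site in dst."""
    return sum(min(_squared_distance(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: sum of both directed terms (assumed convention)."""
    return _directed_chamfer(a, b) + _directed_chamfer(b, a)
```

Identical site sets give a distance of zero, and the value grows as sites in either structure lose a close counterpart in the other.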

3.2 Performance comparison over binary structures

To further analyze the performance of the algorithms, we focus on their predictions for 60 binary structures, using the M3GNet energy distance and the Hausdorff distance as key metrics. The Hausdorff distance is a structure similarity metric that represents the maximum deviation between two structures and has the advantage of being invariant to rigid transformations such as translations, rotations, and reflections. The average ranking scores derived from both metrics for these binary test structures are shown in Figure 6. ParetoCSP achieved the highest ranking score for the M3GNet energy distance (83.66), together with a score of 81.41 for the Hausdorff distance. TCSP also performed well, with a ranking score of 81.03 for the M3GNet energy distance and the highest score for the Hausdorff distance (82.69). CSPML, while slightly lower than TCSP and ParetoCSP, still showed robust performance with ranking scores of 74.81 and 72.18 for the M3GNet energy distance and the Hausdorff distance, respectively. AlphaCrystal-II and the AGOX algorithms (AGOX-bh, AGOX-pt, and AGOX-rss) demonstrated relatively good performance, with ranking scores ranging from 54.94 to 60.00 for the M3GNet energy distance and from 51.67 to 55.51 for the Hausdorff distance. The GNOA algorithms with the M3GNet potential (GNOA-M3GNet-RAS, GNOA-M3GNet-PSO, and GNOA-M3GNet-BO) generally achieved higher ranking scores on both metrics than the GNOA algorithms with the MEGNet potential. However, the GNOA algorithms still struggled with the more complex structures within the binary category, especially those that do not have a simple 1:1 ratio of atoms. Their ranking scores for the Hausdorff distance ranged from 16.41 to 25.28, highlighting the need for further improvements to handle more intricate crystal structures. Overall, despite the relative simplicity of binary structures, some algorithms still had difficulty predicting their configurations accurately, which underscores the need for more robust predictive models.
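As a minimal sketch of the Hausdorff distance used above, the following Python computes the largest nearest-neighbor deviation between two sets of atomic positions. The function names are hypothetical, and the sketch assumes the two structures are already expressed as aligned Cartesian coordinates; it ignores periodic images and atomic species, which a full implementation would need to handle.

```python
import math
from typing import Sequence

Point = Sequence[float]


def directed_hausdorff(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Largest distance from any site in src to its nearest site in dst."""
    return max(min(math.dist(p, q) for q in dst) for p in src)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

Because the result is determined by the single worst-matched site, one badly misplaced atom is enough to produce a large Hausdorff distance even when all other sites agree.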

3.3 Performance comparison over ternary structures

Given the relative simplicity of the binary structures in our dataset, we extend our analysis to ternary and quaternary crystal structures to evaluate the strengths and weaknesses of the CSP algorithms across different structural complexities. We compared the algorithm performances for ternary structures using ranking scores based on the M3GNet energy distance and the Hausdorff distance. As shown in Figure 7, CSPML and ParetoCSP achieve the highest ranking scores for the M3GNet energy distance, 87.76 and 82.76, respectively. Meanwhile, TCSP attains the highest ranking score for the Hausdorff distance, 89.17, with a score of 80.45 on the M3GNet energy distance. As with the binary structures, the AGOX algorithms score lower, ranging from 60.77 to 61.60 on the M3GNet energy distance and from 57.44 to 58.27 on the Hausdorff distance. AlphaCrystal-II achieves ranking scores of 62.05 on the M3GNet energy distance and 56.41 on the Hausdorff distance. The GNOA algorithms, however, face significant limitations in predicting more complex structures, resulting in low ranking scores on both distance metrics, ranging from 3.59 to 14.42. Comparing the ranking scores in Figure 7 to those for the binary test structures (Figure 6), we find that the template-based algorithms and ParetoCSP maintain similar ranking scores. In contrast, the scores of the AGOX algorithms and AlphaCrystal-II increase. Because ranking scores are relative, these gains come at the expense of the GNOA family of algorithms, whose scores drop. This performance gap indicates the need for further development of the GNOA algorithms to improve their predictive accuracy and reliability on complex crystal structures.

3.4 Performance comparison over quaternary structures

We evaluated the ranking scores for quaternary structure predictions across all CSP algorithms using the M3GNet energy distance and the Hausdorff distance as metrics. Figure 8 shows the ranking scores for each algorithm. Using the Hausdorff distance as the metric, TCSP achieved the highest ranking score of 91.73, followed by CSPML and ParetoCSP with scores of 71.86 and 67.44, respectively. Scores for the AGOX family are competitive, at 66.03, 65.51, and 65.38, respectively. ParetoCSP, on the other hand, finds quaternary structures more challenging, as reflected by its lower score compared with its ranking scores for binary (Figure 6) and ternary structures (Figure 7). The six GNOA algorithms record scores ranging from 4.10 to 12.44, indicating significant difficulties in predicting quaternary compounds. When evaluated on the M3GNet energy distance, TCSP again leads with the highest ranking score of 84.49. CSPML, ParetoCSP, and the AGOX algorithms also demonstrate competitive performance, with scores from 67.24 to 74.17. All seven other algorithms outperformed the GNOA algorithms, reflecting the inherent challenges in predicting quaternary compounds.

This analysis highlights the strengths and limitations of different CSP algorithms across varying structural complexities and emphasizes the need for continued improvement to handle more complex crystal structures effectively. Additional performance comparisons using the Sinkhorn distance, superpose RMSD, Wyckoff RMSE, XRD distance, and OFM distance across binary, ternary, and quaternary test structures are provided in Figures S2, S3, and S4 of the supplementary file.

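Table 4: M3GNet energy distance (ED, eV/atom) and Hausdorff distance (HD, Å) of the structures predicted by CALYPSO, CSPML, ParetoCSP, and AGOX-pt.

| Primitive formula | mp-id | CALYPSO ED | CALYPSO HD | CSPML ED | CSPML HD | ParetoCSP ED | ParetoCSP HD | AGOX-pt ED | AGOX-pt HD |
|---|---|---|---|---|---|---|---|---|---|
| Ca₃SnO | mp-29241 | 0.002 | 2.413 | 0.001 | 0.021 | 0.001 | 0.023 | 1.099 | 9.715 |
| Co₂Ni₂Sn₂ | mp-20237 | 0.061 | 5.489 | 0.000 | 2.557 | 0.002 | 0.056 | 1.210 | 15.062 |
| Co₂Te₂ | mp-788 | 0.028 | 6.520 | 0.220 | 2.475 | 0.050 | 4.573 | 0.879 | 20.100 |
| Cr₆Ga₂ | mp-1231 | 2.016 | 7.001 | 0.096 | 5.710 | 0.015 | 1.622 | 1.494 | 6.864 |
| Hf₄Mn₈ | mp-11449 | 0.002 | 6.383 | 0.129 | 8.715 | 0.266 | 6.457 | 1.660 | 15.644 |
| Hf₄Ni₂ | mp-861 | 0.014 | 4.064 | 1.274 | 11.162 | 0.039 | 7.752 | 1.823 | 11.395 |
| HfCo₂Sn | mp-20730 | 0.054 | 3.928 | 0.002 | 0.046 | 0.038 | 9.083 | 2.175 | 16.670 |
| InHg | mp-20132 | 0.012 | 10.296 | 0.015 | 7.968 | 0.069 | 7.379 | 0.191 | 12.479 |
| Li₂CuSn | mp-30591 | 0.004 | 3.933 | 0.111 | 0.129 | 0.012 | 8.105 | 0.818 | 16.551 |
| LiMg₂Ga | mp-30648 | 0.031 | 7.062 | 0.000 | 2.892 | 0.032 | 9.908 | 0.773 | 19.834 |
| MgCu₄Sn | mp-3676 | 0.006 | 3.194 | 0.167 | 5.256 | 0.085 | 3.881 | 0.942 | 16.160 |
| MgInCu₄ | mp-30587 | 0.070 | 4.861 | 0.010 | 1.704 | 0.079 | 5.029 | 0.986 | 20.072 |
| NaGa₄ | mp-454 | 0.021 | 2.473 | 0.388 | 5.206 | 0.009 | 1.236 | 0.297 | 9.522 |
| ScCu | mp-1169 | 0.004 | 1.701 | 0.108 | 3.681 | 0.000 | 0.006 | 2.694 | 11.775 |
| SrGa₄ | mp-1827 | 0.003 | 2.685 | 0.722 | 6.777 | 0.009 | 2.229 | 0.565 | 10.163 |
| SrGaCu₂ | mp-30580 | 0.000 | 8.402 | 0.196 | 4.749 | 0.075 | 4.853 | 1.026 | 15.771 |
| Ti₂Cd | mp-30501 | 0.041 | 3.755 | 0.061 | 1.064 | 0.010 | 5.197 | 2.497 | 8.648 |
| TiGa₃ | mp-2731 | 0.023 | 2.348 | 0.006 | 8.246 | 0.002 | 0.221 | 1.296 | 11.730 |
| Y₃Al₉ | mp-2451, mp-11231 | 0.001 | 0.011 | 0.002 | 3.022 | 0.001 | 3.723 | 0.893 | 21.138 |
| YHg₂ | mp-30725 | 0.001 | 1.747 | 0.006 | 0.044 | 0.008 | 1.741 | 1.025 | 13.130 |
| Zn₂C₂O₆ | mp-9812 | 0.054 | 10.398 | 0.008 | 3.995 | 0.888 | 11.537 | 0.679 | 13.052 |
| ZnCdPt₂ | mp-30493 | 0.008 | 0.134 | 0.086 | 8.328 | 0.010 | 2.038 | 1.212 | 10.743 |
| ZrHg | mp-2510 | 0.010 | 4.172 | 0.004 | 0.463 | 0.016 | 4.428 | 1.719 | 7.787 |
| # of the best | | | | | | | | | |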

3.5 Performance comparison of non-DFT based CSP algorithms against DFT-based CALYPSO

Because the DFT-based CALYPSO algorithm requires extremely demanding computational resources, it is not feasible to evaluate it over all 180 test samples. Most test samples are also too complex for CALYPSO to predict their structures accurately. We therefore selected a subset of 13 binary structures and 10 ternary structures for evaluating the DFT-based algorithm and compared its performance with that of non-DFT based CSP algorithms. The test set includes NaGa₄, Ti₂Cd, Y₃Al₉, ZrHg, YHg₂, TiGa₃, SrGa₄, ScCu, InHg, Hf₄Mn₈, Hf₄Ni₂, Cr₆Ga₂, Co₂Te₂, Zn₂C₂O₆, Ca₃SnO, ZnCdPt₂, SrGaCu₂, MgInCu₄, MgCu₄Sn, LiMg₂Ga, Li₂CuSn, HfCo₂Sn, and Co₂Ni₂Sn₂. We chose the M3GNet energy distance (ED) and the Hausdorff distance (HD) as the evaluation metrics to compare the performances of CALYPSO, CSPML, ParetoCSP, and AGOX-pt. Notably, Hf₄Mn₈ and Y₃Al₉ belong to the polymorph category; for Y₃Al₉, there are two ground truth structures (mp-2451 and mp-11231).

The comparison results are shown in Table 4. CALYPSO achieves the lowest M3GNet energy distance for 12 out of 23 test samples, with values ranging from 0.000 to 0.070 eV/atom. This includes 8 binary structures and 4 ternary structures. It also records the smallest Hausdorff distance for 5 out of 23 test samples. This demonstrates the strength of a de novo CSP algorithm with DFT energy calculation on this small-scale test set and its ability to find lower-energy structures. However, non-DFT based algorithms such as CSPML and ParetoCSP are competitive as well, each achieving the lowest energy distance for 7 test samples. CSPML achieves the lowest HD for 11 out of 23 samples, reflecting the effectiveness of the template-based CSP algorithm in identifying ground truth structures by finding similar template structures. Although CALYPSO performs better on ED by using DFT calculations to find structures with lower energies, CSPML achieves high performance in both ED and HD for many ternary structures, such as Ca₃SnO, Co₂Ni₂Sn₂, HfCo₂Sn, LiMg₂Ga, MgInCu₄, and Zn₂C₂O₆.

Of the remaining two de novo CSP algorithms, ParetoCSP significantly outperformed AGOX-pt, achieving the best ED and the best HD for 7 out of 23 test samples each. In contrast, AGOX-pt did not achieve the best ED or HD for any test sample. All HD scores of AGOX-pt are large, ranging from 6.864 Å to 21.138 Å, indicating its weakness in predicting structures with geometry similar to the ground truth.

The CSP performance comparison results need to be interpreted holistically, because good performance on a single metric can be misleading. As shown in Figure 9, we calculated three types of success rates for CSP prediction: the StructureMatcher success rate, the space group match rate, and the consensus match rate (which requires both the StructureMatcher test and the space group to match between the predicted structure and the ground truth). CALYPSO achieves the best StructureMatcher success rate of 43.478%. However, its space group match rate is lower than those of CSPML and ParetoCSP, leading to a relatively low consensus success rate of 17.391%. In contrast, the consensus success rates of CSPML and ParetoCSP are 26.087% and 21.739%, respectively, which can be attributed to their higher space group match rates. Overall, the template-based CSP algorithm CSPML shows the best performance, while the de novo CSP algorithm ParetoCSP achieves competitive performance in terms of the consensus success rate. The poor performance of AGOX-pt shows its inability to accurately predict the structures for the given compositions. It should be noted that space group determination depends on the parameter settings adopted, which may change the space group match rates in Figure 9. Here the default parameters of Pymatgen's space group analyzer are used. We also note that predicted structures with incorrect space groups may be fine-tuned into ones with correct space groups using DFT-based relaxation procedures.
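The relation between the three rates can be made concrete with a small sketch. It assumes that, for each test sample, two boolean outcomes are already available (whether StructureMatcher reports a match and whether the space group numbers agree); the function name and the dictionary keys are hypothetical and not taken from the benchmark code.

```python
from typing import Dict, Sequence


def success_rates(matcher_hits: Sequence[bool],
                  spacegroup_hits: Sequence[bool]) -> Dict[str, float]:
    """Return the three success rates (in percent) over a set of test samples.

    matcher_hits[i] is True if StructureMatcher reports a match for sample i;
    spacegroup_hits[i] is True if the predicted and ground truth space group
    numbers agree. The consensus rate requires both to hold for a sample.
    """
    if len(matcher_hits) != len(spacegroup_hits):
        raise ValueError("both outcome lists must cover the same samples")
    n = len(matcher_hits)
    if n == 0:
        raise ValueError("at least one test sample is required")
    consensus = [m and s for m, s in zip(matcher_hits, spacegroup_hits)]
    return {
        "structure_matcher": 100.0 * sum(matcher_hits) / n,
        "space_group": 100.0 * sum(spacegroup_hits) / n,
        "consensus": 100.0 * sum(consensus) / n,
    }
```

By construction the consensus rate can never exceed either of the other two rates, which is why a high StructureMatcher success rate paired with a lower space group match rate yields a low consensus rate. On a 23-sample set, for instance, 4 consensus matches correspond to 17.391%.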

3.6 Case studies

Our benchmark results have demonstrated the limited prediction capability of current computational CSP algorithms, including both template-based and de novo algorithms. In particular, most de novo CSP algorithms cannot accurately predict the space groups for the majority of test samples. To further understand the success and failure cases of different CSP algorithms and how the performance metric scores correlate with the predicted structures, we present three case studies: ErCo₅, Ca₃SnO, and KAs₄IO₆. Two more case studies, ZnCdPt₂ and Ga₂Cu, are provided in Figures S5 and S6 and Tables S8 and S9 of the supplementary file for further reference.

First, we compared the structures predicted by seven algorithms for ErCo₅, as shown in Figure 10 and Table 5. Of the seven algorithms, two achieved successful predictions: ParetoCSP (Figure 10(b)) and TCSP (Figure 10(c)). Their predicted structures closely match the ground truth structure of ErCo₅ obtained from the Materials Project database, as reflected by the small distance scores in Table 5 (first two rows). The structures predicted by AlphaCrystal-II (Figure 10(d)) and GNOA-M3GNet-PSO (Figure 10(f)) are close to the target structure and have a formation energy distance of 0.000 eV/atom. However, their Sinkhorn distance, Chamfer distance, and superpose RMSD are much higher than those of the structures predicted by ParetoCSP and TCSP, highlighting the importance of interpreting structure similarity using a comprehensive set of criteria. On the other hand, the structures predicted by CSPML (Figure 10(e)), GNOA-MEGNet-RAS (Figure 10(g)), and AGOX-rss (Figure 10(h)) have high energy distance scores as well as high values for the other distance metrics, showing that a large energy distance can be used as an indicator of low-quality predictions. Notably, AGOX-rss displays much higher geometric distance scores (Sinkhorn distance, Chamfer distance, and superpose RMSD) than the other algorithms, indicating its poor performance in predicting the structure of ErCo₅. This case, along with the benchmark studies, underscores the importance of using a set of quantitative criteria to comprehensively evaluate the performance of CSP algorithms.

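Table 5: Distance scores of the ErCo₅ structures predicted by seven algorithms.

| Algorithm | M3GNet energy distance | Sinkhorn distance | Chamfer distance | Superpose RMSD | Fingerprint distance | XRD distance | OFM distance |
|---|---|---|---|---|---|---|---|
| ParetoCSP | 0.000 | 4.713 | 0.814 | 0.838 | 0.011 | 0.163 | 0.007 |
| TCSP | 0.000 | 4.744 | 0.813 | 0.838 | 0.007 | 0.170 | 0.006 |
| AlphaCrystal-II | 0.000 | 9.036 | 3.012 | 1.178 | 0.016 | 0.240 | 0.011 |
| CSPML | 0.097 | 24.634 | 5.270 | 1.791 | 1.492 | 2.060 | 0.350 |
| GNOA-M3GNet-PSO | 0.000 | 9.037 | 3.012 | 1.178 | 0.008 | 0.138 | 0.006 |
| GNOA-MEGNet-RAS | 2.215 | 15.440 | 4.189 | 1.694 | 2.148 | 1.686 | 1.831 |
| AGOX-rss | 1.880 | 85.596 | 26.181 | 14.652 | 1.950 | 1.650 | 2.282 |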

Next, we chose a ternary structure, Ca₃SnO, to compare the DFT-based CSP algorithm CALYPSO with two template-based CSP algorithms, CSPML and TCSP. Figure 11 shows that the structure of Ca₃SnO predicted by CSPML is more similar to the ground truth than those predicted by CALYPSO and TCSP. The formation energy distances are small for CSPML, CALYPSO, and TCSP. However, CSPML has much lower geometric distances, including a Sinkhorn distance of 0.071 Å, a Chamfer distance of 0.029 Å, and a superpose RMSD of 0.010 Å. Additionally, its fingerprint distance of 0.000 is much better than those of the structures predicted by CALYPSO and TCSP, which are 0.116 and 1.172, respectively, indicating the consistency of the different types of CSP metrics used in this study. This consistency also applies to the OFM distance. However, the XRD distance of the CSPML prediction (0.992) is worse than that of the CALYPSO prediction (0.943), indicating that the XRD distance alone is not a reliable metric for evaluating CSP predictions.

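Table 6: Distance scores of the Ca₃SnO structures predicted by CSPML, CALYPSO, and TCSP.

| Algorithm | M3GNet energy distance | Sinkhorn distance | Chamfer distance | Superpose RMSD | Fingerprint distance | XRD distance | OFM distance |
|---|---|---|---|---|---|---|---|
| CSPML | 0.001 | 0.071 | 0.029 | 0.010 | 0.000 | 0.992 | 0.007 |
| CALYPSO | 0.002 | 7.946 | 2.899 | 1.614 | 0.116 | 0.943 | 0.025 |
| TCSP | 0.007 | 80.852 | 5.119 | 1.125 | 1.172 | 1.052 | 0.176 |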

To further understand the advantages of different algorithms, we examined the case of quaternary structure prediction for KAs₄IO₆Cu, in which the template-based methods work well while the de novo methods fail (Figure 12). Both CSPML and TCSP produced reasonable structures with nearly identical scores: they differ only in the XRD distance, where TCSP is marginally lower (1.039 compared to 1.040) (see Table 7). In contrast, the structure predicted by ParetoCSP is significantly worse across all performance metrics. Its much higher Sinkhorn distance of 207.028 Å, Chamfer distance of 23.239 Å, and superpose RMSD of 20.014 Å indicate poor geometric similarity between the structure predicted by ParetoCSP and the ground truth structure (Figure 12(d)).

[Figure 12: Structures of KAs₄IO₆Cu predicted by TCSP, CSPML, and ParetoCSP.]

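Table 7: Distance metrics for the KAs₄IO₆Cu structures predicted by TCSP, CSPML, and ParetoCSP.

| Algorithm | M3GNet energy distance (eV/atom) | Sinkhorn distance (Å) | Chamfer distance (Å) | Superpose RMSD (Å) | Fingerprint distance | XRD distance | OFM distance |
|---|---|---|---|---|---|---|---|
| TCSP | 0.000 | 1.403 | 0.234 | 0.037 | 0.143 | 1.039 | 0.025 |
| CSPML | 0.000 | 1.403 | 0.234 | 0.037 | 0.143 | 1.040 | 0.025 |
| ParetoCSP | 0.705 | 207.028 | 23.239 | 20.014 | 2.254 | 2.529 | 0.521 |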

4 Discussion

Objective and accurate evaluation and comparison of different CSP algorithms are nontrivial due to the complexity of structure comparison, the inherent symmetry of crystal structures, and the possible polymorphism of a given test structure. Here we discuss several aspects that need special attention when interpreting the evaluation results, as well as issues that may arise during CSP algorithm performance evaluation.

Polymorphism test samples:

In our benchmark set, several test structures have alternative structures with the same composition due to structural polymorphism. For these test structures, the structure predicted by a CSP algorithm is compared to each of the polymorphic ground truth structures, and the one with the smallest distance is used to calculate the distance error. Note that in this benchmark study we only consider the top-1 prediction performance.
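The polymorph handling described above amounts to taking a minimum over the candidate ground truths. The following is a minimal sketch under stated assumptions: the function name `min_polymorph_distance` is hypothetical, and `distance` stands for any one of the benchmark metrics supplied by the caller; it is not taken from the CSPBench code.

```python
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")  # any structure representation accepted by `distance`


def min_polymorph_distance(
    predicted: S,
    polymorphs: Sequence[S],
    distance: Callable[[S, S], float],
) -> Tuple[int, float]:
    """Return (index, distance) of the ground truth polymorph closest to the prediction.

    `distance` is a caller-supplied metric (hypothetical placeholder for,
    e.g., a superpose RMSD or fingerprint distance function).
    """
    if not polymorphs:
        raise ValueError("at least one ground truth structure is required")
    # Score the single (top-1) prediction against every polymorph.
    scored = [(i, distance(predicted, gt)) for i, gt in enumerate(polymorphs)]
    # Keep the polymorph with the smallest distance.
    return min(scored, key=lambda pair: pair[1])
```

With a toy one-dimensional "structure" and `distance=lambda a, b: abs(a - b)`, the call `min_polymorph_distance(1.0, [4.0, 1.5, -3.0], ...)` selects the second polymorph.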

Ranking score bias:

The rankings of different CSP algorithms sorted by our ranking scores, as shown in Figure 3, need to be interpreted with caution. For example, the AGOX series of algorithms rank higher than the GNOA series in Figure 3 to Figure 8, while in Figure 2 the AGOX algorithms have a zero success rate according to the StructureMatcher criterion and the GNOA algorithms successfully predict several test structures. This discrepancy arises because our ranking score penalizes algorithms that fail to produce a valid structure. In our case, the GNOA algorithms cannot find any valid structure for many of the test structures, which leads to their low ranks.

Cautious interpretation of StructureMatcher results:

Several studies have used Pymatgen’s StructureMatcher to check whether two structures are identical [65, 68]. In the CDVAE experiments [68], two structures are deemed identical if StructureMatcher returns true with the parameters ltol=0.3, stol=0.5, angle_tol=10. In a recent study [65], a more stringent parameter setting of ltol=0.2, stol=0.3, angle_tol=5 is used. However, we find many cases in which StructureMatcher reports two structures to be identical while their space group numbers are not even equal (see Supplementary Table S10 and S11 for examples). It is thus critical to check the space groups even when StructureMatcher reports two structures as identical, although space group determination itself depends on a given set of parameters (usually the default values).
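The combined check suggested above can be sketched with pymatgen as follows. This is a minimal illustration rather than the benchmark's own code: the helper names `strict_match` and `identical_with_symmetry_check` and the `symprec=0.01` default are assumptions, while `StructureMatcher` and `SpacegroupAnalyzer` are existing pymatgen classes. pymatgen is imported lazily so that the pure-logic helper can be used without it.

```python
def strict_match(matcher_says_identical: bool, sg_a: int, sg_b: int) -> bool:
    """Accept a match only if StructureMatcher agrees AND the space groups are equal."""
    return bool(matcher_says_identical) and sg_a == sg_b


def identical_with_symmetry_check(struct_a, struct_b,
                                  ltol=0.2, stol=0.3, angle_tol=5,
                                  symprec=0.01) -> bool:
    """Compare two pymatgen Structure objects.

    The tolerance defaults follow the stricter setting quoted in the text;
    `symprec` is an assumed value for the space group determination.
    """
    # Lazy imports: only needed when real structures are compared.
    from pymatgen.analysis.structure_matcher import StructureMatcher
    from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

    matcher = StructureMatcher(ltol=ltol, stol=stol, angle_tol=angle_tol)
    fit = matcher.fit(struct_a, struct_b)
    sg_a = SpacegroupAnalyzer(struct_a, symprec=symprec).get_space_group_number()
    sg_b = SpacegroupAnalyzer(struct_b, symprec=symprec).get_space_group_number()
    return strict_match(fit, sg_a, sg_b)
```

Because the space group assignment also depends on `symprec`, it is worth reporting the value used together with the StructureMatcher tolerances.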

CrystalNN Fingerprint dependent structure similarity:

Another structure identity check, based on the CrystalNNFingerprint distance, was used in a CSP study [69]. To calculate the similarity between two structures i and j, the structures are first encoded into vector-type structural descriptors built from the local coordination information (site fingerprints) of all sites. The structure dissimilarity $\tau$ is then calculated as the Euclidean distance between the two crystal structure descriptors. Structures with dissimilarity $\tau\leq 0.2$ were treated as similar in that study [69]. However, we find that this measure has limitations: structure pairs with a fingerprint distance below the threshold of 0.2 can also have different space groups (see Supplementary Figure S8 and Table S12). A space group check with a set of specific or default parameters is therefore needed, as in the case of Pymatgen's StructureMatcher.
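A minimal sketch of this thresholded comparison, combined with the space group check, is given below. It assumes the structure-level descriptors have already been computed (for example from CrystalNN site fingerprints) and are passed in as plain numeric vectors; the function names are hypothetical, and only the threshold 0.2 comes from the text.

```python
import math
from typing import Sequence


def fingerprint_distance(desc_i: Sequence[float], desc_j: Sequence[float]) -> float:
    """Euclidean distance tau between two precomputed structure descriptors."""
    if len(desc_i) != len(desc_j):
        raise ValueError("descriptors must have the same length")
    return math.dist(desc_i, desc_j)


def similar_with_symmetry_check(desc_i: Sequence[float], desc_j: Sequence[float],
                                sg_i: int, sg_j: int,
                                tau_max: float = 0.2) -> bool:
    """Treat two structures as the same only if tau <= tau_max AND space groups agree."""
    tau = fingerprint_distance(desc_i, desc_j)
    return tau <= tau_max and sg_i == sg_j
```

A pair whose descriptors differ by 0.1 passes the distance threshold, but is still rejected when the two space group numbers differ.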

5 Conclusion

Crystal structure prediction plays a crucial role in discovering novel functional materials. However, conventional first-principles based CSP algorithms currently have limited capability to predict complex crystal structures. Here we conduct a comprehensive benchmark study of 13 CSP algorithms covering template-based, ML potential based, contact map based, and DFT-based CSP algorithms, aiming to illustrate the potential and performance gaps of modern ML potential and deep learning based CSP, as well as the widely used template-based CSP algorithms, in terms of scalability and accuracy. The algorithms are evaluated on a well-selected test set of 180 binary, ternary, and quaternary crystal structures with diverse symmetries and numbers of atoms per unit cell. All algorithm performances are calculated using a set of quantitative metrics along with their relative ranking scores over the 180 test structures, making it possible to achieve relatively objective performance comparisons.

Our extensive benchmark experiments have uncovered several performance trends and factors that contribute to better CSP performance. First, we find that template-based CSP algorithms can achieve strong performance when a suitable template structure can be found. This is due to the ubiquitous existence of typical structural prototypes [18, 70] and is reflected in the wide application of such element substitution CSP algorithms in discovering large numbers of hypothetical materials [35, 66]. However, such template-based CSP algorithms cannot be used to discover materials with novel structural prototypes. Next, we observe that machine learning potential based CSP algorithms have made significant progress in the past few years and have become competitive for CSP, especially for compositions without good templates. The performance of these algorithms depends strongly on the global search capability of the search algorithm and the quality of the ML potential. For example, with the same M3GNet potential, ParetoCSP is better than the AGOX algorithms and also outperforms the GNOA algorithms in terms of quantitative metrics due to its enhanced search capability. A further comparison of the ML potential based CSP algorithms with the DFT-based CALYPSO shows that the former class of modern CSP algorithms demonstrates strong performance and outperforms the latter for most of the test samples. Even for the relatively simple crystal structure test sets, the template-based algorithms and ML potential based CSP algorithms both showed better performance than DFT-based CALYPSO in terms of the formation energy distance and Hausdorff distance. However, our benchmark results also show that all current de novo CSP algorithms (non-template methods) are still at an early stage of development: most of them cannot even accurately predict the space group or crystal system for a majority of the 180 test samples.
Our evaluation of DFT-based CALYPSO also showed the lack of scalability of such DFT-based de novo CSP algorithms, as it is almost infeasible to complete predictions for all 180 test samples (it should be noted that modern CALYPSO has also incorporated ML potentials for more scalable CSP). However, DFT-based de novo CSP algorithms have a unique advantage in predicting crystal structures under special conditions such as high pressure, for which there is currently no ML potential. It should also be noted that, because the selection of the 180 test samples is subjective, our evaluation has inherent bias despite our effort to cover diverse structures, symmetries, and structural prototypes. The rankings of different algorithms should therefore not be used to judge the superiority of any algorithm, but only to guide the choice of appropriate algorithms for a given application scenario. For example, for a given composition, it is reasonable to first predict its structure using template-based algorithms such as CSPML and ML potential based de novo algorithms such as ParetoCSP or GNOA, and then check the formation energy, energy above hull, and mechanical stability of the predictions. If the results are still not satisfactory, one can then try DFT-based de novo methods such as CALYPSO, provided the composition is not too complex.
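The staged workflow suggested above can be expressed as a simple fallback loop. The sketch below is purely illustrative: `predict_with_fallback`, the stage names, the candidate dictionary, and the acceptance test are all hypothetical placeholders supplied by the user, not interfaces of CSPML, ParetoCSP, GNOA, or CALYPSO.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Candidate = Dict[str, float]  # hypothetical container, e.g. {"e_above_hull": 0.01}
Predictor = Callable[[str], Optional[Candidate]]


def predict_with_fallback(
    composition: str,
    stages: Sequence[Tuple[str, Predictor]],
    is_acceptable: Callable[[Candidate], bool],
) -> Optional[Tuple[str, Candidate]]:
    """Try cheaper predictors first and fall back to more expensive ones.

    `stages` is ordered from cheapest to most expensive, for example
    template-based, then ML potential based, then DFT-based. A predictor
    returns None when it cannot produce a valid structure. `is_acceptable`
    encodes the user's stability criteria (formation energy, energy above
    hull, mechanical stability).
    """
    for name, predictor in stages:
        candidate = predictor(composition)
        if candidate is not None and is_acceptable(candidate):
            return name, candidate
    return None  # no stage produced a satisfactory structure
```

Ordering the stages by cost keeps the expensive DFT-based search as a last resort, in line with the scalability concerns raised above.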

Overall, our benchmark has demonstrated the significant progress of machine learning potential based CSP algorithms and their promising prospects for achieving scalable CSP, owing to the emergence of better search algorithms and modern machine learning potentials. Our benchmark data and quantitative performance evaluation metrics, together with the open-source code of these CSP algorithms and their independence from DFT calculations, pave the way for researchers from the AI, data science, and statistics communities to explore this promising and significant CSP problem.

6 Data and Code Availability

The 180 test structures are obtained from the Materials Project database. Their mp-ids are available from our GitHub repository https://github.com/usccolumbia/cspbenchmark. The code for calculating ranking scores can also be downloaded from this repository. The performance metrics are calculated using the code from the CSPBenchMetrics repository https://github.com/usccolumbia/CSPBenchMetrics. The open source CSP codes are available from their corresponding websites as shown in Table 1 in the main text. We have modified AGOX and GNOA to integrate them with the neural network potential model M3GNet.

7 Contribution

Conceptualization, J.H.; methodology, J.H., L.W., S.O., R.D., N.F., Y.S., E.S., M.X.; software, L.W., S.O., Y.S.; resources, J.H.; writing–original draft preparation, J.H., L.W., S.O., N.F., R.D., E.S., M.X.; writing–review and editing, J.H., R.D., S.O., N.F.; visualization, L.W., S.O.; supervision, J.H.; funding acquisition, J.H.

Acknowledgement

We thank Prof. Yanchao Wang of Jilin University for helpful discussions and suggestions. The research reported in this work was supported in part by the National Science Foundation under grants 2110033, OAC-2311203, and 2320292. The views, perspectives, and content do not necessarily represent the official views of the NSF.

References

[1] Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. Critical assessment of methods of protein structure prediction (casp)—round xiv. Proteins: Structure, Function, and Bioinformatics, 89(12):1607–1617, 2021.
[2] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583–589, 2021.
[3] David H Bowskill, Isaac J Sugden, Stefanos Konstantinopoulos, Claire S Adjiman, and Constantinos C Pantelides. Crystal structure prediction methods for organic molecules: State of the art. Annual Review of Chemical and Biomolecular Engineering, 12:593–623, 2021.
[4] Colin W Glass, Artem R Oganov, and Nikolaus Hansen. Uspex—evolutionary crystal structure prediction. Computer physics communications, 175(11-12):713–720, 2006.
[5] Giancarlo Trimarchi, Arthur J Freeman, and Alex Zunger. Predicting stable stoichiometries of compounds via evolutionary global space-group optimization. Physical Review B, 80(9):092101, 2009.
[6] Xiangyang Liu, Haiyang Niu, and Artem R Oganov. Copex: co-evolutionary crystal structure prediction algorithm for complex systems. npj Computational Materials, 7(1):199, 2021.
[7] Detlef WM Hofmann and Joannis Apostolakis. Crystal structure prediction by data mining. Journal of Molecular Structure, 647(1-3):17–39, 2003.
[8] Christopher C Fischer, Kevin J Tibbetts, Dane Morgan, and Gerbrand Ceder. Predicting crystal structure by merging data mining with quantum mechanics. Nature materials, 5(8):641–646, 2006.
[9] Guanjian Cheng, Xin-Gao Gong, and Wan-Jian Yin. Crystal structure prediction by combining graph network and optimization algorithm. Nature communications, 13(1):1492, 2022.
[10] Anton O Oliynyk, Lawrence A Adutwum, Brent W Rudyk, Harshil Pisavadia, Sogol Lotfi, Viktor Hlukhyy, James J Harynuk, Arthur Mar, and Jakoah Brgoch. Disentangling structural confusion through machine learning: structure prediction and polymorphism of equiatomic ternary phases abc. Journal of the American Chemical Society, 139(49):17870–17881, 2017.
[11] Yanchao Wang, Jian Lv, Li Zhu, and Yanming Ma. Crystal structure prediction via particle-swarm optimization. Physical Review B, 82(9):094116, 2010.
[12] David C Lonie and Eva Zurek. Xtalopt: An open-source evolutionary algorithm for crystal structure prediction. Computer Physics Communications, 182(2):372–387, 2011.
[13] Mads-Peter V Christiansen, Nikolaj Rønne, and Bjørk Hammer. Atomistic global optimization x: A python package for optimization of atomistic structures. The Journal of Chemical Physics, 157(5):054701, 2022.
[14] David J Wales and Jonathan PK Doye. Global optimization by basin-hopping and the lowest energy structures of lennard-jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116, 1997.
[15] David A Kofke. On the acceptance probability of replica-exchange monte carlo trials. The Journal of chemical physics, 117(15):6911–6914, 2002.
[16] Malthe K Bisbo and Bjørk Hammer. Global optimization of atomic structure enhanced by machine learning. Physical Review B, 105(24):245404, 2022.
[17] Lai Wei, Nihang Fu, Edirisuriya MD Siriwardane, Wenhui Yang, Sadman Sadeed Omee, Rongzhi Dong, Rui Xin, and Jianjun Hu. Tcsp: a template-based crystal structure prediction algorithm for materials discovery. Inorganic Chemistry, 2022.
[18] Sean D Griesemer, Logan Ward, and Chris Wolverton. High-throughput crystal structure solution using prototypes. Physical Review Materials, 5(10):105003, 2021.
[19] Yanchao Wang, Jian Lv, Li Zhu, and Yanming Ma. Crystal structure prediction via particle-swarm optimization. Physical Review B, 82(9):094116, 2010.
[20] Pavel E Dolgirev, Ivan A Kruglov, and Artem R Oganov. Machine learning scheme for fast extraction of chemically interpretable interatomic potentials. AIP Advances, 6(8):085318, 2016.
[21] Evgeny V Podryabinkin, Evgeny V Tikhonov, Alexander V Shapeev, and Artem R Oganov. Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Physical Review B, 99(6):064114, 2019.
[22] Qunchao Tong, Lantian Xue, Jian Lv, Yanchao Wang, and Yanming Ma. Accelerating calypso structure prediction by data-driven learning of a potential energy surface. Faraday discussions, 211:31–43, 2018.
[23] Qiuping Yang, Jian Lv, Qunchao Tong, Xin Du, Yanchao Wang, Shoutao Zhang, Guochun Yang, Aitor Bergara, and Yanming Ma. Hard and superconducting cubic boron phase via swarm-intelligence structural prediction driven by a machine-learning potential. Physical Review B, 103(2):024505, 2021.
[24] Qunchao Tong, Xiaoshan Luo, Adebayo A Adeleke, Pengyue Gao, Yu Xie, Hanyu Liu, Quan Li, Yanchao Wang, Jian Lv, Yansun Yao, et al. Machine learning metadynamics simulation of reconstructive phase transition. Physical Review B, 103(5):054107, 2021.
[25] So Takamoto, Satoshi Izumi, and Ju Li. Teanet: Universal neural network interatomic potential inspired by iterative electronic relaxations. Computational Materials Science, 207:111280, 2022.
[26] So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, et al. Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements. Nature Communications, 13(1):2991, 2022.
[27] Kamal Choudhary, Brian DeCost, Lily Major, Keith Butler, Jeyan Thiyagalingam, and Francesca Tavazza. Unified graph neural network force-field for the periodic table: solid state applications. Digital Discovery, 2023.
[28] Sadman Sadeed Omee, Lai Wei, Ming Hu, and Jianjun Hu. Crystal structure prediction using neural network potential and age-fitness pareto genetic algorithm. Journal of Materials Informatics, 2024.
[29] Minoru Kusaba, Chang Liu, and Ryo Yoshida. Crystal structure prediction with machine learning-based element substitution. Computational Materials Science, 211:111496, 2022.
[30] Farren Curtis, Xiayue Li, Timothy Rose, Alvaro Vazquez-Mayagoitia, Saswata Bhattacharya, Luca M Ghiringhelli, and Noa Marom. Gator: a first-principles genetic algorithm for molecular crystal structure prediction. Journal of chemical theory and computation, 14(4):2246–2264, 2018.
[31] Chris J Pickard and RJ Needs. High-pressure phases of silane. Physical review letters, 97(4):045504, 2006.
[32] Chris J Pickard and RJ Needs. Ab initio random structure searching. Journal of Physics: Condensed Matter, 23(5):053201, 2011.
[33] Malthe K Bisbo and Bjørk Hammer. Global optimization of atomistic structure enhanced by machine learning. arXiv preprint arXiv:2012.15222, 2020.
[34] Will Tipton and Richard Hennig. Gasp: The genetic algorithm for structure and phase prediction, 2012.
[35] Chi Chen and Shyue Ping Ong. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science, 2(11):718–728, 2022.
[36] Henrik Lund Mortensen, Søren Ager Meldgaard, Malthe Kjær Bisbo, Mads-Peter V Christiansen, and Bjørk Hammer. Atomistic structure learning algorithm with surrogate energy model relaxation. Physical Review B, 102(7):075427, 2020.
[37] Tomoki Yamashita, Shinichi Kanehira, Nobuya Sato, Hiori Kino, Kei Terayama, Hikaru Sawahata, Takumi Sato, Futoshi Utsuno, Koji Tsuda, Takashi Miyake, et al. Cryspy: a crystal structure prediction tool accelerated by machine learning. Science and Technology of Advanced Materials: Methods, 1(1):87–97, 2021.
[38] David C Lonie and Eva Zurek. Xtalopt: An open-source evolutionary algorithm for crystal structure prediction. Computer Physics Communications, 182(2):372–387, 2011.
[39] Jianjun Hu, Yong Zhao, Yuqi Song, Rongzhi Dong, Wenhui Yang, Yuxin Li, and Edirisuriya Siriwardane. Alphacrystal: Contact map based crystal structure prediction using deep learning. arXiv preprint arXiv:2102.01620, 2021.
[40] Yuqi Song, Rongzhi Dong, Lai Wei, Qin Li, and Jianjun Hu. Alphacrystal-ii: Distance matrix based crystal structure prediction using deep learning. arXiv preprint arXiv:2404.04810, 2024.
[41] Georg Kresse and Jürgen Furthmüller. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Physical review B, 54(16):11169, 1996.
[42] John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple. Physical review letters, 77(18):3865, 1996.
[43] Peter E Blöchl. Projector augmented-wave method. Physical review B, 50(24):17953, 1994.
[44] Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31(9):3564–3572, 2019.
[45] Xiangyu Yin and Chrysanthos E Gounaris. Search methods for inorganic materials crystal structure prediction. Current Opinion in Chemical Engineering, 35:100726, 2022.
[46] Karsten Wedel Jacobsen, JK Norskov, and Martti J Puska. Interatomic interactions in the effective-medium theory. Physical Review B, 35(14):7423, 1987.
[47] Pierre Hohenberg and Walter Kohn. Inhomogeneous electron gas. Physical review, 136(3B):B864, 1964.
[48] Lu Jeu Sham and Walter Kohn. One-particle properties of an inhomogeneous interacting electron gas. Physical Review, 145(2):561, 1966.
[49] Malthe K Bisbo and Bjørk Hammer. Efficient global structure optimization with a machine-learned surrogate model. Physical review letters, 124(8):086102, 2020.
[50] Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf. A primer on kernel methods. Kernel methods in computational biology, 47:35–70, 2004.
[51] Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
[52] Lasse B Vilhelmsen and Bjørk Hammer. A genetic algorithm for first principles global structure optimization of supported nano structures. The Journal of chemical physics, 141(4):044711, 2014.
[53] Haitham Seada and Kalyanmoy Deb. U-nsga-iii: a unified evolutionary optimization procedure for single, multiple, and many objectives: proof-of-principle results. In International conference on evolutionary multi-criterion optimization, pages 34–49. Springer, 2015.
[54] Michael D Schmidt and Hod Lipson. Age-fitness pareto optimization. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 543–544, 2010.
[55] Brian Kulis et al. Metric learning: A survey. Foundations and Trends® in Machine Learning, 5(4):287–364, 2013.
[56] Chang Liu, Erina Fujita, Yukari Katsura, Yuki Inada, Asuka Ishikawa, Ryuji Tamura, Kaoru Kimura, and Ryo Yoshida. Machine learning to predict quasicrystals from chemical compositions. Advanced Materials, 33(36):2102507, 2021.
[57] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL materials, 1(1), 2013.
[58] Cameron J Hargreaves, Matthew S Dyer, Michael W Gaultois, Vitaliy A Kurlin, and Matthew J Rosseinsky. The earth mover’s distance as a metric for the space of inorganic compositions. Chemistry of Materials, 32(24):10610–10620, 2020.
[59] Nihang Fu, Jeffrey Hu, Ying Feng, Gregory Morrison, Hans-Conrad zur Loye, and Jianjun Hu. Composition based oxidation state prediction of materials using deep learning language models. Advanced Science, 10(28):2301011, 2023.
[60] Wenhui Yang, Edirisuriya M Dilanga Siriwardane, Rongzhi Dong, Yuxin Li, and Jianjun Hu. Crystal structure prediction of materials with high symmetry using differential evolution. Journal of Physics: Condensed Matter, 33(45):455902, 2021.
[61] Jeremy Rapin and Olivier Teytaud. Nevergrad: a gradient-free optimization platform. github.FacebookResearch/Nevergrad, 2018.
[62] Greg Landrum et al. Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum, 8, 2013.
[63] Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, et al. Molecular sets (moses): a benchmarking platform for molecular generation models. Frontiers in pharmacology, 11:565644, 2020.
[64] Lai Wei, Qin Li, Sadman Sadeed Omee, and Jianjun Hu. Towards quantitative evaluation of crystal structure prediction performance. Computational Materials Science, 235:112802, 2024.
[65] Xiaoshan Luo, Zhenyu Wang, Pengyue Gao, Jian Lv, Yanchao Wang, Changfeng Chen, and Yanming Ma. Deep learning generative model for crystal structure prediction. arXiv preprint arXiv:2403.10846, 2024.
[66] Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624(7990):80–85, 2023.
[67] Kuo Bao, Stefan Goedecker, Kenji Koga, Frédéric Lançon, and Alexey Neelov. Structure of large gold clusters obtained by global optimization using the minima hopping method. Physical Review B, 79(4):041405, 2009.
[68] Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi S Jaakkola. Crystal diffusion variational autoencoder for periodic material generation. In International Conference on Learning Representations, 2021.
[69] Chang Liu, Hiromasa Tamaki, Tomoyasu Yokoyama, Kensuke Wakasugi, Satoshi Yotsuhashi, Minoru Kusaba, and Ryo Yoshida. Shotgun crystal structure prediction using machine-learned formation energies. arXiv preprint arXiv:2305.02158, 2023.
[70] Michael J Mehl, David Hicks, Cormac Toher, Ohad Levy, Robert M Hanson, Gus Hart, and Stefano Curtarolo. The aflow library of crystallographic prototypes: part 1. Computational Materials Science, 136:S1–S828, 2017.