TorchANI
class torchani.AEVComputer(Rcr, Rca, EtaR, ShfR, EtaA, Zeta, ShfA, ShfZ, num_species, use_cuda_extension=False)
The AEV computer that takes coordinates as input and outputs AEVs (atomic environment vectors).
Parameters:
- Rcr (float) – \(R_C\) in equation (2) when used in equation (3) of the ANI paper.
- Rca (float) – \(R_C\) in equation (2) when used in equation (4) of the ANI paper.
- EtaR (torch.Tensor) – The 1D tensor of \(\eta\) in equation (3) of the ANI paper.
- ShfR (torch.Tensor) – The 1D tensor of \(R_s\) in equation (3) of the ANI paper.
- EtaA (torch.Tensor) – The 1D tensor of \(\eta\) in equation (4) of the ANI paper.
- Zeta (torch.Tensor) – The 1D tensor of \(\zeta\) in equation (4) of the ANI paper.
- ShfA (torch.Tensor) – The 1D tensor of \(R_s\) in equation (4) of the ANI paper.
- ShfZ (torch.Tensor) – The 1D tensor of \(\theta_s\) in equation (4) of the ANI paper.
- num_species (int) – Number of supported atom types.
- use_cuda_extension (bool) – Whether to use the CUDA extension for faster calculation (requires cuaev to be installed).
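To make the roles of these parameters concrete, here is a minimal pure-Python sketch of the cosine cutoff (equation 2), one radial element (equation 3) and one angular element (equation 4), written from the ANI paper's formulas. It is not torchani's vectorized implementation, which may differ in numerical details. The function names are hypothetical, and `rc` stands for Rcr or Rca depending on the term.

```python
import math


def cutoff(r: float, rc: float) -> float:
    """Cosine cutoff f_C(R) of equation (2): decays smoothly to zero at rc."""
    if r > rc:
        return 0.0
    return 0.5 * math.cos(math.pi * r / rc) + 0.5


def radial_term(distances, eta: float, shift: float, rc: float) -> float:
    """One radial AEV element (equation (3)) for a single (eta, R_s) pair.

    `distances` holds R_ij from central atom i to each neighbor j of one species.
    """
    return sum(math.exp(-eta * (r - shift) ** 2) * cutoff(r, rc) for r in distances)


def angular_term(pairs, eta: float, zeta: float, r_shift: float,
                 theta_shift: float, rc: float) -> float:
    """One angular AEV element (equation (4)) for a single parameter set.

    `pairs` holds (R_ij, R_ik, theta_ijk) for each neighbor pair (j, k)
    of one species pair around central atom i.
    """
    total = 0.0
    for r_ij, r_ik, theta in pairs:
        angular = (1.0 + math.cos(theta - theta_shift)) ** zeta
        radial = math.exp(-eta * ((r_ij + r_ik) / 2.0 - r_shift) ** 2)
        total += angular * radial * cutoff(r_ij, rc) * cutoff(r_ik, rc)
    return 2.0 ** (1.0 - zeta) * total


# ANI-1x-like values: Rcr = 5.2, EtaR = 16.0, one radial shift at 0.9 Angstrom.
# The neighbor at 6.0 Angstrom lies beyond the cutoff and contributes nothing.
print(radial_term([0.9, 1.5, 6.0], eta=16.0, shift=0.9, rc=5.2))
print(angular_term([(1.0, 1.1, math.radians(109.5))], eta=8.0, zeta=32.0,
                   r_shift=0.9, theta_shift=math.radians(109.5), rc=3.5))
```

Each (eta, shift) combination, and each (eta, zeta, shift, theta_s) combination, produces one such element per neighbor species (or species pair). This is why the 1D parameter tensors above determine the AEV length.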
classmethod cover_linearly(radial_cutoff: float, angular_cutoff: float, radial_eta: float, angular_eta: float, radial_dist_divisions: int, angular_dist_divisions: int, zeta: float, angle_sections: int, num_species: int, angular_start: float = 0.9, radial_start: float = 0.9)
Provides a convenient way to linearly fill cutoffs.
This is a user-friendly constructor that builds a torchani.AEVComputer in which the shifts linearly cover three things: the subdivisions along the distance dimension for the radial and angular sub-AEVs, and the angle sections for the angular sub-AEV. By default the distance shifts start at 0.9 Angstrom. To reproduce the ANI-1x AEVs, the signature (5.2, 3.5, 16.0, 8.0, 16, 4, 32.0, 8, 4) can be used.
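The linear covering can be sketched in plain Python. This minimal sketch makes two assumptions. First, each distance range from the start value to the cutoff is divided into equal steps, with the cutoff itself excluded. Second, the angle sections are centered in equal slices of [0, π]. Under these assumptions the ANI-1x signature above gives 16 radial shifts, 4 angular distance shifts, 8 angle shifts and an AEV length of 384. The helper names are hypothetical.

```python
import math


def linear_shifts(start: float, cutoff: float, divisions: int) -> list:
    """Evenly spaced shifts from `start` up to, but excluding, `cutoff`."""
    step = (cutoff - start) / divisions
    return [start + i * step for i in range(divisions)]


def angle_shifts(sections: int) -> list:
    """Centers of `sections` equal slices of [0, pi]."""
    width = math.pi / sections
    return [width / 2 + i * width for i in range(sections)]


def aev_length(num_species: int, radial_divisions: int,
               angular_divisions: int, angle_sections: int) -> int:
    """Radial part: one block per species.
    Angular part: one block per unordered species pair."""
    radial = num_species * radial_divisions
    pairs = num_species * (num_species + 1) // 2
    angular = pairs * angular_divisions * angle_sections
    return radial + angular


# ANI-1x signature: (5.2, 3.5, 16.0, 8.0, 16, 4, 32.0, 8, 4)
shf_r = linear_shifts(0.9, 5.2, 16)
shf_a = linear_shifts(0.9, 3.5, 4)
shf_z = angle_shifts(8)
print(shf_r[:3], shf_a, round(shf_z[0], 6))
print(aev_length(4, 16, 4, 8))  # 384
```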
forward(input_: Tuple[Tensor, Tensor], cell: Tensor | None = None, pbc: Tensor | None = None) -> SpeciesAEV
Compute AEVs.
Parameters: input_ (tuple) – Can be one of the following two cases:
- If you don't care about periodic boundary conditions at all, then input_ can be a tuple of two tensors: species, coordinates. species must have shape (N, A) and coordinates must have shape (N, A, 3), where N is the number of molecules in a batch and A is the number of atoms.
Warning: The species must be indexed as 0, 1, 2, 3, …, not by their element index in the periodic table. Check torchani.SpeciesConverter if you want periodic table indexing.
Note: The coordinates and cell are in Angstrom.
- If you want to apply periodic boundary conditions, then the input would be a tuple of two tensors (species, coordinates) and two keyword arguments cell=… and pbc=…. Here species and coordinates are the same as described above. cell is a tensor of shape (3, 3) whose rows are the three vectors defining the unit cell:
tensor([[x1, y1, z1], [x2, y2, z2], [x3, y3, z3]])
pbc is a boolean vector of size 3 storing whether PBC is enabled in each direction.
Returns: Species and AEVs. species are the species from the input, unchanged. AEVs is a tensor of shape (N, A, self.aev_length()).
Return type: NamedTuple
class torchani.ANIModel(modules)
ANI model that computes energies from species and AEVs.
Different atom types might have different modules. When computing energies, the module for each atom's type is applied to that atom's AEV. The outputs of the modules are then summed over atoms to obtain molecular energies.
Warning: The species must be indexed as 0, 1, 2, 3, …, not by their element index in the periodic table. Check torchani.SpeciesConverter if you want periodic table indexing.
Note: The resulting energies are in Hartree.
Parameters: modules (collections.abc.Sequence) – Modules for each atom type. Atom types are distinguished by their order in modules. For example, modules[i] must be the module for atom type i. Different atom types can share a module by putting the same reference in modules.
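The per-atom-type dispatch and reduction described above can be sketched in plain Python. This minimal sketch makes three assumptions: each module is an ordinary callable mapping one AEV (here a list of floats) to one atomic energy, padding atoms carry the index -1, and everything works on nested lists. torchani itself operates on batched tensors. The function and the toy modules are hypothetical.

```python
from typing import Callable, List, Sequence

AEV = List[float]


def molecular_energies(modules: Sequence[Callable[[AEV], float]],
                       species: List[List[int]],
                       aevs: List[List[AEV]]) -> List[float]:
    """Apply modules[s] to each atom of type s and sum over the atoms."""
    energies = []
    for mol_species, mol_aevs in zip(species, aevs):
        total = 0.0
        for s, aev in zip(mol_species, mol_aevs):
            if s < 0:  # padding atom: contributes nothing
                continue
            total += modules[s](aev)
        energies.append(total)
    return energies


# Toy "networks" (hypothetical). H and C get their own modules; N and O share one.
def h_net(aev: AEV) -> float:
    return -0.5 + 0.1 * sum(aev)


def c_net(aev: AEV) -> float:
    return -38.0 + 0.2 * sum(aev)


def shared_net(aev: AEV) -> float:
    return -60.0 + 0.3 * sum(aev)


modules = [h_net, c_net, shared_net, shared_net]  # indices 0..3 = H, C, N, O
species = [[1, 0, 0, -1]]                         # C, H, H plus one padding atom
aevs = [[[1.0], [0.0], [0.0], [5.0]]]             # the padding AEV is ignored
print(molecular_energies(modules, species, aevs))
```

Putting the same callable at two positions of `modules` is all that "sharing a module" means here. Both atom types then run through identical parameters.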
class torchani.SpeciesConverter(species)
Converts tensors whose species are labeled as atomic numbers into tensors labeled with internal torchani indices, according to a custom ordering scheme. It takes this custom species ordering as its initialization parameter. For example, if the class is initialized with ['H', 'C', 'N', 'O'], it will convert the tensor [1, 1, 6, 7, 1, 8] into the tensor [0, 0, 1, 2, 0, 3].
Parameters: species – Sequence of all supported species, in order. It is recommended to order them by atomic number.
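The conversion in the example above can be sketched with a lookup table. This is a minimal sketch, not torchani's implementation. It hard-codes atomic numbers for a hypothetical handful of elements and assumes that padding entries of -1 pass through unchanged.

```python
from typing import Dict, List, Sequence

# Hypothetical subset of the periodic table, enough for the example.
ATOMIC_NUMBERS: Dict[str, int] = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'S': 16, 'Cl': 17}


def make_converter(species_order: Sequence[str]):
    """Return a function mapping atomic numbers to indices in species_order."""
    lookup = {ATOMIC_NUMBERS[symbol]: index for index, symbol in enumerate(species_order)}

    def convert(atomic_numbers: List[int]) -> List[int]:
        # -1 marks padding atoms and is kept as-is.
        return [-1 if z == -1 else lookup[z] for z in atomic_numbers]

    return convert


convert = make_converter(['H', 'C', 'N', 'O'])
print(convert([1, 1, 6, 7, 1, 8]))  # [0, 0, 1, 2, 0, 3]
```

In this sketch, an atomic number that is not part of the chosen ordering raises a KeyError instead of being silently mapped.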
class torchani.EnergyShifter(self_energies, fit_intercept=False)
Helper class for adding and subtracting self atomic energies.
This is a subclass of torch.nn.Module, so it can be used directly in a pipeline such as [input->AEVComputer->ANIModel->EnergyShifter->output].
Parameters:
- self_energies (collections.abc.Sequence) – Sequence of floating-point numbers for the self energy of each atom type. The numbers should be in order, i.e. self_energies[i] should be the self energy of atom type i.
- fit_intercept (bool) – Whether to calculate the intercept during the least-squares (LSTSQ) fit. The intercept is also taken into account when shifting energies.
forward(species_energies: Tuple[Tensor, Tensor], cell: Tensor | None = None, pbc: Tensor | None = None) -> SpeciesEnergies
(species, molecular energies) -> (species, molecular energies + sae)
sae(species)
Compute self energies for molecules. Padding atoms are automatically excluded.
Parameters: species (torch.Tensor) – Long tensor of shape (conformations, atoms).
Returns: 1D vector of shape (conformations,) with the molecular self energies.
Return type: torch.Tensor
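What sae and forward compute can be sketched in plain Python. The self energy of a conformation is the sum of the per-type self energies of its non-padding atoms, and forward adds that sum to the predicted molecular energy. This minimal sketch assumes that padding atoms are marked with -1, and it ignores fit_intercept. The self-energy values are made up for illustration.

```python
from typing import List, Sequence


def sae(self_energies: Sequence[float], species: List[List[int]]) -> List[float]:
    """Molecular self energies; padding atoms (index -1) are excluded."""
    return [sum(self_energies[s] for s in conformation if s >= 0)
            for conformation in species]


def shift(self_energies: Sequence[float], species: List[List[int]],
          energies: List[float]) -> List[float]:
    """(species, energies) -> energies + sae, mirroring EnergyShifter.forward."""
    return [e + s for e, s in zip(energies, sae(self_energies, species))]


self_energies = [-0.5, -38.0]       # illustrative values for atom types 0 and 1
species = [[1, 0, 0], [1, -1, -1]]  # the second conformation is padded
print(sae(self_energies, species))  # [-39.0, -38.0]
print(shift(self_energies, species, [0.25, 0.5]))
```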
Model Zoo
The ANI model zoo that stores public ANI models.
Currently the model zoo has three models: ANI-1x, ANI-1ccx, and ANI-2x. The parameters of these models are stored in the ani-model-zoo repository and are automatically downloaded the first time any of these models is instantiated. The classes of these models are ANI1x, ANI1ccx, and ANI2x, which are subclasses of torch.nn.Module.
To use the models, instantiate them and then either calculate energies directly or get an ASE calculator. For example:
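import torch
import torchani

ani1x = torchani.models.ANI1x()
# convert atom species from string to a long tensor of shape (1, 5)
species = ani1x.species_to_tensor(['C', 'H', 'H', 'H', 'H']).unsqueeze(0)
# methane coordinates in Angstrom, shape (1, 5, 3)
coordinates = torch.tensor([[[0.03192167, 0.00638559, 0.01301679],
                             [-0.83140486, 0.39370209, -0.26395324],
                             [-0.66518241, -0.84461308, 0.20759389],
                             [0.45554739, 0.54289633, 0.81170881],
                             [0.66091919, -0.16799635, -0.91037834]]])
# compute energy using the ANI-1x model ensemble
_, energies = ani1x((species, coordinates))
calculator = ani1x.ase()  # get an ASE Calculator using this ensemble
model0 = ani1x[0]  # get the first model in the ensemble
# individual models can convert species too
species0 = model0.species_to_tensor(['C', 'H', 'H', 'H', 'H']).unsqueeze(0)
# compute energy using the first model in the ANI-1x model ensemble
_, energies0 = model0((species0, coordinates))
calculator0 = model0.ase()  # get an ASE Calculator using this model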
class torchani.models.ANI1x(periodic_table_index=False, model_index=None)
The ANI-1x model as in ani-1x_8x on GitHub and the Active Learning paper.
The ANI-1x model is an ensemble of 8 networks that was trained using active learning on the ANI-1x dataset. The target level of theory is wB97X/6-31G(d). It predicts energies for H, C, N and O exclusively and should not be used with other atom types.
class torchani.models.ANI1ccx(periodic_table_index=False, model_index=None)
The ANI-1ccx model as in ani-1ccx_8x on GitHub and the Transfer Learning paper.
The ANI-1ccx model is an ensemble of 8 networks that was trained on the ANI-1ccx dataset using transfer learning. The target accuracy is CCSD(T)*/CBS, i.e. CCSD(T) computed with the DLPNO-CCSD(T) method. It predicts energies for H, C, N and O exclusively and should not be used with other atom types.
class torchani.models.ANI2x(periodic_table_index=False, model_index=None)
The ANI-2x model as in the ANI2x paper and ANI2x Results on GitHub.
The ANI-2x model is an ensemble of 8 networks that was trained on the ANI-2x dataset. The target level of theory is wB97X/6-31G(d). It predicts energies for H, C, N, O, F, S and Cl exclusively and should not be used with other atom types.
Datasets
Tools for loading, shuffling, and batching ANI datasets.
torchani.data.load(path) creates an iterable of raw data, in which species are strings and coordinates are numpy ndarrays.
You can transform this iterable by applying transformations. To apply a transformation, call it.transformation_name(). This returns an iterable that may be cached, depending on the specific transformation.
Available transformations are listed below:
- species_to_indices accepts two different kinds of arguments. If its argument is an iterable of species, it converts species from elements (e.g. "H", "C", "Cl", etc.) into internal torchani indices. These are the indices returned by torchani.utils.ChemicalSymbolsToInts, or by the species_to_tensor method of a torchani.models.BuiltinModel and of torchani.neurochem.Constants. By default species_to_indices behaves this way, with an argument of ('H', 'C', 'N', 'O', 'F', 'S', 'Cl'). However, if its argument is the string "periodic_table", elements are converted into atomic numbers ("periodic table indices") instead. This last option is meant for training networks that already run a forward pass of torchani.nn.SpeciesConverter on their inputs, converting elements to internal indices before processing the coordinates.
- subtract_self_energies subtracts self energies from all molecules of the dataset. It accepts two different kinds of arguments. If you pass a dict of self energies, they are subtracted directly according to the key-value pairs. If you pass a torchani.utils.EnergyShifter, the self energies are calculated by linear regression and stored inside the class in the order specified by species_order. By default the function orders by atomic number if no extra argument is provided, but a specific order may be requested.
- remove_outliers removes outlier energies from the dataset, if any are present.
- shuffle shuffles the provided dataset. Note that if the dataset is not cached (i.e. it lives on disk and not in memory), this method will cache it before shuffling, which may take time and memory depending on the dataset size. It may be used before splitting into training and validation sets, to shuffle all molecules and ensure uniform sampling from the initial dataset. It can also be used during training on a cached dataset of batches, to shuffle the batches.
- cache caches the result of previous transformations. If the input is already cached, this does nothing.
- collate creates batches, pads the atoms of all molecules in each batch with dummy atoms, and then converts each batch to tensors. By default collate pads with the dictionary
{'species': -1, 'coordinates': 0.0, 'forces': 0.0, 'energies': 0.0}
A custom padding dictionary can be passed as an optional parameter to override this default. Note that this function returns a generator and does not cache the result in memory.
- pin_memory copies the tensors to pinned (page-locked) memory so that later transfers to CUDA devices are faster.
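The padding step of collate can be illustrated without tensors. The sketch below is a minimal, hypothetical helper, not torchani's code. It pads per-atom lists to the largest molecule in a batch using the default padding dictionary shown above. Per-molecule scalars such as energies need no padding and are copied as-is.

```python
from typing import Any, Dict, List

# Default padding values, as listed above for collate.
PADDING = {'species': -1, 'coordinates': 0.0, 'forces': 0.0, 'energies': 0.0}


def pad_batch(molecules: List[Dict[str, Any]],
              padding: Dict[str, Any] = PADDING) -> Dict[str, list]:
    """Hypothetical helper: pad per-atom properties to the largest molecule."""
    max_atoms = max(len(m['species']) for m in molecules)
    batch: Dict[str, list] = {key: [] for key in molecules[0]}
    for m in molecules:
        missing = max_atoms - len(m['species'])
        for key, value in m.items():
            if not isinstance(value, list):  # per-molecule scalar, e.g. energies
                batch[key].append(value)
            elif value and isinstance(value[0], list):  # per-atom vectors
                filler = [[padding[key]] * len(value[0]) for _ in range(missing)]
                batch[key].append(value + filler)
            else:  # per-atom scalars, e.g. species
                batch[key].append(value + [padding[key]] * missing)
    return batch


molecules = [
    {'species': [1, 0], 'coordinates': [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]], 'energies': -38.4},
    {'species': [1], 'coordinates': [[0.0, 0.0, 0.0]], 'energies': -37.8},
]
batch = pad_batch(molecules)
print(batch['species'])  # [[1, 0], [1, -1]]
print(batch['coordinates'][1])
```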
You can also use split to split the iterable into pieces. Use split as:
it.split(ratio1, ratio2, None)
where the final None indicates that all the remaining data should be used.
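Here is a minimal sketch of these split semantics on a plain list. It assumes, as the ratio-based signature suggests, that each ratio takes the next int(ratio * length) items in order and that None takes everything left over. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def split(items: Sequence, *ratios: Optional[float]) -> List[list]:
    """Split items into consecutive pieces by ratio; None takes the rest."""
    pieces, start = [], 0
    for ratio in ratios:
        if ratio is None:
            end = len(items)
        else:
            end = start + int(ratio * len(items))
        pieces.append(list(items[start:end]))
        start = end
    return pieces


training, validation = split(list(range(10)), 0.8, None)
print(len(training), len(validation))  # 8 2
```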
Note that the orderings used in torchani.utils.ChemicalSymbolsToInts and torchani.nn.SpeciesConverter should be consistent with the orderings used in species_to_indices and subtract_self_energies. To prevent confusion, it is recommended that the arguments used to initialize converters and the arguments to these functions all order elements by atomic number. For example, if you are working with hydrogen, nitrogen and bromine, always use ['H', 'N', 'Br'] and never ['N', 'H', 'Br'] or other variations. A different custom ordering can be specified, mainly for backwards compatibility and for fully custom atom types, but doing so is NOT recommended because it is very error-prone.
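One way to follow this recommendation is to derive every ordering from a single sorted list. The sketch below is hypothetical, and its atomic-number table covers only the elements it needs. It sorts chemical symbols by atomic number so that converters and dataset transformations can all be initialized from the same list.

```python
from typing import List, Sequence

# Hypothetical partial table of atomic numbers.
ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'S': 16, 'Cl': 17, 'Br': 35}


def order_by_atomic_number(symbols: Sequence[str]) -> List[str]:
    """Return the unique symbols sorted by atomic number."""
    return sorted(set(symbols), key=ATOMIC_NUMBERS.__getitem__)


species_order = order_by_atomic_number(['N', 'Br', 'H'])
print(species_order)  # ['H', 'N', 'Br']
```

The same list can then be reused wherever an ordering is required, so every component sees identical indices.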
Example:
import torch
import torchani

dspath = 'dataset.h5'  # hypothetical path to an ANI dataset
batch_size = 2560  # hypothetical batch size
energy_shifter = torchani.utils.EnergyShifter(None)
training, validation = torchani.data.load(dspath).subtract_self_energies(energy_shifter).species_to_indices().shuffle().split(0.8, None)
training = training.collate(batch_size).cache()
validation = validation.collate(batch_size).cache()
If the above approach takes too much memory, you can instead use a DataLoader with multiprocessing to achieve comparable performance with less memory usage:
training, validation = torchani.data.load(dspath).subtract_self_energies(energy_shifter).species_to_indices().shuffle().split(0.8, None)
training = torch.utils.data.DataLoader(list(training), batch_size=batch_size, collate_fn=torchani.data.collate_fn, num_workers=64)
validation = torch.utils.data.DataLoader(list(validation), batch_size=batch_size, collate_fn=torchani.data.collate_fn, num_workers=64)
Utilities
torchani.utils.pad_atomic_properties(properties, padding_values={'species': -1})
Put a sequence of atomic properties together into a single tensor.
The inputs are [{'species': …, …}, {'species': …, …}, …] and the outputs are {'species': padded_tensor, …}.
Parameters:
- properties (collections.abc.Sequence) – Sequence of properties.
- padding_values (dict) – The values used to pad tensors to the same size.
torchani.utils.present_species(species)
Given a vector of species of atoms, compute the unique species present.
Parameters: species (torch.Tensor) – 1D vector of shape (atoms,).
Returns: 1D vector storing the present atom types, sorted.
Return type: torch.Tensor
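Here is a pure-Python sketch of the same computation. It assumes that -1 marks padding, following the convention used by collate, and should not be reported as a species. Whether the library also drops -1 is an assumption of this sketch.

```python
from typing import List


def present_species(species: List[int]) -> List[int]:
    """Sorted unique atom types, ignoring the -1 padding marker."""
    return sorted({s for s in species if s != -1})


print(present_species([3, 0, 0, 1, -1, -1]))  # [0, 1, 3]
```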
torchani.utils.strip_redundant_padding(atomic_properties)
Strip trailing padding atoms.
Parameters: atomic_properties (dict) – Properties to strip.
Returns: The same set of properties with redundant padding atoms stripped.
Return type: dict
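Trailing padding is redundant when every molecule in the batch has a padding atom in that column. The minimal sketch below works on nested lists. It assumes that padding atoms are marked with -1 in 'species', that 'energies' is the only per-molecule key, and that all other keys are per-atom and stripped to the same width. The helper is hypothetical.

```python
from typing import Dict, Sequence


def strip_redundant_padding(props: Dict[str, list],
                            per_molecule: Sequence[str] = ('energies',)) -> Dict[str, list]:
    """Drop trailing atom columns that are padding (-1) in every molecule."""
    species = props['species']
    # Columns still needed: position of the last real atom in any molecule, plus one.
    width = max((i + 1 for row in species for i, s in enumerate(row) if s != -1), default=0)
    return {key: rows if key in per_molecule else [row[:width] for row in rows]
            for key, rows in props.items()}


batch = {
    'species': [[1, 0, -1, -1], [0, -1, -1, -1]],
    'coordinates': [[[0.0] * 3] * 4, [[0.0] * 3] * 4],
    'energies': [-38.4, -0.5],
}
stripped = strip_redundant_padding(batch)
print(stripped['species'])  # [[1, 0], [0, -1]]
```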
torchani.utils.map2central(cell, coordinates, pbc)
Map atoms outside the unit cell into the cell using PBC.
Parameters:
- cell (torch.Tensor) – Tensor of shape (3, 3) whose rows are the three vectors defining the unit cell:
tensor([[x1, y1, z1], [x2, y2, z2], [x3, y3, z3]])
- coordinates (torch.Tensor) – Tensor of shape (molecules, atoms, 3).
- pbc (torch.Tensor) – Boolean vector of size 3 storing whether PBC is enabled in each direction.
Returns: Coordinates of the atoms mapped back into the unit cell.
Return type: torch.Tensor
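The mapping is easiest to understand through fractional coordinates. Express each position in the basis of the cell rows, keep only the fractional part along periodic directions, and convert back. The pure-Python sketch below illustrates that idea for a single molecule and a general 3×3 cell. It is not torchani's tensor implementation, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def inverse3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def row_times(v: Sequence[float], m: Matrix) -> List[float]:
    """Row vector v times matrix m."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]


def map2central(cell: Matrix, coordinates: List[List[float]], pbc: Sequence[bool]):
    """Wrap atoms into the unit cell whose rows are the lattice vectors."""
    inv_cell = inverse3(cell)
    wrapped = []
    for r in coordinates:
        frac = row_times(r, inv_cell)  # fractional coordinates
        # Keep only the fractional part along periodic directions.
        frac = [f - math.floor(f) if periodic else f for f, periodic in zip(frac, pbc)]
        wrapped.append(row_times(frac, cell))
    return wrapped


cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(map2central(cell, [[12.0, -1.0, 5.0]], [True, True, False]))
```

Working in fractional coordinates is what makes the same code valid for skewed (non-orthorhombic) cells. Wrapping becomes a per-component floor in that basis.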
class torchani.utils.ChemicalSymbolsToInts(all_species: Sequence[str])
Helper that can be called to convert chemical symbol strings to integers.
On initialization the class should be supplied with a list (or, in general, a collections.abc.Sequence) of str. The returned instance is a callable object. Calling it with an arbitrary list of the supported species converts that list into a tensor of dtype torch.long. Usage example:
from torchani.utils import ChemicalSymbolsToInts
# We initialize ChemicalSymbolsToInts with the supported species
species_to_tensor = ChemicalSymbolsToInts(['H', 'C', 'Fe', 'Cl'])
# We have a species list which we want to convert to an index tensor
index_tensor = species_to_tensor(['H', 'C', 'H', 'H', 'C', 'Cl', 'Fe'])
# index_tensor is now [0 1 0 0 1 3 2]
Warning: If the input is a string, Python will iterate over its characters. This means that a string such as 'CHClFe' will be interpreted as 'C' 'H' 'C' 'l' 'F' 'e'. Input a list or a numpy.ndarray such as ['C', 'H', 'Cl', 'Fe'], not a string. The output of a call does NOT correspond to a tensor of atomic numbers.
Parameters: all_species (collections.abc.Sequence of str) – Sequence of all supported species, in order. It is recommended to order them by atomic number.
torchani.utils.hessian(coordinates: Tensor, energies: Tensor | None = None, forces: Tensor | None = None) -> Tensor
Compute the analytical hessian from the energy graph or the force graph.
Parameters:
- coordinates (torch.Tensor) – Tensor of shape (molecules, atoms, 3).
- energies (torch.Tensor) – Tensor of shape (molecules,). If specified, forces must be None. These energies must be computed from coordinates in a graph.
- forces (torch.Tensor) – Tensor of shape (molecules, atoms, 3). If specified, energies must be None. These forces must be computed from coordinates in a graph.
Returns: Tensor of shape (molecules, 3A, 3A), where A is the number of atoms in each molecule.
Return type: torch.Tensor
torchani.utils.vibrational_analysis(masses, hessian, mode_type='MDU', unit='cm^-1')
Compute the vibrational wavenumbers from the hessian.
Note that many popular software packages, such as Gaussian and ORCA, output normal modes as mass deweighted normalized (MDN). ASE outputs normal modes as mass deweighted unnormalized (MDU). Some packages, such as Psi4, let you choose different normalizations. Force constants and reduced masses are calculated as in Gaussian.
mode_type should be one of:
- MWN (mass weighted normalized)
- MDU (mass deweighted unnormalized)
- MDN (mass deweighted normalized)
MDU modes are neither orthogonal nor normalized. MDN modes are normalized but not orthogonal. MWN modes are orthonormal, but they correspond to mass-weighted Cartesian coordinates (x' = sqrt(m)x).
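The link between the hessian, the mass-weighted (MWN) picture and wavenumbers can be checked on the simplest case: a one-dimensional diatomic with a harmonic bond of force constant k. Mass-weighting the 2×2 hessian gives one zero eigenvalue (translation) and one eigenvalue k/μ, where μ is the reduced mass. The wavenumber then follows from the sqrt_mhessian2invcm factor and the extra 1/(2π) described in the Units section. This is a hand-worked sketch with illustrative k and masses, not a call into torchani.

```python
import math

SQRT_MHESSIAN_TO_INVCM = 17091.7006789297  # factor listed for sqrt_mhessian2invcm


def mass_weighted_hessian(k: float, m1: float, m2: float):
    """M^-1/2 H M^-1/2 for a 1D diatomic with H = [[k, -k], [-k, k]]."""
    off = -k / math.sqrt(m1 * m2)
    return [[k / m1, off], [off, k / m2]]


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return [mean - radius, mean + radius]


k = 0.5                # Hartree / Angstrom^2 (illustrative value)
m1, m2 = 1.008, 35.45  # AMU, roughly H and Cl
low, high = eigenvalues_2x2(mass_weighted_hessian(k, m1, m2))
mu = m1 * m2 / (m1 + m2)
# sqrt of the eigenvalue, converted to cm^-1, then divided by 2*pi
wavenumber = math.sqrt(high) * SQRT_MHESSIAN_TO_INVCM / (2.0 * math.pi)
print(low, high, k / mu, wavenumber)
```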
torchani.utils.get_atomic_masses(species)
Convert a tensor of atomic numbers ("periodic table indices") into a tensor of atomic masses.
Atomic masses are supported for the first 119 elements and are taken from:
Atomic weights of the elements 2013 (IUPAC Technical Report). Meija, J., Coplen, T., Berglund, M., et al. (2016). Pure and Applied Chemistry, 88(3), pp. 265-291. Retrieved 30 Nov. 2016, from doi:10.1515/pac-2015-0305
They are all consistent with those used in ASE.
Parameters: species (torch.Tensor) – Tensor with atomic numbers.
Returns: Tensor of dtype torch.double with atomic masses, with the same shape as the input.
Return type: torch.Tensor
NeuroChem
Tools for loading and running NeuroChem input files.
class torchani.neurochem.Constants(filename)
NeuroChem constants. Objects of this class can be used as arguments to torchani.AEVComputer, like torchani.AEVComputer(**consts).
species_to_tensor
Callable that converts string chemical symbols to a 1D long tensor.
Type: ChemicalSymbolsToInts
torchani.neurochem.load_sae(filename, return_dict=False)
Returns an EnergyShifter object with the self energies from a NeuroChem sae file.
torchani.neurochem.load_atomic_network(filename)
Returns an instance of torch.nn.Sequential with hyperparameters and parameters loaded from NeuroChem's .nnf, .wparam and .bparam files.
torchani.neurochem.load_model(species, dir_)
Returns an instance of torchani.ANIModel loaded from NeuroChem's network directory.
Parameters:
- species (collections.abc.Sequence) – Sequence of strings for the chemical symbols of each supported atom type, in the correct order.
- dir_ (str) – Path of the directory storing the network configurations.
torchani.neurochem.load_model_ensemble(species, prefix, count)
Returns an instance of torchani.Ensemble loaded from NeuroChem's network directories beginning with the given prefix.
Parameters:
- species (collections.abc.Sequence) – Sequence of strings for the chemical symbols of each supported atom type, in the correct order.
- prefix (str) – Prefix of the paths of the directories where the network configurations are stored.
- count (int) – Number of models in the ensemble.
class torchani.neurochem.Trainer(filename, device=device(type='cuda'), tqdm=False, tensorboard=None, checkpoint_name='model.pt')
Train with NeuroChem training configurations.
Parameters:
- filename (str) – Input file name.
- device (torch.device) – Device on which to train the model.
- tqdm (bool) – Whether to enable tqdm.
- tensorboard (str) – Directory in which to store the tensorboard log file. Set to None to disable tensorboard.
- checkpoint_name (str) – Name of the checkpoint file. Checkpoints are stored in the network directory under this file name.
Besides running the NeuroChem trainer programmatically, you can also run it with python -m torchani.neurochem.trainer. Use the -h option for help.
ASE Interface
Tools for interfacing with ASE.
class torchani.ase.Calculator(species, model, overwrite=False)
TorchANI calculator for ASE.
Parameters:
- species (collections.abc.Sequence of str) – Sequence of all supported species, in order.
- model (torch.nn.Module) – Neural network potential model that converts coordinates into energies.
- overwrite (bool) – Whether to replace the original positions stored in the ase.Atoms object with the wrapped positions after the atoms are wrapped into the central box.
Units
Unit conversion factors used in torchani.
The torchani networks themselves work internally entirely in Hartree (energy), Angstrom (distance) and AMU (mass). Some example code and scripts convert to other, more commonly used units. Our conversion factors are consistent with the CODATA 2014 recommendations, which are also consistent with the units used in ASE. (Note, however, that ASE uses the electronvolt as its base energy unit, so the appropriate conversion factors must always be applied when converting from ASE to torchani.) The joule-to-kcal conversion is taken from the IUPAC Gold Book. All the conversion factors we use are defined in this module, along with convenience functions for converting between units.
torchani.units.hartree2ev(x)
Hartree to eV conversion factor from CODATA 2014.
1 Hartree = 27.211386024367243 eV
torchani.units.hartree2kcalmol(x)
Hartree to kcal/mol conversion factor from CODATA 2014.
1 Hartree = 627.5094738898777 kcal/mol
torchani.units.hartree2kjoulemol(x)
Hartree to kJ/mol conversion factor from CODATA 2014.
1 Hartree = 2625.4996387552483 kJ/mol
torchani.units.ev2kcalmol(x)
Electronvolt to kcal/mol conversion factor from CODATA 2014.
1 eV = 23.060548012069496 kcal/mol
torchani.units.ev2kjoulemol(x)
Electronvolt to kJ/mol conversion factor from CODATA 2014.
1 eV = 96.48533288249877 kJ/mol
torchani.units.mhessian2fconst(x)
Converts mass-scaled hessian units into mDyne/Angstrom.
Converts from units of the mass-scaled hessian (Hartree / (AMU * Angstrom^2)) into force constant units (mDyne/Angstrom), where 1 N = 1 * 10^8 mDyne.
1 Hartree / (AMU * Angstrom^2) ≈ 4.3597 mDyne / (AMU * Angstrom)
torchani.units.sqrt_mhessian2invcm(x)
Converts sqrt(mass-scaled hessian units) into cm^-1.
Converts from units of sqrt(Hartree / (AMU * Angstrom^2)), the square root of the units of the mass-scaled hessian matrix, into units of inverse centimeters.
Note that to convert the square roots of the eigenvalues of the mass-scaled hessian into wavenumbers, you must also multiply by a factor of 1 / (2 * pi).
1 sqrt(Hartree / (AMU * Angstrom^2)) = 17091.7006789297 cm^-1
torchani.units.sqrt_mhessian2milliev(x)
Converts sqrt(mass-scaled hessian units) into meV.
Converts from units of sqrt(Hartree / (AMU * Angstrom^2)), the square root of the units of the mass-scaled hessian matrix, into units of millielectronvolts.
Note that to convert the square roots of the eigenvalues of the mass-scaled hessian into vibrational energies in meV, you must also multiply by a factor of 1 / (2 * pi).
1 sqrt(Hartree / (AMU * Angstrom^2)) = 2119.1007908167267 meV
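The factors in this section can be reproduced from CODATA 2014 constants. The plain-Python sketch below starts from the Hartree-to-eV factor given above. It assumes the CODATA 2014 values for the elementary charge, the Avogadro constant, the atomic mass unit and the Planck constant, the exact speed of light, and the exact 4184 J per thermochemical kcal. These constant values are assumptions of the sketch, not read from torchani.

```python
import math

# Assumed CODATA 2014 values (plus exact definitions).
HARTREE_TO_EV = 27.211386024367243  # from hartree2ev above
EV_TO_JOULE = 1.6021766208e-19      # elementary charge, C
AVOGADRO = 6.022140857e23           # 1/mol
AMU_TO_KG = 1.660539040e-27         # kg
PLANCK = 6.626070040e-34            # J s
SPEED_OF_LIGHT = 299792458.0        # m/s (exact)
JOULE_TO_KCAL = 1.0 / 4184.0        # thermochemical calorie (exact)
ANGSTROM_TO_METER = 1e-10
NEWTON_TO_MILLIDYNE = 1e8

HARTREE_TO_JOULE = HARTREE_TO_EV * EV_TO_JOULE
factors = {
    'hartree2kcalmol': HARTREE_TO_JOULE * AVOGADRO * JOULE_TO_KCAL,
    'hartree2kjoulemol': HARTREE_TO_JOULE * AVOGADRO / 1000.0,
    'ev2kcalmol': EV_TO_JOULE * AVOGADRO * JOULE_TO_KCAL,
    'ev2kjoulemol': EV_TO_JOULE * AVOGADRO / 1000.0,
    # Hartree/Angstrom^2 -> mDyne/Angstrom (per AMU for mass-scaled values)
    'mhessian2fconst': HARTREE_TO_JOULE * NEWTON_TO_MILLIDYNE / ANGSTROM_TO_METER,
}
# sqrt(Hartree / (AMU * Angstrom^2)) is an angular frequency in 1/s.
# Dividing by c, and by 100 for m -> cm, gives cm^-1, still without the 1/(2*pi).
sqrt_to_invcm = math.sqrt(HARTREE_TO_JOULE / AMU_TO_KG) / ANGSTROM_TO_METER / SPEED_OF_LIGHT / 100.0
invcm_to_ev = PLANCK * SPEED_OF_LIGHT * 100.0 / EV_TO_JOULE
factors['sqrt_mhessian2invcm'] = sqrt_to_invcm
factors['sqrt_mhessian2milliev'] = sqrt_to_invcm * invcm_to_ev * 1000.0

for name, value in factors.items():
    print(f'{name}: {value:.10g}')
```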