TorchANI¶

class torchani.AEVComputer(Rcr, Rca, EtaR, ShfR, EtaA, Zeta, ShfA, ShfZ, num_species, use_cuda_extension=False)[source]¶

The AEV computer that takes coordinates as input and outputs aevs.

Parameters:

Rcr (float) – \(R_C\) in equation (2) when used at equation (3) in the ANI paper.
Rca (float) – \(R_C\) in equation (2) when used at equation (4) in the ANI paper.
EtaR (torch.Tensor) – The 1D tensor of \(\eta\) in equation (3) in the ANI paper.
ShfR (torch.Tensor) – The 1D tensor of \(R_s\) in equation (3) in the ANI paper.
EtaA (torch.Tensor) – The 1D tensor of \(\eta\) in equation (4) in the ANI paper.
Zeta (torch.Tensor) – The 1D tensor of \(\zeta\) in equation (4) in the ANI paper.
ShfA (torch.Tensor) – The 1D tensor of \(R_s\) in equation (4) in the ANI paper.
ShfZ (torch.Tensor) – The 1D tensor of \(\theta_s\) in equation (4) in the ANI paper.
num_species (int) – Number of supported atom types.
use_cuda_extension (bool) – Whether to use cuda extension for faster calculation (needs cuaev installed).

classmethod cover_linearly(radial_cutoff: float, angular_cutoff: float, radial_eta: float, angular_eta: float, radial_dist_divisions: int, angular_dist_divisions: int, zeta: float, angle_sections: int, num_species: int, angular_start: float = 0.9, radial_start: float = 0.9)[source]¶

Provides a convenient way to linearly fill cutoffs

This is a user-friendly constructor that builds a torchani.AEVComputer in which the subdivisions along the distance dimension for the angular and radial sub-AEVs, and the angle sections for the angular sub-AEV, are linearly covered with shifts. By default the distance shifts start at 0.9 Angstroms.

To reproduce the ANI-1x AEVs, the signature (5.2, 3.5, 16.0, 8.0, 16, 4, 32.0, 8, 4) can be used.
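As a rough illustration of what "linearly covered with shifts" can mean, the plain-Python sketch below spaces the shifts evenly between the start value and the cutoff, leaving the cutoff itself out. It does not call torchani. The helper names and the half-section offset used for the angle shifts are assumptions chosen to match the published ANI-1x parameters, not a statement of how cover_linearly is implemented.

```python
import math


def linear_shifts(start, cutoff, divisions):
    """Evenly spaced shifts in [start, cutoff); the cutoff is excluded."""
    step = (cutoff - start) / divisions
    return [start + i * step for i in range(divisions)]


def angle_shifts(sections):
    """Evenly spaced angle shifts in (0, pi), offset by half a section (assumed)."""
    step = math.pi / sections
    return [step / 2 + i * step for i in range(sections)]


# Values taken from the ANI-1x signature quoted above.
radial_shifts = linear_shifts(0.9, 5.2, 16)   # plays the role of ShfR
angular_shifts = linear_shifts(0.9, 3.5, 4)   # plays the role of ShfA
theta_shifts = angle_shifts(8)                # plays the role of ShfZ

print(len(radial_shifts), len(angular_shifts), len(theta_shifts))  # 16 4 8
```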

forward(input_: Tuple[Tensor, Tensor], cell: Tensor | None = None, pbc: Tensor | None = None) → SpeciesAEV[source]¶

Compute AEVs

Parameters:

input (tuple) –

Can be one of the following two cases:

If you don’t care about periodic boundary conditions at all, then input can be a tuple of two tensors: species, coordinates. species must have shape (N, A), coordinates must have shape (N, A, 3) where N is the number of molecules in a batch, and A is the number of atoms.

Warning

The species must be indexed in 0, 1, 2, 3, …, not the element index in periodic table. Check torchani.SpeciesConverter if you want periodic table indexing.

Note

The coordinates, and cell are in Angstrom.

If you want to apply periodic boundary conditions, then the input would be a tuple of two tensors (species, coordinates) and two keyword arguments cell=… , and pbc=… where species and coordinates are the same as described above, cell is a tensor of shape (3, 3) of the three vectors defining unit cell:

tensor([[x1, y1, z1],
        [x2, y2, z2],
        [x3, y3, z3]])

and pbc is a boolean vector of size 3 indicating whether PBC is enabled in each direction.

Returns: Species and AEVs. species are the species from the input unchanged, and AEVs is a tensor of shape (N, A, self.aev_length())

Return type: NamedTuple
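The last dimension of the AEV tensor depends on how many parameters and species are used. The sketch below assumes the usual ANI layout: one radial entry per (species, EtaR, ShfR) combination and one angular entry per (unordered species pair, EtaA, Zeta, ShfA, ShfZ) combination. The function is a hypothetical helper for estimating the size, not the library's aev_length() method.

```python
def aev_length(num_species, n_eta_r, n_shf_r, n_eta_a, n_zeta, n_shf_a, n_shf_z):
    """Length of one atom's AEV under the assumed ANI layout."""
    radial = num_species * n_eta_r * n_shf_r
    # Unordered pairs of neighbor species, including same-species pairs.
    pairs = num_species * (num_species + 1) // 2
    angular = pairs * n_eta_a * n_zeta * n_shf_a * n_shf_z
    return radial + angular


# ANI-1x style parameters: 4 species, 16 radial shifts, 4 angular distance
# shifts, 8 angle sections, and a single value for each eta and zeta.
ani1x_length = aev_length(4, 1, 16, 1, 1, 4, 8)
print(ani1x_length)  # 384
```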

class torchani.ANIModel(modules)[source]¶

ANI model that computes energies from species and AEVs.

Different atom types may have different modules. When computing energies, the module for each atom's type is applied to that atom's AEV; the module outputs are then reduced over the atoms of each molecule to obtain molecular energies.

Warning

The species must be indexed in 0, 1, 2, 3, …, not the element index in periodic table. Check torchani.SpeciesConverter if you want periodic table indexing.

Note

The resulting energies are in Hartree.

Parameters:	modules (`collections.abc.Sequence`) – Modules for each atom type. Atom types are distinguished by their order in `modules`, which means, for example, that `modules[i]` must be the module for atom type `i`. Different atom types can share a module by putting the same reference in `modules`.
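The dispatch-by-atom-type idea can be sketched without any neural networks. In the plain-Python example below the "modules" are hypothetical stand-in functions that map an AEV to a scalar, the reduction over atoms is assumed to be a sum, and padding atoms are assumed to carry the index -1.

```python
def molecular_energy(species, aevs, modules):
    """Sum per-atom outputs, picking the module by atom type index."""
    total = 0.0
    for atom_type, aev in zip(species, aevs):
        if atom_type < 0:  # skip padding atoms (assumed to be labeled -1)
            continue
        total += modules[atom_type](aev)
    return total


# Hypothetical stand-ins for the per-type networks.
modules = [
    lambda aev: 0.5 * sum(aev),  # module for atom type 0
    lambda aev: 2.0 * sum(aev),  # module for atom type 1
]

species = [1, 0, 0, -1]  # one molecule: a type-1 atom, two type-0 atoms, padding
aevs = [[1.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]

energy = molecular_energy(species, aevs, modules)
print(energy)  # 5.0
```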

class torchani.Ensemble(modules)[source]¶: Compute the average output of an ensemble of modules.

class torchani.SpeciesConverter(species)[source]¶

Converts tensors with species labeled as atomic numbers into tensors labeled with internal torchani indices according to a custom ordering scheme. It takes a custom species ordering as initialization parameter. If the class is initialized with [‘H’, ‘C’, ‘N’, ‘O’] for example, it will convert a tensor [1, 1, 6, 7, 1, 8] into a tensor [0, 0, 1, 2, 0, 3]

Parameters:	species (sequence) – Sequence of all supported species, in order (it is recommended to order according to atomic number).

forward(input_: Tuple[Tensor, Tensor], cell: Tensor | None = None, pbc: Tensor | None = None)[source]¶: Convert species from periodic table element index to 0, 1, 2, 3, … indexing
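The conversion described above amounts to a lookup from atomic number to position in the species list. The following plain-Python sketch reproduces the example from the class description; the ATOMIC_NUMBERS table and make_converter are illustrative names, not part of torchani.

```python
# Small hand-written table; only the elements used in the example.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def make_converter(symbols):
    """Build a function mapping atomic numbers to indices in `symbols`."""
    lookup = {ATOMIC_NUMBERS[s]: i for i, s in enumerate(symbols)}

    def convert(atomic_numbers):
        return [lookup[z] for z in atomic_numbers]

    return convert


convert = make_converter(["H", "C", "N", "O"])
indices = convert([1, 1, 6, 7, 1, 8])
print(indices)  # [0, 0, 1, 2, 0, 3]
```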

class torchani.EnergyShifter(self_energies, fit_intercept=False)[source]¶

Helper class for adding and subtracting self atomic energies

This is a subclass of torch.nn.Module, so it can be used directly in a pipeline as [input->AEVComputer->ANIModel->EnergyShifter->output].

Parameters:

self_energies (`collections.abc.Sequence`) – Sequence of floating numbers for the self energy of each atom type. The numbers should be in order, i.e. `self_energies[i]` should be atom type `i`.
fit_intercept (bool) – Whether to calculate the intercept during the LSTSQ fit. The intercept will also be taken into account to shift energies.

forward(species_energies: Tuple[Tensor, Tensor], cell: Tensor | None = None, pbc: Tensor | None = None) → SpeciesEnergies[source]¶: (species, molecular energies)->(species, molecular energies + sae)

sae(species)[source]¶

Compute self energies for molecules.

Padding atoms will be automatically excluded.

Parameters:	species (`torch.Tensor`) – Long tensor in shape `(conformations, atoms)`.
Returns:	1D vector in shape `(conformations,)` for molecular self energies.
Return type:	`torch.Tensor`
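A minimal sketch of the self-energy sum, in plain Python rather than tensors. The self-energy values are made up for illustration, and padding atoms are assumed to be marked with -1, as in the default padding used elsewhere in this documentation.

```python
def self_energy(species_row, self_energies):
    """Sum the self energies of one conformation, ignoring padding atoms."""
    return sum(self_energies[s] for s in species_row if s >= 0)


# Hypothetical self energies in Hartree for atom types 0 and 1.
self_energies = [-0.5, -38.0]

# Two conformations padded to five atoms each.
batch = [
    [1, 0, 0, 0, 0],
    [0, 0, -1, -1, -1],
]

sae = [self_energy(row, self_energies) for row in batch]
print(sae)  # [-40.0, -1.0]
```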

class torchani.nn.Gaussian(*args, **kwargs)[source]¶: Gaussian activation

Model Zoo¶

The ANI model zoo that stores public ANI models.

Currently the model zoo has three models: ANI-1x, ANI-1ccx, and ANI-2x. The parameters of these models are stored in the ani-model-zoo repository and are downloaded automatically the first time any of these models is instantiated. The corresponding classes are ANI1x, ANI1ccx, and ANI2x, all of which are subclasses of torch.nn.Module. To use a model, instantiate it and either calculate energies directly or get an ASE calculator. For example:

ani1x = torchani.models.ANI1x()
# compute energy using ANI-1x model ensemble
_, energies = ani1x((species, coordinates))
ani1x.ase()  # get ASE Calculator using this ensemble
# convert atom species from string to long tensor
ani1x.species_to_tensor(['C', 'H', 'H', 'H', 'H'])

model0 = ani1x[0]  # get the first model in the ensemble
# compute energy using the first model in the ANI-1x model ensemble
_, energies = model0((species, coordinates))
model0.ase()  # get ASE Calculator using this model
# convert atom species from string to long tensor
model0.species_to_tensor(['C', 'H', 'H', 'H', 'H'])

class torchani.models.ANI1x(periodic_table_index=False, model_index=None)[source]¶

The ANI-1x model as in ani-1x_8x on GitHub and Active Learning Paper.

The ANI-1x model is an ensemble of 8 networks that was trained using active learning on the ANI-1x dataset. The target level of theory is wB97X/6-31G(d). It predicts energies for the elements H, C, N and O exclusively and shouldn’t be used with other atom types.

class torchani.models.ANI1ccx(periodic_table_index=False, model_index=None)[source]¶

The ANI-1ccx model as in ani-1ccx_8x on GitHub and Transfer Learning Paper.

The ANI-1ccx model is an ensemble of 8 networks that was trained on the ANI-1ccx dataset using transfer learning. The target accuracy is CCSD(T)*/CBS (CCSD(T) using the DLPNO-CCSD(T) method). It predicts energies for the elements H, C, N and O exclusively and shouldn’t be used with other atom types.

class torchani.models.ANI2x(periodic_table_index=False, model_index=None)[source]¶

The ANI-2x model as in ANI2x Paper and ANI2x Results on GitHub.

The ANI-2x model is an ensemble of 8 networks that was trained on the ANI-2x dataset. The target level of theory is wB97X/6-31G(d). It predicts energies for the elements H, C, N, O, F, S and Cl exclusively and shouldn’t be used with other atom types.

Datasets¶

Tools for loading, shuffling, and batching ANI datasets

torchani.data.load(path) creates an iterable of raw data, where species are strings and coordinates are numpy ndarrays.

You can transform this iterable by using transformations. To do a transformation, call it.transformation_name(). This will return an iterable that may be cached depending on the specific transformation.

Available transformations are listed below:

species_to_indices accepts two different kinds of arguments. If its argument is an iterable of species, it converts species from elements (e.g. “H”, “C”, “Cl”) into internal torchani indices (as returned by torchani.utils.ChemicalSymbolsToInts or the species_to_tensor method of a torchani.models.BuiltinModel and torchani.neurochem.Constants). This is the default behavior, with an argument of ('H', 'C', 'N', 'O', 'F', 'S', 'Cl'). If its argument is the string “periodic_table”, elements are instead converted into atomic numbers (“periodic table indices”). This last option is meant to be used when training networks that already perform a forward pass of torchani.nn.SpeciesConverter on their inputs in order to convert elements to internal indices before processing the coordinates.

subtract_self_energies subtracts self energies from all molecules of the dataset. It accepts two different kinds of arguments: a dict of self energies, in which case self energies are directly subtracted according to the key-value pairs, or a torchani.utils.EnergyShifter, in which case the self energies are calculated by linear regression and stored inside the class in the order specified by species_order. By default the function orders by atomic number if no extra argument is provided, but a specific order may be requested.

remove_outliers removes some outlier energies from the dataset if present.

shuffle shuffles the provided dataset. Note that if the dataset is not cached (i.e. it lives on disk and not in memory) then this method will cache it before shuffling. This may take time and memory depending on the dataset size. This method may be used before splitting into validation/training sets to shuffle all molecules in the dataset and ensure uniform sampling from the initial dataset. It can also be used during training on a cached dataset of batches to shuffle the batches.

cache caches the result of previous transformations. If the input is already cached this does nothing.

collate creates batches and pads the atoms of all molecules in each batch with dummy atoms, then converts each batch to tensors. collate uses a default padding dictionary, {'species': -1, 'coordinates': 0.0, 'forces': 0.0, 'energies': 0.0}, but a custom padding dictionary can be passed as an optional parameter, which overrides this default padding. Note that this function returns a generator; it doesn’t cache the result in memory.

pin_memory copies the tensor to pinned (page-locked) memory so that later transfer to cuda devices can be done faster.

You can also use split to split the iterable into pieces. Use split as:

it.split(ratio1, ratio2, None)

where the trailing None indicates that the last piece should contain all of the remaining data.
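The semantics of the ratio arguments can be sketched in plain Python. This is only an illustration written for this page: it assumes each number is a fraction of the total length and that None means "everything left over", and it operates on a simple list rather than a torchani dataset.

```python
def split(items, *ratios):
    """Split `items` into consecutive pieces sized by fractional ratios."""
    items = list(items)
    pieces, start = [], 0
    for ratio in ratios:
        if ratio is None:
            end = len(items)  # take everything that remains
        else:
            end = start + int(ratio * len(items))
        pieces.append(items[start:end])
        start = end
    return pieces


training, validation = split(range(10), 0.8, None)
print(training)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(validation)  # [8, 9]
```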

Note that orderings used in torchani.utils.ChemicalSymbolsToInts and torchani.nn.SpeciesConverter should be consistent with orderings used in species_to_indices and subtract_self_energies. To prevent confusion it is recommended that the arguments used to initialize converters and the arguments to these functions all order elements by their atomic number (e.g. if you are working with hydrogen, nitrogen and bromine always use [‘H’, ‘N’, ‘Br’] and never [‘N’, ‘H’, ‘Br’] or other variations). It is possible to specify a different custom ordering, mainly for backwards compatibility and for fully custom atom types, but doing so is NOT recommended, since it is very error prone.

Example:

energy_shifter = torchani.utils.EnergyShifter(None)
training, validation = torchani.data.load(dspath).subtract_self_energies(energy_shifter).species_to_indices().shuffle().split(0.8, None)
training = training.collate(batch_size).cache()
validation = validation.collate(batch_size).cache()

If the above approach takes too much memory, you can instead use a torch.utils.data.DataLoader with multiprocessing to achieve comparable performance with less memory usage:

training, validation = torchani.data.load(dspath).subtract_self_energies(energy_shifter).species_to_indices().shuffle().split(0.8, None)
training = torch.utils.data.DataLoader(list(training), batch_size=batch_size, collate_fn=torchani.data.collate_fn, num_workers=64)
validation = torch.utils.data.DataLoader(list(validation), batch_size=batch_size, collate_fn=torchani.data.collate_fn, num_workers=64)

Utilities¶

torchani.utils.pad_atomic_properties(properties, padding_values={'species': -1})[source]¶

Put a sequence of atomic properties together into a single tensor.

Inputs are [{‘species’: …, …}, {‘species’: …, …}, …] and the outputs are {‘species’: padded_tensor, …}

Parameters:

properties (`collections.abc.Sequence`) – sequence of properties.
padding_values (dict) – the value to fill to pad tensors to same size
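Padding can be pictured with nested lists instead of tensors. The sketch below pads species with -1 and coordinates with 0.0, matching the default padding values mentioned in this documentation; the helper names are hypothetical and the real function works on tensors, so treat this only as an illustration of the resulting layout.

```python
def pad_species(species_list, pad_value=-1):
    """Pad each species row to the length of the longest row."""
    width = max(len(s) for s in species_list)
    return [s + [pad_value] * (width - len(s)) for s in species_list]


def pad_coordinates(coords_list, pad_value=0.0):
    """Pad each molecule with dummy atoms so all have the same atom count."""
    width = max(len(c) for c in coords_list)
    return [
        c + [[pad_value] * 3 for _ in range(width - len(c))]
        for c in coords_list
    ]


species = [[1, 0, 0], [0]]
coordinates = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.5]],
]

padded_species = pad_species(species)
padded_coordinates = pad_coordinates(coordinates)
print(padded_species)  # [[1, 0, 0], [0, -1, -1]]
```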

torchani.utils.present_species(species)[source]¶

Given a vector of species of atoms, compute the unique species present.

Parameters:	species (`torch.Tensor`) – 1D vector of shape `(atoms,)`
Returns:	1D vector storing present atom types sorted.
Return type:	`torch.Tensor`

torchani.utils.strip_redundant_padding(atomic_properties)[source]¶

Strip trailing padding atoms.

Parameters:	atomic_properties (dict) – properties to strip
Returns:	same set of properties with redundant padding atoms stripped.
Return type:	dict

torchani.utils.map2central(cell, coordinates, pbc)[source]¶

Map atoms outside the unit cell into the cell using PBC.

Parameters:

cell (`torch.Tensor`) – tensor of shape (3, 3) of the three vectors defining unit cell: tensor([[x1, y1, z1], [x2, y2, z2], [x3, y3, z3]])
coordinates (`torch.Tensor`) – Tensor of shape `(molecules, atoms, 3)`.
pbc (`torch.Tensor`) – boolean vector of size 3 storing if pbc is enabled for that direction.
Returns:	coordinates of atoms mapped back to unit cell.
Return type:	`torch.Tensor`
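For intuition, wrapping is easiest to see in the special case of an orthorhombic cell whose vectors lie along the x, y and z axes. The plain-Python sketch below handles only that simplified case (a general (3, 3) cell needs fractional coordinates), and wrap_orthorhombic is a hypothetical helper, not the torchani function.

```python
def wrap_orthorhombic(coordinates, lengths, pbc):
    """Wrap atoms into [0, length) along each periodic axis."""
    wrapped = []
    for atom in coordinates:
        wrapped.append([
            x % length if periodic else x
            for x, length, periodic in zip(atom, lengths, pbc)
        ])
    return wrapped


coords = [[-1.0, 12.5, 3.0], [4.0, 5.0, -2.0]]
wrapped = wrap_orthorhombic(
    coords,
    lengths=[10.0, 10.0, 10.0],  # cell edge lengths in Angstrom
    pbc=[True, True, False],     # z is not periodic, so it is left alone
)
print(wrapped)  # [[9.0, 2.5, 3.0], [4.0, 5.0, -2.0]]
```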

class torchani.utils.ChemicalSymbolsToInts(all_species: Sequence[str])[source]¶

Helper that can be called to convert chemical symbol strings to integers.

On initialization the class should be supplied with a list (or in general collections.abc.Sequence) of str. The returned instance is a callable object, which can be called with an arbitrary list of the supported species that is converted into a tensor of dtype torch.long. Usage example:

from torchani.utils import ChemicalSymbolsToInts
# We initialize ChemicalSymbolsToInts with the supported species
species_to_tensor = ChemicalSymbolsToInts(['H', 'C', 'Fe', 'Cl'])
# We have a species list which we want to convert to an index tensor
index_tensor = species_to_tensor(['H', 'C', 'H', 'H', 'C', 'Cl', 'Fe'])
# index_tensor is now [0 1 0 0 1 3 2]

Warning

If the input is a string, Python will iterate over its characters. This means that a string such as ‘CHClFe’ will be interpreted as ‘C’ ‘H’ ‘C’ ‘l’ ‘F’ ‘e’. It is recommended that you input either a list or a numpy.ndarray [‘C’, ‘H’, ‘Cl’, ‘Fe’], and not a string. The output of a call does NOT correspond to a tensor of atomic numbers.
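The pitfall in this warning is ordinary Python behavior and can be reproduced without torchani. The sketch below uses a plain dictionary as a stand-in for the converter's lookup table.

```python
supported = ["H", "C", "Fe", "Cl"]
lookup = {symbol: index for index, symbol in enumerate(supported)}

# A list of symbols is iterated symbol by symbol, as intended.
good_input = ["C", "H", "Cl", "Fe"]
good_indices = [lookup[s] for s in good_input]
print(good_indices)  # [1, 0, 3, 2]

# A string is iterated character by character, splitting 'Cl' and 'Fe'.
bad_input = "CHClFe"
characters = list(bad_input)
print(characters)  # ['C', 'H', 'C', 'l', 'F', 'e']

# Some of those characters are not supported symbols at all.
unknown = [c for c in characters if c not in lookup]
print(unknown)  # ['l', 'F', 'e']
```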

Parameters:	all_species (`collections.abc.Sequence` of `str`) – sequence of all supported species, in order (it is recommended to order according to atomic number).

forward(species: List[str]) → Tensor[source]¶: Convert species from sequence of strings to 1D tensor

torchani.utils.hessian(coordinates: Tensor, energies: Tensor | None = None, forces: Tensor | None = None) → Tensor[source]¶

Compute analytical hessian from the energy graph or force graph.

Parameters:

coordinates (`torch.Tensor`) – Tensor of shape (molecules, atoms, 3).
energies (`torch.Tensor`) – Tensor of shape (molecules,). If specified, forces must be None. These energies must be computed from coordinates in a graph.
forces (`torch.Tensor`) – Tensor of shape (molecules, atoms, 3). If specified, energies must be None. These forces must be computed from coordinates in a graph.
Returns:	Tensor of shape (molecules, 3A, 3A) where A is the number of atoms in each molecule
Return type:	`torch.Tensor`

torchani.utils.vibrational_analysis(masses, hessian, mode_type='MDU', unit='cm^-1')[source]¶

Computing the vibrational wavenumbers from hessian.

Note that normal modes in many popular software packages such as Gaussian and ORCA are output as mass deweighted normalized (MDN). Normal modes in ASE are output as mass deweighted unnormalized (MDU). Some packages such as Psi4 let you choose different normalizations. Force constants and reduced masses are calculated as in Gaussian.

mode_type should be one of:

- MWN (mass weighted normalized)
- MDU (mass deweighted unnormalized)
- MDN (mass deweighted normalized)

MDU modes are neither orthogonal nor normalized. MDN modes are normalized but not orthogonal. MWN modes are orthonormal, but they correspond to mass weighted cartesian coordinates (x’ = sqrt(m)x).
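The relation between the three mode types can be illustrated with a toy system of two one-dimensional coordinates. The sketch assumes that deweighting divides each coordinate by sqrt(m) (the inverse of x’ = sqrt(m)x) and that MDN is MDU rescaled to unit length. The masses and mode vectors are made up for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


masses = [1.0, 4.0]  # hypothetical masses, one per coordinate

# Two orthonormal modes in mass weighted coordinates (MWN).
mwn = [normalize([1.0, 2.0]), normalize([2.0, -1.0])]

# Deweight each coordinate by 1 / sqrt(m) to get MDU, then rescale for MDN.
mdu = [[c / math.sqrt(m) for c, m in zip(mode, masses)] for mode in mwn]
mdn = [normalize(mode) for mode in mdu]

print(abs(dot(mwn[0], mwn[1])) < 1e-12)  # True: MWN modes are orthogonal
print(abs(dot(mdu[0], mdu[1])) < 1e-12)  # False: deweighting breaks orthogonality
```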

torchani.utils.get_atomic_masses(species)[source]¶

Convert a tensor of atomic numbers (“periodic table indices”) into a tensor of atomic masses

Atomic masses supported are the first 119 elements, and are taken from:

Atomic weights of the elements 2013 (IUPAC Technical Report). Meija, J., Coplen, T., Berglund, M., et al. (2016). Pure and Applied Chemistry, 88(3), pp. 265-291. Retrieved 30 Nov. 2016, from doi:10.1515/pac-2015-0305

They are all consistent with those used in ASE

Parameters:	species (`torch.Tensor`) – tensor with atomic numbers
Returns:	Tensor of dtype `torch.double`, with atomic masses, with the same shape as the input.
Return type:	`torch.Tensor`

NeuroChem¶

Tools for loading/running NeuroChem input files.

class torchani.neurochem.Constants(filename)[source]¶

NeuroChem constants. Objects of this class can be used as arguments to torchani.AEVComputer, like torchani.AEVComputer(**consts).

species_to_tensor¶

Call to convert string chemical symbols to a 1D long tensor.

Type:	`ChemicalSymbolsToInts`

torchani.neurochem.load_sae(filename, return_dict=False)[source]¶: Returns an object of EnergyShifter with self energies from NeuroChem sae file

torchani.neurochem.load_atomic_network(filename)[source]¶: Returns an instance of torch.nn.Sequential with hyperparameters and parameters loaded from NeuroChem’s .nnf, .wparam and .bparam files.

torchani.neurochem.load_model(species, dir_)[source]¶

Returns an instance of torchani.ANIModel loaded from NeuroChem’s network directory.

Parameters:

species (`collections.abc.Sequence`) – Sequence of strings for chemical symbols of each supported atom type in correct order.
dir_ (str) – String for directory storing network configurations.

torchani.neurochem.load_model_ensemble(species, prefix, count)[source]¶

Returns an instance of torchani.Ensemble loaded from NeuroChem’s network directories beginning with the given prefix.

Parameters:

species (`collections.abc.Sequence`) – Sequence of strings for chemical symbols of each supported atom type in correct order.
prefix (str) – Prefix of the paths of the directories in which network configurations are stored.
count (int) – Number of models in the ensemble.

class torchani.neurochem.Trainer(filename, device=device(type='cuda'), tqdm=False, tensorboard=None, checkpoint_name='model.pt')[source]¶

Train with NeuroChem training configurations.

Parameters:

filename (str) – Input file name
device (`torch.device`) – device to train the model
tqdm (bool) – whether to enable tqdm
tensorboard (str) – Directory to store tensorboard log file, set to `None` to disable tensorboard.
checkpoint_name (str) – Name of the checkpoint file, checkpoints will be stored in the network directory with this file name.

evaluate(dataset)[source]¶: Run the evaluation

load_data(training_path, validation_path)[source]¶: Load training and validation dataset from file.

run()[source]¶: Run the training

Besides driving the NeuroChem trainer from Python code, it can also be run from the command line with python -m torchani.neurochem.trainer; use the -h option for help.

ASE Interface¶

Tools for interfacing with ASE.

class torchani.ase.Calculator(species, model, overwrite=False)[source]¶

TorchANI calculator for ASE

Parameters:

species (`collections.abc.Sequence` of `str`) – sequence of all supported species, in order.
model (`torch.nn.Module`) – neural network potential model that converts coordinates into energies.
overwrite (bool) – After wrapping atoms into central box, whether to replace the original positions stored in `ase.Atoms` object with the wrapped positions.

Units¶

Unit conversion factors used in torchani

The torchani networks themselves work internally entirely in Hartree (energy), Angstrom (distance) and AMU (mass). In some example code and scripts we convert to other more commonly used units. Our conversion factors are consistent with the CODATA 2014 recommendations, which are also consistent with the units used in ASE. (However, take into account that ASE uses the electronvolt as its base energy unit, so the appropriate conversion factors should always be applied when converting from ASE to torchani.) The Joule-to-kcal conversion is taken from the IUPAC Gold Book. All the conversion factors we use are defined in this module, and convenience functions to convert between different units are provided.

torchani.units.hartree2ev(x)[source]¶

Hartree to eV conversion factor from 2014 CODATA

1 Hartree = 27.211386024367243 eV

torchani.units.hartree2kcalmol(x)[source]¶

Hartree to kcal/mol conversion factor from CODATA 2014

1 Hartree = 627.5094738898777 kcal/mol

torchani.units.hartree2kjoulemol(x)[source]¶

Hartree to kJ/mol conversion factor from CODATA 2014

1 Hartree = 2625.4996387552483 kJ/mol

torchani.units.ev2kcalmol(x)[source]¶

Electronvolt to kcal/mol conversion factor from CODATA 2014

1 eV = 23.060548012069496 kcal/mol

torchani.units.ev2kjoulemol(x)[source]¶

Electronvolt to kJ/mol conversion factor from CODATA 2014

1 eV = 96.48533288249877 kJ/mol
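The factors listed above can be cross-checked against each other, which is a cheap way to catch a mixed-up unit when moving energies between ASE (eV) and torchani (Hartree). The sketch copies the documented numbers into local constants instead of importing torchani. The ev_to_hartree helper and the sample energy are hypothetical.

```python
# Conversion factors copied from the entries above.
HARTREE_TO_EV = 27.211386024367243
HARTREE_TO_KCALMOL = 627.5094738898777
HARTREE_TO_KJOULEMOL = 2625.4996387552483
EV_TO_KCALMOL = 23.060548012069496
EV_TO_KJOULEMOL = 96.48533288249877


def ev_to_hartree(energy_ev):
    """Convert an energy in eV (e.g. from ASE) to Hartree."""
    return energy_ev / HARTREE_TO_EV


# Going Hartree -> eV -> kcal/mol should agree with Hartree -> kcal/mol.
via_ev = HARTREE_TO_EV * EV_TO_KCALMOL
print(abs(via_ev - HARTREE_TO_KCALMOL) < 1e-6)  # True

# Hypothetical energy in eV converted to Hartree and then to kcal/mol.
energy_ev = -10.0
energy_kcalmol = ev_to_hartree(energy_ev) * HARTREE_TO_KCALMOL
```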

torchani.units.mhessian2fconst(x)[source]¶

Converts mass-scaled hessian units into mDyne/Angstrom

Converts from units of the mass-scaled hessian (Hartree / (amu * Angstrom^2)) into force constant units (mDyne/Angstrom), where 1 N = 1 * 10^8 mDyne

1 Hartree / (AMU * Angstrom^2) = 4.35974465 mDyne/Angstrom

torchani.units.sqrt_mhessian2invcm(x)[source]¶

Converts sqrt(mass-scaled hessian units) into cm^-1

Converts from units of sqrt(Hartree / (amu * Angstrom^2)), which are sqrt(units of the mass-scaled hessian matrix), into units of inverse centimeters.

Take into account that to convert the actual eigenvalues of the hessian into wavenumbers it is necessary to multiply by an extra factor of 1 / (2 * pi)

1 sqrt(Hartree / (AMU * Angstrom^2)) = 17091.7006789297 cm^-1
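Putting the note about the 1 / (2 * pi) factor into practice, the sketch below turns a single mass-scaled hessian eigenvalue into a wavenumber. It copies the factor quoted above into a local constant and does not import torchani. The function name is hypothetical, and negative eigenvalues (imaginary frequencies) are deliberately not handled.

```python
import math

# Factor quoted above: 1 sqrt(Hartree / (AMU * Angstrom^2)) in cm^-1.
SQRT_MHESSIAN_TO_INVCM = 17091.7006789297


def eigenvalue_to_wavenumber(eigenvalue):
    """Mass-scaled hessian eigenvalue (Hartree / (amu * Angstrom^2)) to cm^-1."""
    if eigenvalue < 0:
        raise ValueError("negative eigenvalues are not handled in this sketch")
    angular_frequency = math.sqrt(eigenvalue)
    frequency = angular_frequency / (2 * math.pi)  # the extra 1 / (2 * pi)
    return frequency * SQRT_MHESSIAN_TO_INVCM


wavenumber = eigenvalue_to_wavenumber(1.0)
print(round(wavenumber))  # 2720
```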

torchani.units.sqrt_mhessian2milliev(x)[source]¶

Converts sqrt(mass-scaled hessian units) into meV

Converts from units of sqrt(Hartree / (amu * Angstrom^2)), which are sqrt(units of the mass-scaled hessian matrix), into units of milli-electronvolts.

Take into account that to convert the actual eigenvalues of the hessian into milli-electronvolts it is necessary to multiply by an extra factor of 1 / (2 * pi)

1 sqrt(Hartree / (AMU * Angstrom^2)) = 2119.1007908167267 meV