Advanced usage of ANIDataset#
Example showing more involved conformer and property manipulation.
To begin with, let’s import the modules we will use:
import shutil
from pathlib import Path
import torch
import numpy as np
from torchani.datasets import ANIDataset, concatenate
from torchani.datasets.filters import filter_by_high_force
Again, for the purposes of this example, we will copy and modify two files inside torchani/dataset; these can be downloaded by running the download.sh script.
file1_path = Path.cwd() / "file1.h5"
file2_path = Path.cwd() / "file2.h5"
shutil.copy(Path.cwd() / "../dataset/ani1-up_to_gdb4/ani_gdb_s01.h5", file1_path)
shutil.copy(Path.cwd() / "../dataset/ani1-up_to_gdb4/ani_gdb_s02.h5", file2_path)
ds = ANIDataset(locations=(file1_path, file2_path), names=("file1", "file2"))
Verifying format correctness: 22it [00:00, 2864.43it/s]
/home/ipickering/Repos/ani/torchani/datasets/anidataset.py:351: UserWarning: {'energiesHE', 'smiles', 'coordinatesHE'} found in legacy dataset, this will generate unpredictable issues.
Probably .items() and .values() will work but not much else. It is highly recommended that you backup these properties (if needed) and *delete them* using dataset.delete_properties
  warnings.warn(
Verifying format correctness: 92it [00:00, 2737.93it/s]
Property deletion / renaming#
All of the molecules in the dataset share the same set of properties (energies, coordinates, etc.). You can query which these are:
ds.properties
{'energiesHE', 'energies', 'species', 'coordinates', 'smiles', 'coordinatesHE'}
It is possible to delete unwanted or unneeded properties.
ds.delete_properties(("coordinatesHE", "energiesHE", "smiles"))
ds.properties
Deleting properties:   0%|          | 0/3 [00:00<?, ?it/s]
Verifying format correctness: 13it [00:00, 4782.98it/s]
Deleting properties:   0%|          | 0/13 [00:00<?, ?it/s]
Verifying format correctness: 53it [00:00, 4736.70it/s]
{'energies', 'species', 'coordinates'}
It is also possible to rename properties by passing a dict that maps old names to new ones (the class assumes at least one of “species” or “numbers” is always present, so don’t rename those).
ds.rename_properties({"energies": "molecular_energies", "coordinates": "coord"})
ds.properties
Verifying format correctness: 13it [00:00, 4789.70it/s]
Verifying format correctness: 53it [00:00, 4699.75it/s]
{'species', 'coord', 'molecular_energies'}
Let’s rename them back to their original values:
ds.rename_properties({"molecular_energies": "energies", "coord": "coordinates"})
ds.properties
Verifying format correctness: 13it [00:00, 4866.22it/s]
Verifying format correctness: 53it [00:00, 4762.27it/s]
{'energies', 'species', 'coordinates'}
Grouping#
You can query whether your dataset is in a legacy format by checking the dataset’s grouping attribute:
ds.grouping
'legacy'
The legacy format is used by some old datasets. In the legacy format, groups can be arbitrarily nested in the hierarchical tree inside the h5 files, and the “species”/“numbers” property does not have a batch dimension. This means all properties with an “atomic” dimension must be ordered the same way within a group (don’t worry too much if you don’t understand what this means; it basically means the format is difficult to deal with).
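As a rough, hypothetical illustration (the exact layout varies from file to file), a legacy tree might look like this, with “species” lacking the leading batch dimension that the other properties carry:
/gdb11_s01/mol-0/coordinates, shape (10, 3, 3)
                /species, shape (3,)
                /energies, shape (10,)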
We can convert to a less error-prone and easier-to-parse format by calling “regroup_by_formula” or “regroup_by_num_atoms”.
ds = ds.regroup_by_formula()
ds.grouping
Regrouping by formulas:   0%|          | 0/3 [00:00<?, ?it/s]
Regrouping by formulas:  85%|████████▌ | 11/13 [00:00<00:00, 32.72it/s]
'by_formula'
Another possibility is to group by num atoms
ds = ds.regroup_by_num_atoms()
ds.grouping
Regrouping by number of atoms: 0%| | 0/3 [00:00<?, ?it/s]
Regrouping by number of atoms: 0%| | 0/13 [00:00<?, ?it/s]
'by_num_atoms'
In these formats the first dimension of every property is the same within each group, and groups can only have depth one. In other words, the tree structure for “by_formula” is:
/C10H22/coordinates, shape (10, 32, 3)
       /species, shape (10, 32)
       /energies, shape (10,)
/C8H22N2/coordinates, shape (10, 32, 3)
        /species, shape (10, 32)
        /energies, shape (10,)
/C12H22/coordinates, shape (5, 34, 3)
       /species, shape (5, 34)
       /energies, shape (5,)
and for “by_num_atoms”:
/032/coordinates, shape (20, 32, 3)
    /species, shape (20, 32)
    /energies, shape (20,)
/034/coordinates, shape (5, 34, 3)
    /species, shape (5, 34)
    /energies, shape (5,)
Conformer groups can be iterated over in chunks, up to a specified maximum chunk size. This breaks a conformer group into mini-batches containing multiple inputs, allowing the dataset to be iterated over much more efficiently. As we regrouped the dataset by num_atoms in the previous step, this will iterate over conformer groups containing the same number of atoms.
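The code that performed this chunked iteration is not shown in the rendered output. As a minimal sketch, assuming only the dict-like .items() interface mentioned above and a hypothetical max_size chunk limit (the dataset may expose a dedicated chunked iterator), something like the following produces tuples of (species, coordinates) chunks like those printed below:
max_size = 300  # hypothetical maximum chunk size
for name, group in ds.items():
    # Split each group's batched tensors into chunks of at most max_size conformers
    species_chunks = group["species"].split(max_size)
    coords_chunks = group["coordinates"].split(max_size)
    for chunk in zip(species_chunks, coords_chunks):
        print(chunk)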
(tensor([[8, 1, 1],
[8, 1, 1],
[8, 1, 1],
...,
[8, 1, 1],
[8, 1, 1],
[8, 1, 1]]), tensor([[[ -0.000, -0.006, 0.107],
[ 0.000, 0.777, -0.421],
[ 0.000, -0.678, -0.343]],
[[ 0.000, -0.005, 0.133],
[ -0.000, 0.639, -0.614],
[ 0.000, -0.566, -0.557]],
[[ 0.000, 0.001, 0.123],
[ -0.000, 0.718, -0.504],
[ -0.000, -0.727, -0.511]],
...,
[[ -0.000, -0.005, 0.103],
[ 0.000, 0.838, -0.385],
[ 0.000, -0.756, -0.321]],
[[ -0.000, -0.002, 0.109],
[ 0.000, 0.845, -0.406],
[ 0.000, -0.817, -0.384]],
[[ 0.000, -0.004, 0.132],
[ -0.000, 0.795, -0.606],
[ -0.000, -0.729, -0.554]]]))
(tensor([[8, 1, 1],
        [8, 1, 1],
        [8, 1, 1],
        ...,
        [8, 1, 1],
        [8, 1, 1],
        [8, 1, 1]]), tensor([[[ 0.000, -0.006, 0.125],
[ -0.000, 0.703, -0.560],
[ 0.000, -0.609, -0.487]],
[[ -0.000, 0.005, 0.105],
[ 0.000, 0.871, -0.335],
[ -0.000, -0.946, -0.394]],
[[ 0.000, 0.007, 0.131],
[ -0.000, 0.687, -0.529],
[ -0.000, -0.794, -0.612]],
...,
[[ 0.000, 0.005, 0.124],
[ -0.000, 0.589, -0.487],
[ -0.000, -0.667, -0.548]],
[[ 0.000, 0.007, 0.131],
[ -0.000, 0.666, -0.528],
[ -0.000, -0.771, -0.610]],
[[ 0.000, 0.001, 0.122],
[ -0.000, 0.743, -0.495],
[ -0.000, -0.764, -0.512]]]))
Property creation#
Sometimes it may be useful to create a placeholder property for some purpose. You can make the second dimension equal to the number of atoms in the group by setting is_atomic=True, and you can also add extra dims. For example, the following creates a property with shape (N, A); for more examples see the docstring of the function.
ds = ds.create_full_property(
"new_property", is_atomic=True, fill_value=0.0, dtype=float
)
ds.properties
{'energies', 'species', 'coordinates', 'new_property'}
We now delete the created property to clean up:
ds.delete_properties("new_property", verbose=False)
ds.properties
{'energies', 'species', 'coordinates'}
Manipulating conformers#
Conformers can be appended as tensors by calling append_conformers.
Here we use placeholder species and random coordinates and energies, but you
should of course append data that makes sense. If your dataset has only one
store, you can pass the group name directly, without the store prefix.
conformers = {
"species": torch.tensor([[1, 1, 6, 6], [1, 1, 6, 6]]),
"coordinates": torch.randn(2, 4, 3),
"energies": torch.randn(2),
}
ds.append_conformers("file1/004", conformers)
<torchani.datasets.anidataset.ANIDataset object at 0x748741a51150>
It is also possible to append conformers as numpy arrays, in this case “species” can hold the chemical symbols or atomic numbers. Internally these will be converted to atomic numbers.
numpy_conformers = {
"species": np.array(
[["H", "H", "C", "N"], ["H", "H", "N", "O"], ["H", "H", "H", "H"]]
),
"coordinates": np.random.standard_normal((3, 4, 3)),
"energies": np.random.standard_normal(3),
}
ds.append_conformers("file1/004", numpy_conformers)
<torchani.datasets.anidataset.ANIDataset object at 0x748741a51150>
Conformers can also be deleted from the dataset. Passing indices deletes those specific conformers; passing nothing deletes the whole group.
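The code that produced the following output is omitted in the rendered page; presumably the group was inspected with something like this sketch (get_conformers is used again a few steps below):
# Print every conformer currently in the group, as a dict of batched tensors
print(ds.get_conformers("file1/004"))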
{'energies': tensor([-56.510, -56.502, -56.507, ..., -0.344, -0.234, -1.103], dtype=torch.float64), 'species': tensor([[7, 1, 1, 1],
[7, 1, 1, 1],
[7, 1, 1, 1],
...,
[1, 1, 6, 7],
[1, 1, 7, 8],
[1, 1, 1, 1]]), 'coordinates': tensor([[[ 0.020, 0.006, -0.078],
[ 0.385, -0.882, 0.067],
[ 0.318, 0.931, 0.038],
[-0.979, -0.132, 0.169]],
[[ 0.003, -0.015, -0.143],
[ 0.533, -0.736, 0.341],
[ 0.229, 0.820, 0.439],
[-0.803, 0.118, 0.400]],
[[-0.007, 0.010, -0.095],
[ 0.566, -0.902, 0.221],
[ 0.528, 0.938, 0.149],
[-0.991, -0.180, 0.147]],
...,
[[-0.249, -1.770, -0.611],
[-0.699, 0.256, 0.080],
[-0.676, 0.035, 1.758],
[-1.738, 1.049, -1.507]],
[[-1.254, -0.362, -0.504],
[ 0.921, -1.425, 0.982],
[-1.476, 0.247, 0.733],
[-0.273, -0.026, 0.927]],
[[-1.082, 1.456, 2.530],
[-0.351, 0.657, 1.429],
[ 0.563, 0.227, -0.162],
[-0.424, 1.179, 1.632]]])}
Let’s delete some conformers and try again.
ds.delete_conformers("file1/004", [0, 2])
molecules = ds.get_conformers("file1/004")
The length of the dataset (the number of conformer groups) has not changed:
len(ds)
10
Let’s get rid of the whole group:
ds.delete_conformers("file1/004")
len(ds)
9
Currently, when appending, the class checks:

- that the first dimension of all your properties is the same,
- that you are appending a set of conformers with the correct properties,
- that all your formulas are correct when the grouping type is “by_formula”,
- that your group name does not contain illegal “/” characters,
- that you are only appending one of “species” or “numbers”.

It does NOT check:

- that the number of atoms is the same in all properties that are atomic,
- that the name of the group is consistent with the formula / number of atoms.

It is the responsibility of the user to make sure of those items; the sketch below illustrates the first check in action.
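As a hedged illustration of the first check, this sketch appends properties whose first dimensions disagree and expects the dataset to reject them (the exact exception type raised is an assumption):
mismatched = {
    "species": torch.tensor([[1, 1, 6, 6]] * 3),  # first dimension is 3
    "coordinates": torch.randn(2, 4, 3),  # first dimension is 2
    "energies": torch.randn(2),
}
try:
    ds.append_conformers("file1/004", mismatched)
except Exception as e:  # assumed to be rejected; exception type unspecified
    print("append rejected:", e)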
Utilities#
Multiple datasets can be concatenated into one h5 file, optionally deleting the original h5 files if the concatenation is successful.
concat_path = Path.cwd() / "concat.h5"
ds = concatenate(ds, concat_path, delete_originals=True)
Concatenating datasets: 100%|██████████| 9/9 [00:00<00:00, 213.58it/s]
Deleting original stores: 100%|██████████| 2/2 [00:00<00:00, 42366.71it/s]
Context manager usage#
If you need to perform a lot of read/write operations on the dataset, it can be useful to keep all the underlying stores open; you can do this by using a keep_open context.
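The code for this step is not shown in the rendered output. A minimal sketch, assuming keep_open works as a plain context manager and that get_conformers accepts a conformer index (group “002” holds the N2 conformers after concatenation):
# Read ten individual conformers while the underlying store stays open
with ds.keep_open() as open_ds:  # assumed signature; it may take a mode flag
    for i in range(10):
        print(open_ds.get_conformers("002", i))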
{'energies': tensor(-109.491, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.527],
[ 0.000, 0.000, -0.527]])}
{'energies': tensor(-109.492, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.528],
[ 0.000, 0.000, -0.528]])}
{'energies': tensor(-109.494, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.571],
[ 0.000, 0.000, -0.571]])}
{'energies': tensor(-109.492, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.576],
[ 0.000, 0.000, -0.576]])}
{'energies': tensor(-109.493, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.574],
[ 0.000, 0.000, -0.574]])}
{'energies': tensor(-109.489, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.524],
[ 0.000, 0.000, -0.524]])}
{'energies': tensor(-109.491, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.578],
[ 0.000, 0.000, -0.578]])}
{'energies': tensor(-109.497, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.564],
[ 0.000, 0.000, -0.564]])}
{'energies': tensor(-109.497, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.541],
[ 0.000, 0.000, -0.541]])}
{'energies': tensor(-109.488, dtype=torch.float64), 'species': tensor([7, 7]), 'coordinates': tensor([[ 0.000, 0.000, 0.524],
[ 0.000, 0.000, -0.524]])}
Creating a dataset from scratch#
It is possible to create an ANIDataset from scratch by pointing it to a file that does not exist yet. By default the grouping is “by_num_atoms”. The first set of conformers you append will determine which properties the dataset supports.
new_path = Path.cwd() / "new_ds.h5"
new_ds = ANIDataset(new_path, grouping="by_formula")
numpy_conformers = {
"species": np.array([["H", "H", "C", "C"], ["H", "C", "H", "C"]]),
"coordinates": np.random.standard_normal((2, 4, 3)),
"forces": np.random.normal(size=(2, 4, 3), scale=0.1),
"dipoles": np.random.standard_normal((2, 3)),
"energies": np.random.standard_normal(2),
}
new_ds.append_conformers("C2H2", numpy_conformers)
print(new_ds.properties)
for c in new_ds.iter_conformers():
print(c)
{'energies', 'species', 'dipoles', 'coordinates', 'forces'}
{'coordinates': tensor([[-0.240, 0.366, 1.269],
[-0.020, 0.736, 0.480],
[ 1.023, -0.101, 0.159],
[ 0.813, -2.349, 0.546]], dtype=torch.float64), 'forces': tensor([[ 0.037, 0.015, 0.110],
[ 0.018, 0.030, 0.025],
[-0.060, 0.027, 0.098],
[ 0.194, 0.153, -0.117]], dtype=torch.float64), 'energies': tensor(0.020, dtype=torch.float64), 'species': tensor([1, 1, 6, 6]), 'dipoles': tensor([ 0.685, 0.322, -1.340], dtype=torch.float64)}
{'coordinates': tensor([[-0.785, 2.048, -0.692],
[-0.190, -1.196, 0.397],
[ 0.304, -1.361, -1.391],
[ 1.604, -1.926, 0.719]], dtype=torch.float64), 'forces': tensor([[ 0.210, -0.244, 0.032],
[ 0.125, 0.006, 0.030],
[ 0.121, -0.104, -0.078],
[ 0.258, 0.015, 0.127]], dtype=torch.float64), 'energies': tensor(0.355, dtype=torch.float64), 'species': tensor([1, 6, 1, 6]), 'dipoles': tensor([-0.356, 0.847, -1.759], dtype=torch.float64)}
Another useful feature is deleting, in place, all conformers whose force magnitude exceeds a given threshold. We will demonstrate this by introducing some conformers with extremely large forces:
bad_conformers = {
"species": np.array([["H", "H", "N", "N"], ["H", "H", "N", "N"]]),
"coordinates": np.random.standard_normal((2, 4, 3)),
"forces": np.random.normal(size=(2, 4, 3), scale=100.0),
"dipoles": np.random.standard_normal((2, 3)),
"energies": np.random.standard_normal(2),
}
new_ds.append_conformers("C2H2", bad_conformers)
filtered_conformers_and_ids = filter_by_high_force(new_ds, delete_inplace=True)
filtered_conformers_and_ids
Filtering where any atomic force magnitude > 2.0 Ha / Angstrom: 1it [00:00, 2016.49it/s]
Deleting filtered conformers: 100%|██████████| 1/1 [00:00<00:00, 368.12it/s]
Deleted 2 bad conformations
([{'coordinates': tensor([[[ 2.164, -0.270, 0.224],
[-0.737, -1.179, -0.011],
[ 0.772, -0.575, 0.743],
[-1.377, 0.540, 0.136]],
[[ 0.266, -0.662, 1.464],
[ 0.074, 0.960, -1.011],
[-0.203, 1.394, -0.449],
[-0.222, 2.537, -0.832]]], dtype=torch.float64), 'forces': tensor([[[ -5.237, 50.387, 10.226],
[ 252.357, -101.943, -36.191],
[ -24.163, -34.409, 81.134],
[ 65.504, 7.206, 43.761]],
[[ 40.129, -96.685, -15.076],
[ 209.119, -207.160, -41.174],
[ 12.339, 95.208, 26.201],
[ -7.040, 53.899, -83.423]]], dtype=torch.float64), 'energies': tensor([0.645, 2.477], dtype=torch.float64), 'species': tensor([[1, 1, 7, 7],
[1, 1, 7, 7]]), 'dipoles': tensor([[ 0.573, -1.388, -0.481],
[ 0.847, 0.166, -0.318]], dtype=torch.float64)}], {'C2H2': tensor([2, 3])})
Finally, let’s delete the files we used, to clean up.
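A sketch of the cleanup; note that file1.h5 and file2.h5 were already removed by concatenate(..., delete_originals=True):
concat_path.unlink()  # the concatenated dataset created above
new_path.unlink()  # the dataset created from scratch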