{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Train Your Own Neural Network Potential\n\nThis example shows how to use TorchANI to train a neural network potential\nwith a setup identical to NeuroChem's. We will use the same configuration as\nspecified in `inputtrain.ipt`_\n\n https://github.com/aiqm/torchani/blob/master/torchani/resources/ani-1x_8x/inputtrain.ipt\n\n
TorchANI provides tools to run the NeuroChem training config file `inputtrain.ipt`.\n See: `neurochem-training`.
The training setup used in this file is configured to reproduce the original research\n at `Less is more: Sampling chemical space with active learning`_ as closely as possible.\n That research was done on a different platform called NeuroChem, which has many default\n options and technical details that differ from PyTorch. Some decisions made here\n (such as using NeuroChem's initialization instead of PyTorch's default initialization)\n are made not because they give better results, but solely to reproduce the original\n research. This file should not be interpreted as a suggestion to readers on how\n they should set up their models.
Besides defining these hyperparameters programmatically,\n :mod:`torchani.neurochem` provides tools to read them from file.
PyTorch's default initialization for the weights and biases in linear layers\n is Kaiming uniform. See: `TORCH.NN.MODULES.LINEAR`_\n We initialize the weights similarly, but from the normal distribution.\n The biases are initialized to zero.
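The initialization described above can be sketched as follows. This is a minimal example, assuming a small stand-in network (the layer sizes here are illustrative, not the ones from the ANI architecture): weights are drawn from a Kaiming normal distribution and biases are zeroed for every linear layer.

```python
import torch

def init_params(m):
    # Weights: Kaiming normal instead of PyTorch's default Kaiming uniform.
    # Biases: zero, matching the NeuroChem setup described in the text.
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight, a=1.0)
        torch.nn.init.zeros_(m.bias)

# Illustrative stand-in network, not the actual ANI model layout.
net = torch.nn.Sequential(
    torch.nn.Linear(384, 160),
    torch.nn.CELU(0.1),
    torch.nn.Linear(160, 1),
)
net.apply(init_params)
```

`Module.apply` walks every submodule recursively, so one small function covers all linear layers in the network.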
The weight decay in `inputtrain.ipt`_ is named \"l2\", but it is actually not\n L2 regularization. Confusing L2 regularization with weight decay is a common\n mistake in deep learning. See: `Decoupled Weight Decay Regularization`_\n Also note that in the training of ANI models the weight decay applies only to\n weights, not biases.
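Applying weight decay to weights only can be done by splitting the parameters into optimizer parameter groups. This is a hedged sketch with an illustrative stand-in network; the decay and learning-rate values here are placeholders, not the ones from `inputtrain.ipt`.

```python
import torch

# Illustrative stand-in network, not the actual ANI model layout.
model = torch.nn.Sequential(
    torch.nn.Linear(384, 160),
    torch.nn.CELU(0.1),
    torch.nn.Linear(160, 1),
)

# Separate weights from biases by parameter name.
weights = [p for n, p in model.named_parameters() if n.endswith('weight')]
biases = [p for n, p in model.named_parameters() if n.endswith('bias')]

# AdamW implements decoupled weight decay; only the weight group decays.
# The values 1e-5 and 1e-3 are placeholders for illustration.
optimizer = torch.optim.AdamW([
    {'params': weights, 'weight_decay': 1e-5},
    {'params': biases, 'weight_decay': 0.0},
], lr=1e-3)
```

Each dict in the list becomes one parameter group, so per-group settings such as `weight_decay` override the optimizer-wide defaults.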