Molecular Contrastive Learning of Representations via Graph Neural Networks

Nature Machine Intelligence [Paper] [arXiv] [PDF]

Yuyang Wang, Jianren Wang, Zhonglin Cao, Amir Barati Farimani
Carnegie Mellon University

This is the official implementation of MolCLR: "Molecular Contrastive Learning of Representations via Graph Neural Networks". In this work, we introduce a contrastive learning framework for molecular representation learning on a large unlabelled dataset of ~10M unique molecules. MolCLR pre-training greatly boosts the performance of GNN models on various downstream molecular property prediction benchmarks. If you find our work useful in your research, please cite:

@article{wang2022molclr,
  title={Molecular contrastive learning of representations via graph neural networks},
  author={Wang, Yuyang and Wang, Jianren and Cao, Zhonglin and Barati Farimani, Amir},
  journal={Nature Machine Intelligence},
  pages={1--9},
  year={2022},
  publisher={Nature Publishing Group},
  doi={10.1038/s42256-022-00447-x}
}

Getting Started

Installation

Set up the conda environment and clone the GitHub repo:

# create a new environment
$ conda create --name molclr python=3.7
$ conda activate molclr

# install requirements
$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install torch-geometric==1.6.3 torch-sparse==0.6.9 torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html
$ pip install PyYAML
$ conda install -c conda-forge rdkit=2020.09.1.0
$ conda install -c conda-forge tensorboard
$ conda install -c conda-forge nvidia-apex # optional

# clone the source code of MolCLR
$ git clone https://github.com/yuyangw/MolCLR.git
$ cd MolCLR
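
After installation, a quick sanity check (a minimal sketch, not part of the repository) can confirm that the core dependencies import correctly and that CUDA is visible:

# Sanity check for the environment (illustrative only).
import torch
import torch_geometric
from rdkit import Chem

print("PyTorch:", torch.__version__)              # expected 1.7.1+cu110
print("CUDA available:", torch.cuda.is_available())
print("PyG:", torch_geometric.__version__)        # expected 1.6.3
print("RDKit parses SMILES:", Chem.MolFromSmiles("CCO") is not None)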

Dataset

You can download the pre-training data and the benchmarks used in the paper here, then extract the zip file into the ./data folder. The pre-training data is in pubchem-10m-clean.txt. Each fine-tuning benchmark is stored in a folder named after that benchmark. The benchmarks are also available from MoleculeNet.
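
As a rough illustration (assuming pubchem-10m-clean.txt stores one SMILES string per line), the pre-training data can be inspected with RDKit:

# Illustrative only: read a few SMILES from the pre-training file and parse them.
from rdkit import Chem

with open("data/pubchem-10m-clean.txt") as f:   # path assumes the zip was extracted under ./data
    smiles_list = [line.strip() for _, line in zip(range(5), f)]

for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)               # returns None if the SMILES is invalid
    print(smi, "->", "valid" if mol is not None else "invalid")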

Pre-training

To pre-train MolCLR, run the command below. The configuration and a detailed explanation of each variable can be found in config.yaml.

$ python molclr.py

To monitor training with TensorBoard, run tensorboard --logdir ckpt/{PATH} and open http://127.0.0.1:6006/ in a browser.
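
For reference, a YAML configuration such as config.yaml is typically loaded with PyYAML as sketched below; the exact keys and how molclr.py consumes them are defined in the repository itself, so none are asserted here:

# Minimal sketch of reading the training configuration (illustrative only).
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config)  # nested dict of training, model, and dataset settings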

Fine-tuning

To fine-tune the MolCLR pre-trained model on downstream molecular benchmarks, run the command below. The configuration and a detailed explanation of each variable can be found in config_finetune.yaml.

$ python finetune.py

Pre-trained models

We also provide pre-trained GIN and GCN models, which can be found in ckpt/pretrained_gin and ckpt/pretrained_gcn, respectively.
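
The checkpoints are ordinary PyTorch files; a sketch of inspecting one is shown below (the exact file name inside ckpt/pretrained_gin is an assumption, so check the folder for the actual checkpoint file):

# Illustrative: load a pre-trained checkpoint on CPU and list a few parameter shapes.
import torch

state_dict = torch.load("ckpt/pretrained_gin/model.pth",  # hypothetical file name
                        map_location="cpu")
for name, value in list(state_dict.items())[:5]:
    print(name, getattr(value, "shape", type(value)))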

Acknowledgement
