This course explores advanced topics in data science and artificial intelligence (AI) with a focus on chemistry applications. It covers the fundamentals of machine learning and data science, along with advanced research areas such as ML-guided experimental design, high-throughput screening, and AI-driven chemical discovery. Through a combination of lectures and hands-on lab sessions, students will gain both the theoretical background and practical skills needed to pursue research in chemical data science.

Evaluation:

Class participation (20%): Includes attending lectures and hands-on sessions and actively asking questions.

Hands-on assignments (50%): Coding and computational tasks, along with analysis of the results.

Capstone project (30%): A short research-related project that will be summarized as a project report and a 20-minute talk.

Schedule and Course Materials

Lecture notes and lab code for the first 17 lectures were created by Dr. Chong Sun. The English was refined with assistance from ChatGPT. More lectures will be added; stay tuned!

All rights reserved. If you use any part of this content, please provide proper acknowledgment of the source. If you enjoy the course, please give a star to the course GitHub repo!

| # | Date | Lecture | Lab |
|---|------|---------|-----|
| 1 | Jan 20 | Introduction: Data science for Chemistry | 1. Python Basics; 2. UV-Vis Spectrum |
| 2 | Jan 22 | Chemical data: Representation | 1. 1D string and 3D XYZ; 2. Graph representation |
| 3 | Jan 27 | Chemical data: Acquisition and Feature Engineering | 1. API Access; 2. Chemical Similarity |
| 4 | Jan 29 | Fundamentals of Statistical Analysis | Maximum Likelihood Estimation |
| 5 | Feb 3 | Statistical Methods for Chemical Prediction | 1. Solubility prediction with linear models; 2. Boiling point prediction with Gaussian process |
| 6 | Feb 5 | Dimension reduction and Preliminaries on Training ML Models | Feature selection and PCA |
| 7 | Feb 10 | Feedforward Neural Networks | 1. Build an FNN with PyTorch; 2. Use GPU backend |
| 8 | Feb 12 | Chemical Feature Engineering II | 1. Ultrafast shape recognition; 2. Simple learned embedding for molecules |
| 9 | Feb 17 | Case Study: Learning a potential energy surface (PES) | 1. MLP with ACSF and FNN; 2. MLP with SOAP and GPR |
| 10 | Feb 19 | Chemical Prediction with Graph neural networks (GNN) | 1. Molecular property prediction with GNN; 2. Create a Graph Dataset |
| 11 | Feb 24 | Active Learning for Chemical Data Efficiency | 1. Improve GPR with active learning; 2. Improve FNN with MC dropout and max-min sampling |
| 12 | Feb 26 | Chemical Design with Bayesian optimization | BH Reaction optimized by BO |
| 13 | Mar 3 | Reaction kinetics modeling with recurrent neural networks (RNN) | 1. Reaction Kinetics with RNN; 2. Molecular Generation with RNN (LSTM) |
| 14 | Mar 5 | Retrosynthesis with Sequence-to-Sequence Modeling | Retrosynthesis with Seq2Seq model |
| 15 | Mar 10 | Generative Chemistry and Variational Autoencoder | Learning the Chemical Latent Space with VAE |
| 16 | Mar 12 | Molecular Docking with Diffusion Models | 1. Processing a PDB file; 2. A simple implementation of Diffusion Models |
| 17 | Mar 24 | Crystal Structure Design with Transformer | |
| 18 | Mar 26 | Equivariance Neural Networks | Laurence Giordano |
| 19 | Mar 31 | Self-Driving Lab and Agentic AI | Self-reading |
| 21 | Apr 7 | Help session with terminal, WSL, VSCode, Git/GitHub | |
| 22 | Apr 9 | Guest lecture (Prof. York) | |
| 23 | Apr 16 | Guest lecture (Prof. Khare) | |
| 24 | Apr 23 | Guest lecture (Prof. Remsing) | |