DeepLife - Program

Program

Note that journal clubs take place on Thursdays, from 4.30pm until 5.30pm.

The journal club sessions will last 60 minutes and take place online. They will be led by teachers from all participating universities. Four groups of 2 students from the same university will participate in each session:

1. Background team: the group that will explain the context and background of the paper (15 minutes).

2. Presenter team: the group that will present the paper (20 minutes).

3. Reviewer team: the group that will act as reviewers and ask questions (25 minutes). Please send your questions 2 days before the journal club so that they can be shared with the other groups.

4. Writer team: the group that will write a 2-page summary of the paper and the session, with up to 10 references.

Prerequisites

Students attending this course are expected to have basic knowledge of statistics and machine-learning fundamentals. To prepare, you can use the lecture material from last year’s edition, in particular the four introductory lectures:

Lecture	Title	Speaker	Links
Intro Lecture 1	Intro and Mathematical foundation to DL	Bartek Wilczynski (Warsaw)	Lecture materials, Practical session, Video recording (26.03)
Intro Lecture 2	Convolutional and Recurrent neural networks	Marco Frasca (Milano)	Lecture materials, Practical session, Video recording (4.03)
Intro Lecture 3	Autoencoders and variational autoencoders	Carl Herrmann (Heidelberg)	Lecture materials, Practical session, Video recording
Intro Lecture 4	Attention mechanisms and transformers	Dario Malchiodi (Milano)	Lecture materials, Practical session, Video recording

Recommended books include:

Deep Learning book by Goodfellow, Bengio, Courville
The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (a simpler version of the previous book)
Machine Learning with PyTorch and Scikit-Learn by Raschka, Liu, Mirjalili (a great introduction to the technical aspects of DL in PyTorch).

As the practical sessions will be mostly based on Python and PyTorch, some basic knowledge of Python is required (see reference [4], the book by Raschka et al. listed above, for a good overview of PyTorch).

Specifically, we expect that the following theoretical concepts are familiar:

basic statistics

accuracy
sensitivity/specificity
area under the curve (AUC)
probability distributions
random variable
expectation of a random variable
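As a quick self-check on the evaluation metrics listed above, here is a minimal sketch in plain Python. The function names and the toy data are illustrative assumptions, not part of the course material. It computes accuracy, sensitivity and specificity from binary predictions, and AUC as the fraction of (positive, negative) pairs that the score ranks correctly.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """True positive rate: fraction of positives that are detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negative rate: fraction of negatives that are rejected."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores), thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))     # 0.75
print(sensitivity(y_true, y_pred))  # 0.75
print(specificity(y_true, y_pred))  # 0.75
print(auc(y_true, scores))          # 0.875
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUC is computed from the raw scores and summarises performance across all thresholds.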

machine-learning

overfitting vs. underfitting
cross-validation
usage of training, validation and testing datasets
classification vs. regression; supervised vs. unsupervised learning
binary vs. multi-class classification
standard ML algorithms such as Random Forest
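The roles of training, validation and testing data in cross-validation can be sketched as follows. This is a minimal illustration in plain Python under assumed names (`train_test_split_indices`, `k_fold_indices`) and an assumed 80/20 split. In practice a library such as scikit-learn would typically be used instead.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a test set that is never
    touched during model selection."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation.
    Each index appears in exactly one validation fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, validation


# 20 samples: 4 are held out for testing, the other 16 are
# rotated through 4 train/validation splits.
dev_idx, test_idx = train_test_split_indices(20, test_fraction=0.2)
for train_idx, val_idx in k_fold_indices(dev_idx, k=4):
    print(len(train_idx), len(val_idx))  # 12 4 on every iteration
```

Hyperparameters are chosen using the validation folds only. The held-out test indices are used once at the end to estimate how well the final model generalises.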

mathematical foundations

matrix algebra

Title	Speaker	Content	Links
Models for multimodal data integration	Britta Velten (Heidelberg)	This lecture will provide an overview of the basic statistical concepts that are important for the joint analysis of multi-modal omics data, with a focus on probabilistic models for data integration. We will discuss statistical properties of multi-omics data and the resulting challenges for data analysis, followed by an overview of different strategies for both supervised and unsupervised integration. Taking MOFA as an example of an unsupervised method, we will discuss the properties of probabilistic factor models for joint dimension reduction of multiple omics data sets. We will also discuss avenues to account for multiple sample groups and omics data with temporal and spatial resolution in probabilistic models.	Slides, notebooks and recording
VAE in single-cell genomics	Carl Herrmann (Heidelberg)	In this lecture, I will review recent applications of AE and VAE in the field of genomics, in particular single-cell genomics. We will see how these applications can help cluster cell populations and denoise sparse data. Finally, I will present some recent VAE models that are interpretable, i.e. in which the neurons of the model can be interpreted as biological entities. For those not familiar with genomics, I will start with a brief review of some concepts and data types.	Slides, notebooks and recording
Deep learning for predicting non-coding DNA activity	Bartek Wilczynski (Warsaw)	We know that transcriptional gene regulation in multicellular organisms depends on the action of hundreds of thousands of non-coding regulatory sequences scattered throughout the genome. Since there are so many of them, and we usually cannot assess their activity directly in cells and tissues, annotating their activity experimentally in full is difficult. If we simplify their function to the activation of transcription in different cellular contexts, the annotation task becomes similar to the classical ML problem of multi-class classification. Recently, many studies have attempted to solve this problem, to different degrees, using Convolutional Neural Networks. We will review a few such recent papers and discuss their successes as well as some pitfalls of training classifier models without clearly defined negative examples.	Slides, notebooks and recording

AlphaFold, ESMFold to predict structure of proteins	Joanna Sulkowska (Warsaw)		Slides, notebooks and recording
Protein design in the deep learning era, from inverse folding to diffusion models	Elodie Laine (Paris)	The revolution in protein structure prediction has boosted the development of deep learning-based methods for designing de novo proteins with desired properties. This session will introduce the modern computational pipeline for protein design, from generating novel protein folds to designing amino acid sequences compatible with them. We will cover a wide range of deep learning architectures and frameworks, including graph neural networks and diffusion models. We will discuss a few recent groundbreaking applications that were unimaginable just a few years ago.	Slides and notebook
Deep Architectures for sampling molecules	Grégoire Sergeant-Perthuis (Paris)	In this lecture, we will explain how to reduce and analyze molecular dynamics trajectories to cluster molecular conformations into discrete states and characterize the resulting conformational transitions.	Slides and notebooks
Deep learning models for protein-ligand binding site prediction	David Hoksza (Prague)	We will introduce methods for predicting protein-small molecule binding sites, with a focus on structure-based approaches. We will briefly cover traditional non-ML and ML methods, followed by deep learning techniques such as convolutional neural networks, graph neural networks, and the latest addition to the field—protein language models.	Slides, notebooks and recording

Deep learning for image segmentation	Karl Rohr (Heidelberg)	The lecture introduces deep learning methods for image segmentation. The focus is on Convolutional Neural Networks (CNNs) and encoder-decoder network architectures for cell segmentation. We will discuss the well-known networks U-Net and Cellpose, and their application for computer-based analysis of cell microscopy image data.	Slides, notebooks and recordings
Intro to BioImage Analysis and Deep Learning Utilization	Martin Schatz (Prague)		Slides, notebooks, recordings