Substantive content:
Single-cell sequencing provides insight into cell development and differentiation at a resolution that is unattainable with bulk data. In brief, the method consists of dissociating a tissue or organ into individual cells and encapsulating each cell in a separate fluid droplet. Each droplet carries its own DNA barcode, so the reads produced by subsequent sequencing can be assigned to individual cells. The technology is widely used in studies of immune response, carcinogenesis, and organism development. However, single-cell data raise new problems that must be addressed during analysis. These are:
Data sparsity (a high frequency of dropouts, that is, zero entries in the data matrix that arise when some mRNA or DNA molecules are not captured during library preparation)
Large datasets (the data are stored in matrices of size number of cells × number of genes studied)
High data dimensionality (differences in the expression of many genes are investigated simultaneously)
Multimodality (datasets representing different modalities need to be integrated, for example, gene expression and chromatin accessibility measured by ATAC-seq)
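As a toy illustration of the first two problems, the sketch below simulates a small cells × genes count matrix in plain Python (not the Bioconductor tooling used in the course) and measures how sparse it is. The function names, the matrix size, and the 10% capture probability are assumptions chosen for illustration only; they are not taken from any real dataset or protocol.

```python
import random


def simulate_counts(n_cells, n_genes, capture_prob, seed=0):
    """Return a cells x genes count matrix as a list of rows.

    Each entry is a small positive count with probability `capture_prob`,
    otherwise 0 (a crude stand-in for a dropout; hypothetical model).
    """
    rng = random.Random(seed)
    return [
        [rng.randint(1, 5) if rng.random() < capture_prob else 0
         for _ in range(n_genes)]
        for _ in range(n_cells)
    ]


def sparsity(matrix):
    """Fraction of matrix entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def to_sparse(matrix):
    """Keep only the non-zero entries as {(cell, gene): count}."""
    return {
        (i, j): value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    }


# Assumed toy dimensions; real experiments are typically far larger.
counts = simulate_counts(n_cells=200, n_genes=500, capture_prob=0.1)
sparse_counts = to_sparse(counts)

print(f"matrix shape: {len(counts)} cells x {len(counts[0])} genes")
print(f"fraction of zeros: {sparsity(counts):.2f}")
print(f"stored entries: dense={len(counts) * len(counts[0])}, "
      f"sparse={len(sparse_counts)}")
```

Because most entries are zero, storing only the non-zero counts needs far fewer values than the full matrix. This is the idea behind the sparse matrix formats commonly used for single-cell data.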
During the course, students will be guided through the analysis of output from single-cell technology. The analysis includes:
Cell annotation (identification of cell types)
Identification of highly variable genes
Data cleaning and quality control
Data clustering and visualization
This project will be carried out in the Bioconductor ecosystem, which is built on the R environment. Students will therefore be introduced to the basics of R if necessary.