Single-cell transcriptome sequencing (scRNA-Seq), which combines high-throughput single-cell extraction and sequencing capabilities, enables the transcriptomes of large numbers of individual cells to be assayed efficiently. Profiling gene expression at the single-cell level for a large sample of cells is crucial for addressing many biologically relevant questions, such as the investigation of rare cell types or primary cells (e.g., in early development, where each of a small number of cells may have a distinct function) and the examination of subpopulations of cells from a larger heterogeneous population (e.g., classifying cells in brain tissues).

I will discuss some of the statistical analysis issues that have arisen in the context of a collaboration with the UC Berkeley Ngai Lab concerning the analysis of olfactory stem cell fate trajectories for p63 knock-out mice. These issues, ranging from so-called low-level to high-level analyses, include experimental design, exploratory data analysis (EDA) of scRNA-Seq reads, quality assessment/control (QA/QC), normalization to account for nuisance technical effects, cluster analysis to identify novel cell types, cell lineage and pseudotime inference, and differential expression analysis to identify genes involved in the differentiation process.

Our statistical methods and software are illustrated in the R package bioc2016singlecell, for the workshop "Analysis of Single-Cell RNA-Seq Data with R and Bioconductor", given at the conference "BioC 2016: Where Software and Biology Connect", Stanford, CA, June 25, 2016: