Zero-Inflated Models for RNA-Seq Count Data

  • Morshed Alam, Department of Biostatistics, UNMC College of Public Health, Nebraska Medical Center
  • Naim Al Mahi, Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati
  • Munni Begum, Ball State University
Keywords: RNA-seq, differential expression, zero-inflated Poisson mixed effects model, zero-inflated negative binomial mixed effects model, over-dispersed count data


A main objective of many biological studies is to explore differential gene expression profiles between samples. Genes are referred to as differentially expressed (DE) if their read counts change systematically across treatments or conditions. Poisson and negative binomial (NB) regressions are widely used for non-over-dispersed (NOD) and over-dispersed (OD) count data, respectively. However, in the presence of an excessive number of zeros, these methods need adjustment. In this paper, we consider a zero-inflated Poisson mixed effects model (ZIPMM) and a zero-inflated negative binomial mixed effects model (ZINBMM) to address excess zero counts in NOD and OD RNA-seq data, respectively, in the presence of random effects. We apply these methods to both simulated and real RNA-seq datasets. The ZIPMM and ZINBMM perform better on both simulated and real datasets.
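
To illustrate why excess zeros call for an adjustment, the following is a minimal sketch of a zero-inflated Poisson probability mass function. It is not the model fitted in this paper: it omits covariates and random effects, and the function names and the values of the rate `lam` and the zero-inflation probability `pi` are chosen purely for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of count k.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Zeros therefore arise from both components.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Illustrative (assumed) parameter values, not estimates from any dataset.
lam, pi = 3.0, 0.4

p0_poisson = poisson_pmf(0, lam)   # zero probability under plain Poisson
p0_zip = zip_pmf(0, lam, pi)       # zero probability under the ZIP mixture
total = sum(zip_pmf(k, lam, pi) for k in range(100))  # should be close to 1

print(f"P(Y=0) Poisson: {p0_poisson:.4f}")
print(f"P(Y=0) ZIP:     {p0_zip:.4f}")
print(f"Sum of ZIP pmf over 0..99: {total:.6f}")
```

With the same rate, the mixture places much more mass at zero than the plain Poisson does, which is the feature a standard Poisson or NB regression cannot capture without modification.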