Zero-Inflated Models for RNA-Seq Count Data
One of the main objectives of many biological studies is to explore differential gene expression profiles between samples. Genes are referred to as differentially expressed (DE) if their read counts change systematically across treatments or conditions. Poisson and negative binomial (NB) regressions are widely used for non-over-dispersed (NOD) and over-dispersed (OD) count data, respectively. However, when the data contain an excessive number of zeros, these methods need adjustment. In this paper, we consider a zero-inflated Poisson mixed effects model (ZIPMM) and a zero-inflated negative binomial mixed effects model (ZINBMM) to address excessive zero counts in NOD and OD RNA-seq data, respectively, in the presence of random effects. We apply these methods to both simulated and real RNA-seq datasets. The ZIPMM and ZINBMM perform better on both simulated and real datasets.
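To make the idea of zero inflation concrete, the sketch below shows the standard zero-inflated mixture for a single count: with probability pi the count is a structural zero, and otherwise it follows a Poisson or NB distribution. This is a minimal illustration under assumptions, not the paper's implementation. The function names, the mean/size parameterization of the NB (variance mu + mu^2/size), and the example values of pi, mu and size are all hypothetical. In a mixed effects version, mu would typically depend on covariates and a random effect through a log link; that part is not shown here.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def nb_pmf(y, mu, size):
    """Negative binomial probability of count y.

    Assumed parameterization: mean mu, variance mu + mu**2 / size.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zero_inflated_pmf(y, pi, base_pmf):
    """Mixture of a point mass at zero (weight pi) and a count distribution.

    base_pmf is a one-argument function returning P(Y = y) under the
    count component (Poisson for ZIP, NB for ZINB).
    """
    p = (1.0 - pi) * base_pmf(y)
    return pi + p if y == 0 else p


# Hypothetical example values, chosen only for illustration.
pi, mu, size = 0.3, 2.0, 1.5

zip_zero = zero_inflated_pmf(0, pi, lambda y: poisson_pmf(y, mu))
zinb_zero = zero_inflated_pmf(0, pi, lambda y: nb_pmf(y, mu, size))

print("P(Y=0) Poisson:", poisson_pmf(0, mu))
print("P(Y=0) ZIP:    ", zip_zero)
print("P(Y=0) NB:     ", nb_pmf(0, mu, size))
print("P(Y=0) ZINB:   ", zinb_zero)
```

For any pi between 0 and 1, the zero-inflated versions assign more probability to zero than their plain Poisson or NB counterparts with the same mu, which is the excess that an unadjusted regression cannot absorb.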
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.