QLS Featured Seminar - David Rocke | Quantitative Life Sciences

Thursday, September 28, 2017, 12:00 to 13:00

McIntyre Medical Building room 908, 3655 promenade Sir William Osler, Montreal, QC, H3G 1Y6, CA

Excess False Positives in Negative-Binomial Based Analysis of Data from RNA-Seq Experiments

David M. Rocke¹,², PhD and Yilun Zhang, MS¹

¹Division of Biostatistics, Department of Public Health Sciences, UC Davis
²Department of Biomedical Engineering, UC Davis

Key Words: RNA-Seq, Gene Expression, Negative Binomial, DESeq, edgeR, limma-voom

RNA-Seq data are increasingly used for whole-genome differential mRNA expression analysis in lieu of gene expression arrays such as those from Affymetrix and Illumina. Because the raw data in RNA-Seq consist of counts of fragments mapping to each gene or exon, and because the counts are over-dispersed, it is common to model the distribution as negative binomial. Yet empirically, methods based on the negative binomial often generate massively inflated numbers of false positives, whether the data are real or simulated from a negative binomial distribution. This appears to be a consequence of the fact that the negative binomial with unknown scale is not an exponential family distribution and that, when it is used as a quasi-likelihood, the link function, and thus the natural parameter, is a function of the scale parameter. Consequently, a linear model with negative binomial quasi-likelihood is not a proper generalized linear model unless the scale is known. We demonstrate that, even when the data are truly negative binomial, it is better to use transformation or weighting followed by standard linear models than it is to fit a version of a generalized linear model with estimated scale.
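
The dependence of the natural parameter on the scale can be seen by writing out the negative binomial density. The sketch below is not taken from the talk; it uses one common parameterization, and the symbols are assumptions introduced here for illustration: mean \mu, size k > 0, and dispersion \phi = 1/k playing the role of the "scale" in the abstract.

```latex
% Assumed notation (not from the abstract): mean \mu, size k, dispersion \phi = 1/k.
\begin{align*}
f(y;\mu,k)
  &= \frac{\Gamma(y+k)}{\Gamma(k)\,y!}
     \left(\frac{k}{k+\mu}\right)^{k}
     \left(\frac{\mu}{k+\mu}\right)^{y},
     \qquad y = 0, 1, 2, \dots \\
  &= \exp\!\left\{ y\,\theta - b(\theta)
     + \log\Gamma(y+k) - \log\Gamma(k) - \log y! \right\}, \\
\theta &= \log\frac{\mu}{\mu+k},
\qquad
b(\theta) = -k\log\!\left(1 - e^{\theta}\right), \\
\operatorname{E}(Y) &= \mu,
\qquad
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k} = \mu + \phi\,\mu^{2}.
\end{align*}
```

For fixed k this has the one-parameter exponential family form, but the natural parameter \theta, and hence the canonical link, changes with k. When k is unknown, the term \log\Gamma(y+k) also couples y and k in a way that does not factor into a statistic of y times a function of k, which is consistent with the abstract's statement that the model is a proper generalized linear model only when the scale is known.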

David Rocke

David M. Rocke is Distinguished Professor of Biostatistics and Biomedical Engineering, University of California, Davis. He is Director of Biostatistics in the Clinical and Translational Science Center, and Director of the Center for Biomarker Discovery. His research group works on methods for bioinformatics and data analysis of gene expression arrays, proteomics, metabolomics and other high-throughput biological assays. He is the author or co-author of two scholarly books and over 150 scientific papers.