Question: Differential expression analysis with RNA-Seq samples that vary in depth
Satyajeet Khare (1.5k), Pune, India, wrote 3.4 years ago:

Biostars,

I am performing differential gene expression analysis between "control" and "treated" samples that differ 2-3 fold in their depth (the control samples have one half to one third as many reads as the treated samples). If I perform DE analysis using the old Tuxedo protocol, I do not observe many differentially expressed genes, not even those that were used to validate the samples before submitting them for sequencing.

If I load bigWig files (which are better normalized) for these samples into the genome browser, I can see the expected difference in reads on the genes of interest. To normalize the samples for sequencing depth, I am trying samtools view -s to subsample the .bam files to similar sizes. But the subsampled files are not compatible with Cufflinks because they lack the EOF marker.

I am wondering whether such normalization is a good idea and, if so, how to get around this incompatibility with Cufflinks.
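
On the missing EOF marker mentioned above: the SAM/BAM specification defines a fixed 28-byte empty BGZF block that terminates a complete BAM file, and a file without it has often been truncated during writing. The following is a minimal sketch for checking whether a file ends with that block; the function name `has_bgzf_eof` is made up for illustration and is not part of samtools or Cufflinks.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification
# defines as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration only.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If the check returns False for a subsampled file, the subsampling command probably did not finish writing, so re-running it is likely a safer fix than patching the file by hand.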

Thanks a lot for your help in advance!

modified 3.4 years ago by Devon Ryan (95k) • written 3.4 years ago by Satyajeet Khare (1.5k)

I would expect that htseq-count/featureCounts followed by DESeq2/edgeR/limma-voom would be able to deal with this difference in depth, but that's not what you asked for.

The old Tuxedo pipeline isn't considered "the best tool in the shed" anymore.

written 3.4 years ago by WouterDeCoster (43k)

Okay. To make things worse, there is only one sample per group (no replicates). limma-voom cannot calculate the common dispersion, and hence the tagwise dispersion, for this reason. There might be a way out, but what's the best option of the three?

We can validate the differentially expressed genes that come out of the analysis, but biological replicates for RNA-Seq are not possible for now.

Thanks for the help!

modified 3.4 years ago • written 3.4 years ago by Satyajeet Khare (1.5k)

If you have unreplicated data then all of the presented options are equally crappy. GPower is supposed to be slightly better, but honestly you'd be better off not wasting your time on this dataset.

written 3.4 years ago by Devon Ryan (95k)

Okay. Thanks a lot for all the help.

written 3.4 years ago by Satyajeet Khare (1.5k)

Devon Ryan (95k), Freiburg, Germany, wrote 3.4 years ago:

cuffdiff does an appropriate normalization (the same one as DESeq2, if I recall correctly) internally, so please don't subsample. Having said that, as WouterDeCoster wrote, you're strongly encouraged to not use cufflinks/cuffdiff, but rather one of the standard R-based tools.
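
To illustrate why subsampling is unnecessary, here is a minimal sketch of median-of-ratios size-factor estimation, the general approach DESeq2 takes to sequencing depth. It is a simplified illustration and not the actual code of DESeq2 or cuffdiff; the function name `size_factors` and the toy counts are made up for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    `counts` is a list of genes, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero count are skipped: their geometric mean is zero.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median makes the estimate robust to a minority of truly changed genes.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: the second sample is sequenced three times deeper.
counts = [
    [100, 300],
    [50, 150],
    [10, 30],
    [200, 600],
    [20, 200],  # a gene that is genuinely higher in the second sample
]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

In this toy example the two size factors differ by a factor of three, so the unchanged genes end up with equal normalized counts while the last gene keeps its difference, and no reads had to be thrown away.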

written 3.4 years ago by Devon Ryan (95k)

Thank you! How about the new Tuxedo pipeline? Ballgown seems to rely on a count matrix. For small sample sizes (n < 4 per group), Ballgown recommends regularization using limma anyway.

Best

modified 3.4 years ago • written 3.4 years ago by Satyajeet Khare (1.5k)

I've never used it, but given who wrote Ballgown it should be much better.

written 3.4 years ago by Devon Ryan (95k)