Question: Filtering genes from cuffdiff results
sujaypatil (University of Southern California) wrote, 6 days ago:

I have run cuffdiff (with statistics turned on) to compare two groups of samples: a Control group and a Late AD group.

To be precise, this is the command I ran:

cuffdiff -L Control,AD_Late_Braak -p 8 --total-hits-norm --frag-bias-correct ../References/ensembl.GRCh38.99.fa --multi-read-correct --library-norm-method quartile ../References/Homo_sapiens.GRCh38.99.chr.gtf Early_Braak_Control_1,Early_Braak_Control_2,Early_Braak_Control_3 Late_Braak_Sample_1,Late_Braak_Sample_2,Late_Braak_Sample_3

Looking at the cuffdiff output, I see a gene_exp.diff file that contains the results of the differential expression tests. What is the best way to filter gene_exp.diff so that the number of up- and down-regulated genes is restricted to somewhere between roughly 50 and 850?

P.S. My idea is to tweak the p-value and log2 fold-change cutoffs, but that seems like a trial-and-error method, so I was wondering whether there is a more "formal" approach.
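A somewhat less ad hoc way to filter is to fix the false discovery rate first (cuffdiff's q_value column is already corrected for multiple testing) and only then rank the surviving genes by absolute log2 fold change. The minimal Python sketch below assumes the standard gene_exp.diff column names (status, log2(fold_change), q_value); the function and parameter names (filter_gene_exp, q_cutoff, min_abs_lfc, max_genes) are hypothetical and not part of cuffdiff or CummeRbund.

```python
import csv
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def abs_lfc(row: Row) -> float:
    """Absolute log2 fold change of a gene_exp.diff row (float() also parses "inf")."""
    return abs(float(row["log2(fold_change)"]))


def filter_gene_exp(
    path: str,
    q_cutoff: float = 0.05,
    min_abs_lfc: float = 1.0,
    max_genes: Optional[int] = None,
) -> Tuple[List[Row], List[Row]]:
    """Split a cuffdiff gene_exp.diff file into (up, down) gene lists.

    A row is kept only if cuffdiff actually tested it (status == "OK"),
    its q-value is below q_cutoff, and |log2 fold change| >= min_abs_lfc.
    Both lists are sorted by decreasing |log2FC| and, if max_genes is
    given, truncated to at most max_genes rows each.
    """
    up: List[Row] = []
    down: List[Row] = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["status"] != "OK":
                continue  # NOTEST, LOWDATA, HIDATA, FAIL rows have no usable test
            lfc = float(row["log2(fold_change)"])
            if float(row["q_value"]) >= q_cutoff or abs(lfc) < min_abs_lfc:
                continue
            if lfc > 0:
                up.append(row)    # higher in sample_2 than in sample_1
            elif lfc < 0:
                down.append(row)  # lower in sample_2 than in sample_1
    up.sort(key=abs_lfc, reverse=True)
    down.sort(key=abs_lfc, reverse=True)
    if max_genes is not None:
        up, down = up[:max_genes], down[:max_genes]
    return up, down
```

For example, `up, down = filter_gene_exp("gene_exp.diff", q_cutoff=0.05, min_abs_lfc=1.0)`. The more defensible choice is to fix the FDR cutoff up front and report however many genes pass, using max_genes only to shorten an already significant, ranked list. Genes with an infinite fold change (zero expression in one group) sort to the top and usually deserve a separate look.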


Tags: rna-seq, cuffdiff, tophat2

Hello sujaypatil!

It appears that your post has been cross-posted to another site:

This is typically not recommended as it runs the risk of annoying people in both communities.

written 6 days ago by RamRS
ATpoint wrote, 6 days ago:

First of all, I would abandon TopHat and Cufflinks, since both are now considered deprecated. I would switch to a quantification tool such as salmon or kallisto, followed by differential analysis with something like DESeq2 or edgeR. Both of these tools can test against a minimum fold change (instead of the default test against a fold change of 0), which reduces the list of DEGs to those that are probably the most biologically meaningful. The edgeR authors recommend this if you feel you have "too many genes" and want to filter in a data-driven way rather than by tightening the p-value cutoff. In effect, only genes with larger fold changes are retained. In edgeR the function is glmTreat; in DESeq2 the equivalent is the lfcThreshold argument of results() (please check the documentation for details).

Still, I don't think there is a formal way of obtaining an exact number of DEGs, because that is not how differential expression analysis works. It only tells you how many genes are significantly different from the expectation of the underlying model, given your sequencing depth and number of replicates.
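The idea of testing against a fold-change threshold instead of against 0 can be illustrated with the TREAT-style p-value that edgeR's glmTreat builds on. The minimal Python sketch below is not edgeR's implementation (glmTreat works from the negative binomial GLM fit): it assumes you already have a log2 fold-change estimate and its standard error for one gene, uses a plain normal approximation, and the function names (treat_pvalue, normal_upper_tail) are hypothetical.

```python
import math


def normal_upper_tail(x: float) -> float:
    """P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def treat_pvalue(log2fc: float, se: float, threshold: float) -> float:
    """Two-sided p-value for H0: |true log2FC| <= threshold (normal approximation).

    The p-value is evaluated at the boundary of the null (true log2FC = +/- threshold),
    the least favourable case, so it is valid for the whole composite null.
    With threshold = 0 it reduces to the ordinary two-sided Wald test of log2FC = 0.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    b = abs(log2fc)
    # P(estimate >= b) + P(estimate <= -b) when the true log2FC equals the threshold
    return normal_upper_tail((b - threshold) / se) + normal_upper_tail((b + threshold) / se)
```

For a gene with log2FC = 1.2 and standard error 0.3, `treat_pvalue(1.2, 0.3, 0.0)` gives the usual small Wald p-value (about 6e-5), while `treat_pvalue(1.2, 0.3, 1.0)`, i.e. asking whether the change exceeds 2-fold, gives roughly 0.25. Unlike filtering a list on |log2FC| after the fact, the threshold is part of the null hypothesis, so the p-values (after multiple-testing correction) remain valid for the hypothesis actually being tested.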


Thanks a ton for the recommendation! I will keep the salmon/kallisto + DESeq2/edgeR stack in mind for future analyses. However, this is part of a college assignment in which we have to use the Tuxedo suite for DE analysis, so I've gone ahead and run the analysis with cuffdiff for now. I understand there is a package called CummeRbund that also helps extract the most significant DEGs. Thanks again for the help; per your recommendation I will experiment with edgeR.

written 6 days ago by sujaypatil

I've moved ATpoint's comment to an answer. If it was helpful, you should upvote it; if it resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.


written 6 days ago by RamRS