Question

Pairwise Cuffdiff Usage


8.2 years ago

cfarmeri ▴ 210

Hi Biostars

I would like to practice differential expression analysis with Tophat/Cufflinks.

I have 6 conditions, each with no replicates. (I understand I should prepare some replicates.)

I ran TopHat mapping and Cufflinks assembly for each condition.

Now I don't know which of the following approaches is correct:

1. Run cuffmerge/cuffdiff on all 6 conditions at once.
2. Run cuffmerge/cuffdiff on each pair of the 6 samples (namely, 15 pairwise analyses).

Is the normalisation in Cuffdiff done per pairwise comparison, or across all conditions?

Does anyone have an efficient approach that fits my needs?

My English is not good but hope it doesn't cause any trouble.

Thanks

RNA-Seq • 2.5k views

updated 21 months ago by Ram 43k • written 8.2 years ago by cfarmeri ▴ 210


Your English is fine, don't worry. However, the design of your experiment is poor. I wouldn't spend time on data analysis with no replicates. Make the replicates first and then start the data analysis; analyzing your data in its current form is a waste of time!

8.2 years ago by Benn 8.3k


b.nota, thank you so much! Following your advice, I will prepare replicates.

8.2 years ago by cfarmeri ▴ 210

Ram · Answer 1 · 2016-02-08

As b.nota pointed out in their comment, you really do want replicates of your conditions to analyze the data properly. You can start the analysis now, although you'll need to redo the later steps, including the lengthy quantification step, once you add your replicates. Still, it is worth answering your question about the proper procedure. Without getting into more complex issues, if you want to do the 'standard' Cufflinks/Cuffdiff analysis, follow the left-hand diagram found here: http://cole-trapnell-lab.github.io/cufflinks/getting_started/#common-uses-of-the-cufflinks-package

Now, you have multiple conditions, so don't yet think too much about Condition A versus Condition B. You run TopHat and Cufflinks on all your samples individually to start, which you have done.

Next, you run Cuffmerge on all of your samples together. This produces a sort of global transcriptome assembly. This is important because you may have transcripts represented in one sample/condition and not in others, and for proper comparisons later this needs to be reflected in your final transcript assembly.

Then you run Cuffquant on each sample on its own, using the merged transcript assembly that you created with Cuffmerge. This takes some time, but it really saves you time down the road versus the old method.

Once you have all of your Cuffquant outputs (cxb files), you can do your pairwise Cuffdiff analyses using those cxb files, for whatever comparisons you wish. The Cuffdiff steps are relatively fast because all of the quantification was done earlier.

The catch is that once you have done your replicate sequencing, you need to run Cuffmerge again, because you have new samples to include. This also means you need to run Cuffquant on all samples again (not just the new ones), because the reference transcript assembly will have changed. And since you then have new cxb files, you need to redo the Cuffdiff step for all pairwise comparisons. So there is a pretty significant amount of compute time that you will have to spend all over again once you get replicate samples.
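To make the order of the steps concrete, here is a rough Python sketch that only builds and prints the command lines for the merge, quantify, and compare steps; it does not run anything. All paths, the condition labels (`C1` to `C6`), the manifest name `assemblies.txt`, and the helper `build_commands` are hypothetical placeholders, and only basic options (`-o` for the output directory, `-L` for Cuffdiff labels) are used, so check them against the manual for your Cufflinks version before running anything.

```python
import shlex
from itertools import combinations


def build_commands(samples, merged_dir="merged_asm", manifest="assemblies.txt"):
    """Return Cuffmerge, Cuffquant, and Cuffdiff commands as argument lists.

    samples maps a condition label to a list of BAM paths, one per replicate.
    Every path and name here is a hypothetical placeholder.
    """
    merged_gtf = f"{merged_dir}/merged.gtf"

    # One Cuffmerge run over all samples; the manifest is a text file
    # listing each sample's Cufflinks transcripts.gtf, one path per line.
    commands = [["cuffmerge", "-o", merged_dir, manifest]]

    # One Cuffquant run per sample, always against the merged assembly.
    cxb = {}
    for condition, bams in samples.items():
        cxb[condition] = []
        for rep, bam in enumerate(bams, start=1):
            out_dir = f"cuffquant/{condition}_rep{rep}"
            commands.append(["cuffquant", "-o", out_dir, merged_gtf, bam])
            cxb[condition].append(f"{out_dir}/abundances.cxb")

    # One Cuffdiff run per pair of conditions; replicates of the same
    # condition are passed as a single comma-separated argument.
    for a, b in combinations(samples, 2):
        commands.append([
            "cuffdiff", "-o", f"cuffdiff/{a}_vs_{b}", "-L", f"{a},{b}",
            merged_gtf, ",".join(cxb[a]), ",".join(cxb[b]),
        ])
    return commands


# Six conditions with a single sample each, as in the question.
conditions = {f"C{i}": [f"tophat/C{i}/accepted_hits.bam"] for i in range(1, 7)}
for cmd in build_commands(conditions):
    print(shlex.join(cmd))
```

When the replicates arrive, the same sketch applies with more than one BAM per condition in the dictionary: everything from Cuffmerge onward is regenerated, which matches the rerun described above.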