Question: What is the purpose of running Cufflinks without a reference annotation?
YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK) wrote, 3.2 years ago:

My task is to repeat the data analysis of RNA-seq data as presented in a journal article, using the TopHat-Cufflinks pipeline.

For simplicity I'll just mention the 4 controls.

The authors ran Cufflinks without a reference annotation on each control "to detect possible novel transcripts", then ran Cuffmerge on the results. They then say they ran Cufflinks again using the merged transcripts.gtf as the reference annotation. It seems overcomplicated.

Cufflinks requires a BAM file as input, but Cuffmerge does not output a BAM file, so the only way I can see they did it is by rerunning Cufflinks on every sample a second time (a waste of time?), this time using the Cuffmerge output as the reference annotation. This would mean rerunning Cuffmerge afterward as well.

Surely "to detect possible novel transcripts" doesn't require running Cufflinks on everything twice. Isn't that the whole point of Cufflinks?

Thanks in advance. Kenneth

Carlo Yague commented (3.2 years ago):

Hi, I don't really see what your question is here. You answered "What is the purpose of running Cufflinks without a reference annotation?" yourself with the line "to detect possible novel transcripts", so it's not clear to me what you are asking.

Also, a link to the original article would help in commenting on this.

ablanchetcohen (Canada) wrote, 3.2 years ago:

The first Cufflinks run is to generate a new annotation for each sample to discover novel transcripts. The Cuffmerge run is to merge together all the annotations for each individual sample to create one merged annotation of better quality. The second Cufflinks run is to quantify the transcripts based on the merged annotation file.
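
As a rough sketch of those three steps (not the paper's actual commands): the sample directory names, the output directory names, and the dry-run `run` helper below are all assumptions made for illustration. The merged annotation is assumed to be written as merged.gtf inside Cuffmerge's output directory; if your Cuffmerge version names it differently (this thread mentions transcripts.gtf), adjust the path. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run helper (illustrative): prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: discover_merge_quantify SAMPLE_DIR...
# Assumes every SAMPLE_DIR contains TopHat's accepted_hits.bam.
discover_merge_quantify() {
    manifest=assemblies.txt
    : > "$manifest"   # start with an empty list of per-sample assemblies

    # Step 1: assemble each sample with no reference annotation.
    for s in "$@"; do
        run cufflinks -o "$s/cufflinks_denovo" "$s/accepted_hits.bam"
        echo "$s/cufflinks_denovo/transcripts.gtf" >> "$manifest"
    done

    # Step 2: merge the per-sample assemblies into a single annotation.
    run cuffmerge -o merged "$manifest"

    # Step 3: quantify each sample against the merged annotation only (-G).
    for s in "$@"; do
        run cufflinks -G merged/merged.gtf -o "$s/cufflinks_quant" "$s/accepted_hits.bam"
    done
}

discover_merge_quantify ctrl1 ctrl2 ctrl3 ctrl4
```

Upper-case -G in step 3 limits Cufflinks to the transcripts in the merged file, which is what a quantification pass needs; lower-case -g would use the file as a guide and assemble again.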

Yes, it is complicated, and the results will contain many false positives. More importantly, it's generally a waste of time unless you're working on a poorly annotated genome. For well-annotated genomes like the mouse, human, or Drosophila genomes, you shouldn't bother trying to discover novel transcripts. Just use the most recent annotation available.

genomax commented (3.2 years ago):

"My task is to repeat the data analysis of RNA-seq data as presented in a journal article using the tophat cufflinks pipeline."

@kennethcondon2007 does not have a choice here :-)

YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK) wrote, 3.2 years ago:

Thank you all for the replies.

The paper: http://www.nature.com/nbt/journal/v32/n9/full/nbt.3001.html

The pipeline: https://s31.postimg.org/tkcichqkb/pipeline_5.png

Our group has pressed on, so: we completed the first Cufflinks run for each sample, then ran Cuffmerge, and have attempted the second Cufflinks run (using the transcripts.gtf file from Cuffmerge as the reference annotation) with the command:

cufflinks -g [path]/transcripts.gtf -b [path]/genome.fa -u --library-type fr-unstranded [path]/accepted_hits.bam

Is there any disagreement with the command? Should -g be upper-case -G? Should we remove the -b option?

The runs started fine (we have 4 computers available to take 2 runs each); however, they have all now failed with the following error:

Error: duplicate GFF ID 'CUFF.4.1' encountered!

https://s32.postimg.org/7p43nd04l/Sup2.jpg
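
One way to look for a possible cause, assuming the error comes from the same transcript_id being attached to more than one chromosome or strand in the GTF passed to Cufflinks. That cause is a guess and is not confirmed anywhere in this thread. The function name `find_duplicate_ids` is hypothetical; it only needs awk and sort.

```shell
#!/bin/sh
# List transcript_id values that appear on more than one chromosome/strand.
# Usage: find_duplicate_ids FILE.gtf
find_duplicate_ids() {
    awk -F '\t' '
        # Skip comment lines; pull the transcript_id out of column 9.
        $0 !~ /^#/ && match($9, /transcript_id "[^"]+"/) {
            id  = substr($9, RSTART + 15, RLENGTH - 16)
            loc = $1 ":" $7                      # chromosome and strand
            if (!((id, loc) in seen)) {          # count each location once
                seen[id, loc] = 1
                count[id]++
            }
        }
        END {
            for (id in count)
                if (count[id] > 1) print id
        }
    ' "$1" | sort
}

# Example (file name is a placeholder): find_duplicate_ids transcripts.gtf
```

An empty result would suggest the duplicate lies elsewhere, for example in how the file was produced or concatenated.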

Also, one run that is still going has been stuck at the same point for over an hour: https://s31.postimg.org/yzbl2fu2z/Lee.jpg

Again, thank you in advance. Kenneth.

Jason H20 commented (3.2 years ago):

You should probably post this as a separate question.

YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK) wrote, 3.2 years ago:

Adding an annotation file during cuffmerge resolved the issue.
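
For reference, a sketch of what a Cuffmerge call with a reference annotation might look like. The exact command used is not given in this thread; ref.gtf, genome.fa, assemblies.txt and merged_asm are placeholder names, and the `run` dry-run helper is an illustrative assumption. -g (--ref-gtf) and -s (--ref-sequence) are the Cuffmerge options for the annotation and the genome sequence.

```shell
#!/bin/sh
# Dry-run helper (illustrative): prints the command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: merge_with_reference REF_GTF GENOME_FA MANIFEST
#   REF_GTF    known annotation for the genome
#   GENOME_FA  genome sequence in FASTA format
#   MANIFEST   text file listing one per-sample transcripts.gtf per line
merge_with_reference() {
    run cuffmerge -g "$1" -s "$2" -o merged_asm "$3"
}

merge_with_reference ref.gtf genome.fa assemblies.txt
```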
