My task is to reproduce the analysis of RNA-seq data presented in a journal article, using the TopHat/Cufflinks pipeline.
For simplicity I'll just mention the 4 control samples.
The authors run Cufflinks without a reference annotation on each control "to detect possible novel transcripts", then run cuffmerge on the results, and then say they run Cufflinks again using the merged transcripts.gtf as the reference annotation. This seems overcomplicated.
Cufflinks requires a BAM file as input, but cuffmerge does not output a BAM file, only a merged GTF file. So the only way I can see they did it is by rerunning Cufflinks on every sample a second time (a waste of time?), this time using the cuffmerge output as the reference annotation. This would presumably mean running cuffmerge again afterward as well.
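To make the question concrete, here is a rough Python sketch of the command sequence as I understand it. It only builds and prints the commands and does not run anything. The sample names, the directory layout (tophat_out, pass1, pass2, merged_asm) and the manifest name assemblies.txt are my own placeholders, not taken from the article. I am also assuming the usual default file names: accepted_hits.bam from TopHat, transcripts.gtf from Cufflinks, and merged.gtf from cuffmerge. The sketch stops after the second Cufflinks pass, because I am not sure whether a further cuffmerge is needed.

```python
from pathlib import PurePosixPath

# Hypothetical sample names; each is assumed to have a TopHat output
# directory containing accepted_hits.bam.
SAMPLES = ["control1", "control2", "control3", "control4"]


def cufflinks_cmd(sample, out_root, reference_gtf=None):
    """Build a cufflinks command line for one sample.

    Without reference_gtf this is a de novo assembly. With it, -G tells
    cufflinks to use that annotation as the reference.
    """
    bam = PurePosixPath("tophat_out") / sample / "accepted_hits.bam"
    cmd = ["cufflinks", "-o", str(PurePosixPath(out_root) / sample)]
    if reference_gtf is not None:
        cmd += ["-G", str(reference_gtf)]
    cmd.append(str(bam))
    return cmd


def manifest_lines(samples):
    """Paths of the pass-1 assemblies, one per line of the cuffmerge manifest."""
    return [str(PurePosixPath("pass1") / s / "transcripts.gtf") for s in samples]


def cuffmerge_cmd(manifest, out_dir):
    """Build a cuffmerge command; the manifest is a text file listing GTF paths."""
    return ["cuffmerge", "-o", str(out_dir), str(manifest)]


def build_pipeline(samples):
    """Return the full list of commands: pass 1, merge, pass 2."""
    steps = []
    # Pass 1: assemble each sample without a reference annotation.
    steps += [cufflinks_cmd(s, "pass1") for s in samples]
    # Merge the per-sample assemblies into a single annotation.
    steps.append(cuffmerge_cmd("assemblies.txt", "merged_asm"))
    # Pass 2: rerun cufflinks on the same BAM files against the merged GTF.
    merged = PurePosixPath("merged_asm") / "merged.gtf"
    steps += [cufflinks_cmd(s, "pass2", merged) for s in samples]
    return steps


if __name__ == "__main__":
    for step in build_pipeline(SAMPLES):
        print(" ".join(step))
```

Written out like this, the second pass reuses the same BAM files as the first and only adds the merged GTF, which is the part that looks redundant to me.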
Surely "to detect possible novel transcripts" doesn't require running Cufflinks on everything twice. Isn't detecting novel transcripts the whole point of Cufflinks?
Thanks in advance. Kenneth