Question: Are there any differences between tophat and cufflinks commands with and without a GTF file?
bioinforesearchquestions (United States) wrote, 3.4 years ago:

Dear All,

I have a query regarding the gene annotation file (GTF). 

1) Tophat command without GTF: 

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq

2) Tophat command with GTF: 

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq

What is the difference between the two tophat commands?

3) Cufflinks command without GTF:

$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

4) Cufflinks command with GTF:

$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

What is the difference between the two cufflinks commands?

Scenario 1: (Tophat command without GTF and  Cufflinks command with GTF)

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 2: (Tophat command with GTF and Cufflinks command without GTF)

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 3: (Tophat command with GTF and Cufflinks command with GTF)

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

Scenario 4: (Tophat command without GTF and Cufflinks command without GTF)

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

What is the difference between scenarios 1, 2, 3 and 4?

Are the outputs of the four scenarios the same or different?

rna-seq cufflinks tophat • 1.8k views
modified 3.4 years ago • written 3.4 years ago by bioinforesearchquestions

Have you read the manual?

written 3.4 years ago by Devon Ryan

Hi Devon,

I read the manual, but it was still not clear to me.

written 3.4 years ago by bioinforesearchquestions

Did Chirag's reply clarify things?

written 3.4 years ago by Devon Ryan

Hi Devon, 

I have a better understanding now. 

The reason I have 4 different scenarios is that I have seen people use these different combinations in different posts.

I am currently running all 4 combinations on my system, but I have not seen the results yet.

So I would like to know what to expect from the output files of the 4 scenarios above.

written 3.4 years ago by bioinforesearchquestions

In general, if your organism has a decent annotation then you'll get better results if you use it.

written 3.4 years ago by Devon Ryan

Chirag Parsania (University of Macau) wrote, 3.4 years ago:

Hi,

Find answers to a few of your questions below.

Que. 1  What is the difference between the two tophat commands? 

Ans. When you run tophat with a GTF file, it first builds a transcriptome from the information in the GTF file and aligns the reads to that transcriptome rather than to the whole genome. Reads that do not align to the transcriptome are then aligned to the genome. This makes the alignment faster; it is a kind of guided alignment.
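Because the transcriptome is built from the GTF on every run, the TopHat manual describes a --transcriptome-index option for building it once and reusing it across samples. The following is only a sketch that prints the two commands rather than running them; the index prefix transcriptome_index/known and the helper function names are assumptions made up for this example, and the other file names are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the TopHat commands for building a transcriptome
# index once and reusing it. All paths below are placeholders.
GENOME=reference_genome
GTF=genes.gtf
TX_INDEX=transcriptome_index/known   # hypothetical directory/prefix

# Step 1: build the transcriptome index only (no read files are given).
build_index_cmd() {
    echo "tophat -G $GTF --transcriptome-index=$TX_INDEX $GENOME"
}

# Step 2: align one paired-end sample against the prebuilt index.
align_cmd() {
    local sample="$1"
    echo "tophat -p 8 --library-type fr-firststrand --transcriptome-index=$TX_INDEX -o tophat_out_$sample $GENOME ${sample}_r1.fq ${sample}_r2.fq"
}

build_index_cmd
align_cmd sample1
```

Check the printed commands against the manual for your TopHat version before running them.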

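Que. 2  What is the difference between the two cufflinks commands?

Ans. The idea is the same as above: the GTF guides Cufflinks. Note, however, that the two GTF options behave differently. With -G/--GTF, as in your command, Cufflinks only estimates expression for the transcripts in the annotation and does not assemble novel transcripts. With -g/--GTF-guide, the annotation guides the assembly, and the final output contains both the known transcripts and novel transcripts built from your data.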
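The Cufflinks manual describes three ways to run with respect to annotation: no GTF (assembly from the reads alone), -G/--GTF (quantify annotated transcripts only) and -g/--GTF-guide (annotation-guided assembly). As a rough sketch, the helper below just prints the command line for each mode; the mode labels (denovo, quantify, guided) and the function name are made up for this example, and the file names are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the cufflinks command for each annotation mode.
# The mode labels are hypothetical names chosen for this example.
cufflinks_cmd() {
    local mode="$1" gtf="$2" bam="$3"
    case "$mode" in
        denovo)   echo "cufflinks -p 8 -o cufflinks_out $bam" ;;           # no annotation
        quantify) echo "cufflinks -p 8 -G $gtf -o cufflinks_out $bam" ;;   # annotated transcripts only
        guided)   echo "cufflinks -p 8 -g $gtf -o cufflinks_out $bam" ;;   # annotation-guided assembly
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

for mode in denovo quantify guided; do
    cufflinks_cmd "$mode" genes.gtf tophat_out/accepted_hits.bam
done
```

The only difference between the last two commands is the case of the option letter, which is easy to overlook when copying commands from other posts.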

Please refer to the Cufflinks manual: http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html

I hope you can work out the other two yourself.

 Cheers,

Chirag

written 3.4 years ago by Chirag Parsania

Thanks, Chirag, for your explanation.

1) Tophat command without GTF: Reads are aligned directly to the reference genome. In the generated accepted_hits.bam file, every spliced alignment is treated as a novel exon-exon junction.

2) Tophat command with GTF: A junction database is created from the GTF file. TopHat then aligns reads that do not map within an exon against the junction database to identify spliced read alignments. If no alignment is found in the junction database, the read is treated as spanning a novel exon-exon junction. The generated accepted_hits.bam file will contain two kinds of spliced alignments: those across junctions from the GTF and those across novel exon-exon junctions.

I am clear about tophat now, but I still have a question about cufflinks -G GTF versus -g GTF.

modified 3.4 years ago • written 3.4 years ago by bioinforesearchquestions

Cufflinks has both options: -G/--GTF only quantifies the transcripts in the annotation, while -g/--GTF-guide uses the annotation to guide the assembly. Basically, Cufflinks tries to build a transcript GTF file from your data. With neither option, Cufflinks assembles the transcripts from your reads alone. With -g and a GTF file, it performs a guided assembly, and the output contains the reference transcripts plus any novel ones it assembles.

written 3.4 years ago by Sam

Hi Sam,

Thanks for your explanation. I understand it now.

Does the output from cufflinks with a GTF differ from the output without one?

I have a GTF file for mouse. Which of the above scenarios should I use for my analysis?

written 3.4 years ago by bioinforesearchquestions

I can assure you that you get a completely different output. I have tested it.

written 3.4 years ago by Antonio R. Franco

Yes, most likely a different output will be generated. If you are working on mouse, use the mouse GTF so that you can perform the guided assembly. 
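One simple way to see how the outputs differ is to count the transcripts in each run's transcripts.gtf and how many of them carry a CUFF.* ID, the prefix Cufflinks gives to transcripts it assembled itself. This is a minimal sketch, assuming each scenario was written to its own output directory (the commands in the question all use cufflinks_out, so separate directory names would be needed) and that the file is a tab-separated GTF with the feature type in column 3; the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: summarise the transcripts.gtf of one or more Cufflinks runs.
# Assumes tab-separated GTF: feature type in column 3, attributes in column 9.

# Count rows whose feature type is "transcript".
count_transcripts() {
    awk -F'\t' '$3 == "transcript"' "$1" | wc -l | tr -d ' '
}

# Count transcript rows whose transcript_id starts with "CUFF.".
count_cuff_ids() {
    awk -F'\t' '$3 == "transcript" && $9 ~ /transcript_id "CUFF\./' "$1" | wc -l | tr -d ' '
}

# Each argument is a Cufflinks output directory; missing files are skipped.
for dir in "$@"; do
    gtf="$dir/transcripts.gtf"
    [ -f "$gtf" ] || continue
    echo "$dir: $(count_transcripts "$gtf") transcripts, $(count_cuff_ids "$gtf") with CUFF.* IDs"
done
```

Saved under a name of your choice, it could be run as `bash summarise_gtf.sh scenario1_out scenario2_out scenario3_out scenario4_out`, where the script and directory names are hypothetical.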

written 3.4 years ago by Sam