Question

Cufflinks transcriptome (To build or not to build?)

0

5.2 years ago

Morris_Chair ▴ 360

Dear All,

I’m starting to use Cufflinks after aligning with TopHat2. Looking at the manual, I can see that the tool has several options, but what is not clear to me is whether I have to use the transcriptome or the genome as the reference. I ran the program with this command line:

 cufflinks -p 4 -N -g Homo_sapiens.GRCh37.75.gtf file.bam -o cuffresults

As you can see, I used the reference genome annotation. Is that correct? I’m also wondering whether I can use a transcriptome annotation downloaded from Ensembl (Homo_sapiens.GRCh38.cdna.all.fa.gz), or whether I have to build it from the reference genome.

Thank you

RNA-Seq • 1.2k views

ADD COMMENT • link updated 5.2 years ago by WouterDeCoster 47k • written 5.2 years ago by Morris_Chair ▴ 360

1

Please use the "question" post type if you are asking questions, and not "Forum". I have converted your post for you this time (and your previous posts...), but please keep this in mind for further posts.

ADD REPLY • link 5.2 years ago by WouterDeCoster 47k

1

You should probably not be using TopHat2 - see this tweet from Lior Pachter, one of the creators of TopHat and Cufflinks.

ADD REPLY • link 5.2 years ago by Kristoffer Vitting-Seerup ★ 4.0k

0

Hi Kristoffer, thank you for the recommendation. At the moment my aim is to become more familiar with the code and commands used in tools for RNA-seq analysis. I'm also considering HISAT, and to be honest featureCounts is currently at the top of my list.

ADD REPLY • link 5.2 years ago by Morris_Chair ▴ 360

0

Even if you just want gene expression, I would recommend doing transcript-level quantification - it gives more accurate gene-level estimates - see this blog. For considerations regarding transcript-level quantification, check this section of my vignette.

Btw, even if you decide on using featureCounts, you still need files mapped to the genome - so the HISAT run is still needed :-)

ADD REPLY • link 5.1 years ago by Kristoffer Vitting-Seerup ★ 4.0k

score 1 · Answer 1 · 2019-03-14

1

5.2 years ago

Kevin Blighe 87k

For Cufflinks, you do not have to supply a FASTA reference genome or transcriptome. The GTF file is an annotation file that details the co-ordinates of exons/CDS, UTRs, etc.

Also, Homo_sapiens.GRCh37.75.gtf and Homo_sapiens.GRCh38.cdna.all.fa.gz relate to different genome builds; so, definitely do not use these together.
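
A quick way to catch this kind of mismatch before launching a long run is to compare the build tags in the file names. This is only a sketch: it assumes the files keep Ensembl's naming scheme (with a tag such as GRCh37 in the name), and the helper names `build_of` and `same_build` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the assembly tag (e.g. GRCh37) out of an
# Ensembl-style file name. Assumes the tag is present in the name.
build_of() {
    printf '%s\n' "$1" | grep -oE 'GRC[a-z][0-9]+' | head -n 1
}

# Returns 0 when both file names carry the same build tag, 1 otherwise.
same_build() {
    [ -n "$(build_of "$1")" ] && [ "$(build_of "$1")" = "$(build_of "$2")" ]
}

gtf=Homo_sapiens.GRCh37.75.gtf
fasta=Homo_sapiens.GRCh38.cdna.all.fa.gz

if same_build "$gtf" "$fasta"; then
    echo "same build: $(build_of "$gtf")"
else
    echo "build mismatch: $(build_of "$gtf") vs $(build_of "$fasta")"
fi
```

For the two files in the question this reports a mismatch (GRCh37 vs GRCh38). A file-name check is no substitute for checking the sequences themselves, but it is cheap.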

Perhaps a simple check of the help page will enlighten you - I have pulled out the parameters that may be of interest to you:

  -G/--GTF                     quantitate against reference transcript annotations                      
  -g/--GTF-guide               use reference transcript annotation to guide assembly                   
  -b/--frag-bias-correct       use bias correction - reference fasta required        [ default:   NULL ]
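
To make the difference between these three options concrete, here is a rough sketch that prints one command line per mode rather than running anything. The GTF and BAM names come from the question; `genome.fa`, the output directory names, and the helper `cufflinks_cmd` are placeholders I have assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: file names are placeholders taken from the question above.
gtf=Homo_sapiens.GRCh37.75.gtf
bam=file.bam
genome=genome.fa   # assumed: the FASTA the reads were aligned against

# Print the cufflinks command for a given mode instead of running it,
# so the differences between the modes are easy to compare.
cufflinks_cmd() {
    case "$1" in
        # -G: quantify known transcripts only, no assembly of novel ones
        quantify) echo "cufflinks -p 4 -G $gtf -o cuff_quant $bam" ;;
        # -g: assemble transcripts, using the annotation as a guide
        guided)   echo "cufflinks -p 4 -g $gtf -o cuff_guided $bam" ;;
        # -b: bias correction, which additionally needs the genome FASTA
        bias)     echo "cufflinks -p 4 -G $gtf -b $genome -o cuff_bias $bam" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in quantify guided bias; do
    cufflinks_cmd "$mode"
done
```

Running one of the printed lines of course requires cufflinks on the PATH and real input files.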

ADD COMMENT • link 5.2 years ago by Kevin Blighe 87k

0

Hi Kevin, I have a couple more questions. What would you use as the reference transcript annotation? And where did you get the reference FASTA required for bias correction (-b)?

Thank you for your help

ADD REPLY • link 5.2 years ago by Morris_Chair ▴ 360

1

On the reference genome, I imagine it should be the same as the one to which you originally aligned your reads via TopHat2 / Bowtie2. If you have already performed the TopHat2 step, then you should have a reference genome FASTA file?
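
One sanity check that may be worth doing here is confirming that the sequence names in the GTF also exist in the genome FASTA, since Ensembl files name chromosomes `1`, `2`, ... while UCSC-style files use `chr1`, `chr2`, .... The sketch below uses only grep and awk; the function name `missing_in_fasta` is hypothetical, and it assumes an uncompressed FASTA containing at least one header line.

```shell
#!/usr/bin/env bash
# Print every sequence name used in the GTF (column 1) that has no
# matching header in the FASTA. Empty output means all names were found.
# args: annotation.gtf genome.fa
missing_in_fasta() {
    grep '^>' "$2" | awk -F '\t' '
        # First input (stdin): FASTA headers; keep the first word after ">"
        NR == FNR { sub(/^>/, ""); split($0, w, /[ \t]/); seen[w[1]] = 1; next }
        # Second input: the GTF; skip comment lines
        /^#/      { next }
        # Report each unmatched sequence name once
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' - "$1"
}

# Usage (placeholder file names):
#   missing_in_fasta Homo_sapiens.GRCh37.75.gtf genome.fa
```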

The GTF can really be anything. I have used custom GTFs in the past where I had, for example, a suspected novel long non-coding RNA in breast cancer. I edited the GTF and added my own entry for it. Most would take the GTFs from GENCODE: https://www.gencodegenes.org/
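
As an illustration of what adding your own entry can look like, the sketch below formats a single exon record. A GTF line has 9 tab-separated fields, with `gene_id` and `transcript_id` in the last one. Everything in the example call (chromosome, coordinates, IDs, the `custom` source label) is invented, and `add_gtf_exon` is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# Print one GTF exon record (9 tab-separated fields).
# args: seqname start end strand gene_id transcript_id
add_gtf_exon() {
    printf '%s\tcustom\texon\t%s\t%s\t.\t%s\t.\tgene_id "%s"; transcript_id "%s";\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage (placeholder names and made-up coordinates):
#   cp Homo_sapiens.GRCh37.75.gtf custom.gtf
#   add_gtf_exon 17 1000 2000 + MY_LNC1 MY_LNC1.1 >> custom.gtf
add_gtf_exon 17 1000 2000 + MY_LNC1 MY_LNC1.1
```

A multi-exon transcript would need one such line per exon, all sharing the same `transcript_id`.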

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Hello Kevin, I have another question. Once my supercomputer has created the transcripts.gtf for all of my files with cufflinks, I'll have to use cuffmerge to merge the assemblies.

The command line is

cuffmerge -g gene.gtf -s genome.fa -p 8 assemblies.txt

I don't understand: why do I have to supply both the reference annotation (GTF) and the genome in FASTA format (genome.fa)? It looks to me as if I'm using the same thing twice.

thank you

ADD REPLY • link 5.2 years ago by Morris_Chair ▴ 360

1

Hey, you have the option of choosing whichever GTF you wish. The TopHat2 / Cufflinks pipeline can involve some going back and forth. For example, you may want to first create a transcriptome GTF from all of your samples (using cuffmerge), but guided by the current reference transcript annotation (e.g., from GENCODE). Once you have defined your custom transcriptome GTF, you can go back and estimate abundance over its transcripts with cufflinks via the -G/--GTF command line parameter.
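
On the GTF-plus-FASTA question: as far as I understand the cuffmerge manual, the two files play different roles. The GTF given to -g is the known annotation that gets merged with your assemblies, while the FASTA given to -s is the genomic sequence, used to help classify transfrags and filter out artifacts. A sketch of the round trip follows; the sample directory names are placeholders and `make_manifest` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Write the manifest that cuffmerge reads: one transcripts.gtf path per line.
# Assumes each sample's cufflinks run used its own output directory (-o).
# args: manifest_file output_dir [output_dir ...]
make_manifest() {
    manifest=$1; shift
    : > "$manifest"                       # start from an empty file
    for dir in "$@"; do
        echo "$dir/transcripts.gtf" >> "$manifest"
    done
}

# Usage sketch (sample directory and file names are placeholders):
#   make_manifest assemblies.txt cuff_sample1 cuff_sample2
#   cuffmerge -g gene.gtf -s genome.fa -p 8 assemblies.txt
#   # cuffmerge writes merged_asm/merged.gtf by default; quantify against it:
#   cufflinks -p 8 -G merged_asm/merged.gtf -o quant_sample1 sample1.bam
```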

By the way, the successors to TopHat2 / Cufflinks are HISAT2 / StringTie.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Thank you Kevin, I think I will stop the run and start working with HISAT... It's a pity, because it was almost halfway through after running for two days.

thanks again

ADD REPLY • link 5.2 years ago by Morris_Chair ▴ 360

0

Not to worry. Nothing that we do in life is ever time wasted.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k