Question: mRNA GTF File
5.4 years ago
sridhar2bioinfo wrote:

Dear Team,

I am working on human tumor mRNA gene expression. I ran TopHat followed by Cufflinks using the hg19 reference.

Now I want to annotate the Cufflinks output with an mRNA GTF file.

My doubts are:

  1. Should I map my data to mRNA.fa or to the hg19 reference when running TopHat?
  2. Where can I get the mRNA GTF file?

I tried downloading the mRNA GTF file from the UCSC Table Browser with the following settings:

Group: Genes and Gene Prediction Tracks; Track: RefSeq Genes; Table: Human mRNA (all_mrna)

The output file is around 200 MB, but I heard there are only about 22000 known mRNAs.

Could you suggest where to get the mRNA GTF file and which reference to use for mapping?

Thanks,
Sri

rnaseq tophat2 cufflinks ucsc
written 5.4 years ago by sridhar2bioinfo
Devon Ryan wrote:

Nitpick: You have a "question", not a "doubt". I've seen a lot of people use this phrase, so I assume that this is incorrectly taught in some country (or countries).

  1. Align to the hg19 reference. Since you're using TopHat, give it the GTF (the -G option) and it will first align reads to the transcriptome and then convert the mapping coordinates back to the genome for you.
  2. If you downloaded the genome from UCSC, get the annotation file from there too. Don't mix a reference genome from Ensembl with an annotation from UCSC (or vice versa), because the chromosome names differ: UCSC uses chr1, chr2, ..., while Ensembl uses 1, 2, .... Alternatively, just download the appropriate bundle from iGenomes and you'll have matched Bowtie indices and annotation files. That's rather convenient.
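A minimal sketch of point 1, assuming TopHat2 with a Bowtie2 index built from the whole hg19 genome. The index basename, GTF path and FASTQ names are hypothetical placeholders (for example, the matching files from an iGenomes hg19 bundle). It only builds and prints the command; it does not run TopHat.

```python
import shlex

def tophat_command(genome_index, gtf, reads, out_dir="tophat_out", threads=4):
    """Build (but do not run) a TopHat2 command that aligns to the genome.

    genome_index: Bowtie2 index basename for the whole genome (hg19), not mRNA.fa
    gtf: annotation whose chromosome names match that genome
    reads: one FASTQ path (single-end) or two (paired-end)
    """
    cmd = [
        "tophat2",
        "-p", str(threads),  # worker threads
        "-G", gtf,           # known transcripts: reads are aligned to these first
        "-o", out_dir,       # output directory
        genome_index,        # genome index basename
    ]
    cmd.extend(reads)        # FASTQ file(s) go last
    return cmd

# Hypothetical paths, e.g. the genome index and genes.gtf from an iGenomes hg19 bundle
cmd = tophat_command("hg19/genome", "hg19/genes.gtf", ["tumor_1.fq", "tumor_2.fq"])
print(shlex.join(cmd))
# prints: tophat2 -p 4 -G hg19/genes.gtf -o tophat_out hg19/genome tumor_1.fq tumor_2.fq
```

Passing the list to subprocess.run(cmd, check=True) would execute it, provided tophat2 is on your PATH. Keeping the command as a list avoids shell-quoting problems with unusual file names.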
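To catch the naming mismatch described in point 2 before mapping, one rough check is to compare the sequence names in the genome FASTA headers with column 1 of the GTF. This is a sketch on toy, hypothetical inputs (a UCSC-style genome paired with an Ensembl-style annotation). For real data you could pass open file handles, such as open("genome.fa") and open("genes.gtf"), in place of the lists.

```python
def fasta_seqnames(lines):
    """Sequence names from FASTA header lines ('>chr1 desc' -> 'chr1')."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}

def gtf_seqnames(lines):
    """Sequence names from column 1 of a GTF, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names

def missing_from_genome(genome_names, gtf_names):
    """GTF sequence names that have no matching sequence in the genome."""
    return sorted(gtf_names - genome_names)

# Toy inputs (hypothetical): UCSC-style genome, Ensembl-style annotation
genome = [">chr1", "ACGT", ">chr2", "ACGT"]
ensembl_gtf = [
    '1\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "g1";',
    '2\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "g2";',
]
print(missing_from_genome(fasta_seqnames(genome), gtf_seqnames(ensembl_gtf)))
# prints: ['1', '2']
```

An empty list means every annotated sequence name exists in the genome. A list like the one above means the annotation and genome come from different naming conventions.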

Regarding the size of the GTF from UCSC, that 22000 number refers to the number of genes, not transcripts. Each gene can (and often does) have many different transcripts, and each transcript contributes one line per exon to the GTF. That really balloons the size of the file. It may seem large, but that's not unreasonable.
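To see the gene-versus-transcript distinction in a GTF you already have, you can count distinct gene_id and transcript_id values alongside the number of feature lines. This sketch assumes the gene_id attribute groups transcripts by gene, which not every GTF export does. The toy records (one gene, two transcripts, three exons each) are hypothetical.

```python
import re

# Matches GTF attribute pairs such as: gene_id "geneA";
ATTR = re.compile(r'(\S+) "([^"]*)"')

def summarize_gtf(lines):
    """Return (distinct gene_ids, distinct transcript_ids, feature lines)."""
    genes, transcripts, n_lines = set(), set(), 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        attrs = dict(ATTR.findall(fields[8]))  # column 9 holds the attributes
        n_lines += 1
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
    return len(genes), len(transcripts), n_lines

# Hypothetical toy annotation: 1 gene, 2 transcripts, 3 exons per transcript
toy = []
for tx in ("tx1", "tx2"):
    for i in range(3):
        start = 100 * i + 1
        toy.append(f'chr1\tdemo\texon\t{start}\t{start + 50}\t.\t+\t.\t'
                   f'gene_id "geneA"; transcript_id "{tx}";')

print(summarize_gtf(toy))
# prints: (1, 2, 6)
```

For a real file, summarize_gtf(open("genes.gtf")) would work the same way (the file name is hypothetical). The gap between the first and last numbers shows why a GTF is so much larger than a list of about 22000 genes.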

written 5.4 years ago by Devon Ryan
