Question: Difference between Gene Models (Annotation files .gtf) for human (hg19)
M K (United States) wrote, 4.6 years ago:

Hi All,

I am working on RNA-seq analysis, and I am going to use a gene model (annotation file, .gtf) for human (hg19). I found several releases of it, such as GRCh37.55, GRCh37.61, ..., GRCh37.75, ...

When I looked inside each release, I found that the chromosome order is not the same in all of them. Some start with chr1, then chr10, chr11, chr12, ..., and end with chrY; others start with GL000213.1, then HSCHR21_2_CTG1_1, then chr18, ..., chr10.

My questions: does it make any difference which one I use, given that the chromosome order differs between them? And what are the differences between the releases, i.e., why does the chromosome order change from one release to the next?
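To see exactly how two releases differ, a minimal Python sketch like the one below lists the sequence names (GTF column 1) in the order they first appear in each file and reports names present in only one release. It assumes tab-separated GTF files, optionally gzip-compressed; the file names in the usage line are hypothetical placeholders. A different order over the same set of names only matters if your tool cares about order, whereas names present in only one file change what can be annotated at all.

```python
import gzip
from typing import Iterable, List, Set, TextIO


def seqname_order(lines: Iterable[str]) -> List[str]:
    """Return GTF seqnames (column 1) in the order they first appear."""
    seen: Set[str] = set()
    order: List[str] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        seqname = line.split("\t", 1)[0]
        if seqname not in seen:
            seen.add(seqname)
            order.append(seqname)
    return order


def open_gtf(path: str) -> TextIO:
    """Open a plain or gzip-compressed GTF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def compare_releases(path_a: str, path_b: str) -> None:
    """Print whether two GTFs share the same seqnames and the same order."""
    with open_gtf(path_a) as fa, open_gtf(path_b) as fb:
        order_a = seqname_order(fa)
        order_b = seqname_order(fb)
    print("same set of seqnames:", set(order_a) == set(order_b))
    print("same order:", order_a == order_b)
    print("only in A:", sorted(set(order_a) - set(order_b)))
    print("only in B:", sorted(set(order_b) - set(order_a)))
```

Usage (hypothetical file names): `compare_releases("release_A.gtf.gz", "release_B.gtf.gz")`.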

 

 


Which annotation tool are you using? You shouldn't worry about the order unless the tool you are using is picky about it. Some tools, such as GATK and the Tuxedo suite (RNA-seq tools), are picky about the order of chromosomes in BAM and GTF files (for good reasons). I have no idea why the chromosome order changes between releases.
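Whether order matters is tool-specific, but checking it is simple. Below is a minimal Python sketch, assuming you already have the reference's sequence names in reference order (for example column 1 of a `samtools faidx` .fai index, or the output of `bowtie-inspect --names`) and the GTF lines in file order. It reports False if a GTF name is missing from the reference or if the GTF ever goes "backwards" relative to the reference order; the exact ordering rule any given tool enforces may be stricter or looser than this.

```python
from typing import Iterable, Iterator, List


def gtf_seqnames(lines: Iterable[str]) -> Iterator[str]:
    """Yield column 1 of every non-blank, non-comment GTF line, in file order."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line.split("\t", 1)[0]


def order_consistent(reference_names: List[str], seqnames: Iterable[str]) -> bool:
    """True if every seqname exists in the reference and the seqnames never go
    backwards relative to the reference order (using a subset of names is fine)."""
    rank = {name: i for i, name in enumerate(reference_names)}
    last = -1
    for name in seqnames:
        if name not in rank:
            return False  # this name does not exist in the reference at all
        if rank[name] < last:
            return False  # it sorts before a name already seen, so orders differ
        last = rank[name]
    return True
```

For example, with a hypothetical index file `genome.fa.fai`, the reference names could be read as `[l.split("\t")[0] for l in open("genome.fa.fai")]`.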

Reply written 4.6 years ago by Ashutosh Pandey

Hi Pandey, thanks for your comment.

I am not using any tool to create the annotation files; I just downloaded different releases from the Ensembl website, and I am going to pass one of them to TopHat with the -G parameter.

Reply written 4.6 years ago by M K

You may already have come across this on the TopHat manual page. Just follow what it says and you will be fine:

-G/--GTF <GTF/GFF3 file>

Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.

Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:

 

bowtie-inspect --names your_index


So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
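The check the manual describes can be automated. Below is a minimal Python sketch, assuming you first save the index names to a text file with `bowtie-inspect --names your_index > index_names.txt` (the output file name is hypothetical). As an assumption, only the first whitespace-delimited token on each line is treated as the sequence name. Comparison is exact and case-sensitive, as the manual requires.

```python
from typing import Iterable, List, Set


def index_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from saved `bowtie-inspect --names` output, one per line.
    Assumption: the first whitespace-delimited token on each line is the name."""
    return {line.split()[0] for line in lines if line.strip()}


def unmatched_seqnames(gtf_lines: Iterable[str], names: Set[str]) -> List[str]:
    """GTF column-1 values (exact, case-sensitive) that are absent from the index,
    in the order they first appear in the GTF."""
    seen: Set[str] = set()
    missing: List[str] = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        seqname = line.split("\t", 1)[0]
        if seqname in seen:
            continue
        seen.add(seqname)
        if seqname not in names:
            missing.append(seqname)
    return missing
```

An empty result means every GTF seqname has an exact match in the index; any name it returns needs to be renamed (or those lines dropped) before passing the file to -G.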
Reply written 4.6 years ago by Ashutosh Pandey
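The quoted manual text says reads that map to the transcriptome are "converted to genomic mappings (spliced as needed)". That projection can be illustrated with a toy example. This is not TopHat's actual code; it is a minimal sketch assuming a single transcript given as non-overlapping GTF-style exons (1-based, inclusive start/end) and a read described by a 0-based offset and a length along the spliced transcript. A read that crosses an exon-exon junction comes back as two genomic blocks, which is what a spliced alignment (an N operation in the CIGAR string) represents.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # 1-based, inclusive genomic (start, end), as in GTF columns 4-5


def read_blocks(exons: List[Block], strand: str, offset: int, length: int) -> List[Block]:
    """Project a read at a 0-based transcript offset onto genomic blocks."""
    # Walk exons 5'->3' along the transcript: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    read_end = offset + length  # exclusive transcript coordinate
    pos = 0  # transcript offset of the first base of the current exon
    blocks: List[Block] = []
    for start, end in ordered:
        exon_len = end - start + 1
        lo = max(offset, pos)
        hi = min(read_end, pos + exon_len)  # exclusive
        if lo < hi:  # the read overlaps this exon
            a, b = lo - pos, hi - pos - 1  # first/last overlapped base within the exon
            if strand == "+":
                blocks.append((start + a, start + b))
            else:  # on '-', transcript position 0 is the exon's genomic end
                blocks.append((end - b, end - a))
        pos += exon_len
    if read_end > pos:
        raise ValueError("read extends past the end of the transcript")
    return sorted(blocks)
```

For exons (100, 109) and (200, 209) on the + strand, a 4 bp read at transcript offset 8 maps to the blocks (108, 109) and (200, 201), i.e. a spliced alignment across the intron.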