Compatibility between my reference genome and GTF file
Batu, 6.0 years ago

Hello,

I'm new to bioinformatics, so this question may seem basic. For my RNA-seq analyses, I was using HISAT2 with the prebuilt Homo sapiens index downloaded from the HISAT2 website, and as the GTF I was using one downloaded from Ensembl. After some runs, I started to think this might be wrong: the reference genome and the GTF came from different sources (the HISAT2 website does not provide a GTF), so they might not match. I've therefore decided to download a new reference genome from Ensembl, index it myself, and use the corresponding Ensembl GTF with the matching version number (v93). Is there any incompatibility between my previous GTF (Homo_sapiens.GRCh38.92.gtf) and the HISAT2 reference genome? How should I use them?
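One concrete, checkable symptom of a genome/GTF mismatch is that the sequence names differ (for example Ensembl's "1" and "MT" versus "chr1" and "chrM"), so that GTF features point at sequences the index does not contain. The sketch below is a minimal illustration, not part of HISAT2: the function names are hypothetical, and it assumes the FASTA name is the first word of each ">" header and the GTF name is the first tab-separated column. It takes any iterable of lines, such as an open file or `gzip.open(path, "rt")`.

```python
from typing import Iterable, Set


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA header lines (first word after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            names.add(line[1:].split()[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Values of the first (seqname) column of non-comment GTF lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip '#!' headers, comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> Set[str]:
    """GTF seqnames that have no sequence with the same name in the FASTA."""
    return gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines)
```

An empty result only says the names line up; it cannot detect two different assemblies that happen to use the same chromosome names (for example GRCh37 vs GRCh38), which is why matching versions still matters.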

Tags: rna-seq, hisat2, alignment, reference genome

Always use the same version of the genome and the annotation.
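"Same version" can be checked rather than trusted from memory. As a hedged sketch: Ensembl-style GTF file names encode species, assembly and release (e.g. `Homo_sapiens.GRCh38.92.gtf`), and Ensembl GTFs typically begin with `#!` header lines such as `#!genome-build`. The helpers below are hypothetical and assume exactly those conventions; file names with extra parts (such as `.chr.gtf`) simply return `None`.

```python
import re
from typing import Iterable, Optional

# Assumed Ensembl naming pattern: <Species>.<assembly>.<release>.gtf[.gz]
ENSEMBL_GTF_NAME = re.compile(
    r"^(?P<species>[A-Za-z_]+)\.(?P<assembly>[^.]+)\.(?P<release>\d+)\.gtf(?:\.gz)?$"
)


def parse_ensembl_gtf_name(filename: str) -> Optional[dict]:
    """Split an Ensembl-style GTF file name into species, assembly, release."""
    m = ENSEMBL_GTF_NAME.match(filename)
    if m is None:
        return None
    return {"species": m["species"], "assembly": m["assembly"], "release": int(m["release"])}


def gtf_genome_build(lines: Iterable[str]) -> Optional[str]:
    """Value of the '#!genome-build' header line, if present."""
    for line in lines:
        if not line.startswith("#"):
            break  # header lines come before the first feature line
        # The trailing space excludes e.g. '#!genome-build-accession'.
        if line.startswith("#!genome-build "):
            return line.split(None, 1)[1].strip()
    return None


def assembly_name(build: str) -> str:
    """Drop a patch suffix: 'GRCh38.p12' -> 'GRCh38'."""
    return build.split(".")[0]
```

Comparing `assembly_name(...)` only confirms the coordinate system (GRCh38 vs GRCh37); comparing the full build string and the release number is the stricter "same version" check recommended here.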


A simple suggestion: go with GENCODE. You can find the genome assembly and the matching GTF file for each release in the same place.
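One visible difference between GENCODE and Ensembl downloads of the same assembly is chromosome naming: GENCODE uses "chr1" ... "chrX", "chrY", "chrM", while Ensembl uses "1" ... "X", "Y", "MT". If you are stuck with an Ensembl-named GTF and a GENCODE/UCSC-named FASTA of the same assembly, the primary chromosomes can be renamed as in this hypothetical sketch. It deliberately leaves scaffolds and patches untouched, because their names differ between sources and need a proper lookup table; downloading a matched pair is still the safer route.

```python
# Ensembl primary-chromosome names that map to 'chr' + name.
PRIMARY_ENSEMBL = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_chr_style(name: str) -> str:
    """Map an Ensembl primary-chromosome name to GENCODE/UCSC style."""
    if name == "MT":
        return "chrM"  # mitochondrion is named differently, not just prefixed
    if name in PRIMARY_ENSEMBL:
        return "chr" + name
    return name  # scaffolds/patches: left unchanged on purpose


def rename_gtf_line(line: str) -> str:
    """Rewrite the seqname column of one GTF feature line."""
    if line.startswith("#") or "\t" not in line:
        return line  # headers and non-feature lines pass through
    seqname, rest = line.split("\t", 1)
    return ensembl_to_chr_style(seqname) + "\t" + rest
```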


Usually HISAT2 reports warnings or errors for any incompatibility. But as juke-34 correctly pointed out, it is always advisable to use the same version of the genome and annotation file to avoid further errors.


The "tran" versions of the indexed genomes on the HISAT2 website contain the GTF annotation. It's probably easier to use these. Link to download GRCh38 index + GTF.
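If the prebuilt "tran" index is too old, a similar annotation-aware index can in principle be built from any matched FASTA/GTF pair. The HISAT2 manual describes extracting splice sites and exons from the GTF with the bundled `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` scripts and passing them to `hisat2-build` via `--ss` and `--exon`. The sketch below only assembles and runs those commands; it assumes the HISAT2 tools are on `PATH`, that the GTF is uncompressed, and the file names are placeholders. The manual also warns that building with `--ss`/`--exon` needs far more memory than a plain index.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, path to capture stdout into, or None to leave stdout alone)
Step = Tuple[List[str], Optional[str]]


def tran_index_steps(fasta: str, gtf: str, prefix: str, threads: int = 4) -> List[Step]:
    """Commands for a splice-site/exon-aware HISAT2 index (hypothetical helper)."""
    ss = prefix + ".ss"
    exon = prefix + ".exon"
    return [
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
        (["hisat2-build", "-p", str(threads), "--ss", ss, "--exon", exon, fasta, prefix], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, stopping on the first failure."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

For example, `run_steps(tran_index_steps("genome.fa", "genome.gtf", "genome_tran"))`, using a FASTA and GTF taken from the same provider and release.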


They are really easy to use, but aren't they outdated? (According to its date on the FTP server, it was released more than 2 years ago.)


Now there are three options I'm considering for the reference genome, and I'm really confused.

  1. From the HISAT2 website: the "tran" version (contains GTF), as @Carlo Yague mentioned. [It is not up to date; the last version is from March 2016, and I'd prefer an up-to-date one.]
  2. From the GENCODE website, as @krushnach80 mentioned: Release 28 (GRCh38.p12), using the GTF file in the first section (gencode.v28.annotation.gtf) and the FASTA file containing all chromosomes (Genome sequence (GRCh38.p12): GRCh38.p12.genome.fa).
  3. From the Ensembl website: GTF and FASTA files from this link. GTF: release 93. FASTA: either download all chromosomes separately (one of my friends does this for the mouse reference genome) or download the toplevel file (another friend of mine was unable to index that one).

How should I evaluate these options? The HISAT2 option seems the most convenient, but I don't want to use an outdated one.
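On the toplevel problem in option 3: for human, the Ensembl toplevel FASTA also contains alternate haplotypes and patch sequences, which makes it much larger to index and adds near-duplicate sequence; Ensembl additionally offers a "primary_assembly" FASTA without them. Whether that is what stopped your friend is a guess. If you want to restrict a FASTA to selected sequences yourself, a minimal streaming filter could look like this; the `PRIMARY` set is a hypothetical choice using Ensembl-style names, and any GTF features on sequences you drop will then have no matching sequence.

```python
from typing import Iterable, Iterator, Set

# Hypothetical keep-list: Ensembl-style names of the primary human chromosomes.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the records whose name (first header word) is in keep."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide once per header line.
            keeping = line[1:].split()[0] in keep
        if keeping:
            yield line
```

Because it is a generator, it can stream a large file, e.g. writing `filter_fasta(open("toplevel.fa"), PRIMARY)` line by line to a new FASTA without loading the whole genome into memory.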


Hi Batu,

I am stuck in the same situation as you and surprised that no one else is reporting this incompatibility. To be safe, I am using the HISAT2 GRCh38 release 84 index together with the matching GRCh38.84 GTF from Ensembl. But it is very outdated, since the latest release is now 97! Please update the thread if you find a solution.

Thank you.
