Question: Compatibility between my reference genome and GTF file
Batu wrote (14 months ago):

Hello,

I'm new to bioinformatics, so this may be an easy question. For my RNA-seq analyses I was using HISAT2 with the already-indexed Homo sapiens reference genome downloaded from the HISAT2 website, and as the GTF I was using one downloaded from Ensembl. After some runs I began to think this might be wrong: the reference genome and the GTF came from different sources (the HISAT2 website does not provide a GTF), so they might not match. So I've decided to download a new reference genome from Ensembl, index it myself, and use the corresponding Ensembl GTF with the right version number (v93). Is there any incompatibility between my previous GTF (Homo_sapiens.GRCh38.92.gtf) and the reference genome? How should I use them?
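A quick way to see whether a genome and a GTF from different sources fit together is to compare their sequence names: every seqname in column 1 of the GTF should appear as a FASTA header ID. As an example of a mismatch, Ensembl names chromosomes `1`, `2`, `MT`, while GENCODE uses `chr1`, `chr2`, `chrM`. The minimal sketch below assumes plain or gzip-compressed text files and only checks names; matching names are necessary but do not prove the coordinates come from the same assembly.

```python
import gzip
from typing import Iterable, List, Set


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """Sequence IDs from FASTA headers: the text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Values of the first (seqname) column of every non-comment, non-empty GTF line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> List[str]:
    """GTF seqnames with no matching FASTA sequence, sorted; an empty list means the names agree."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines))


if __name__ == "__main__":
    # Toy data: Ensembl-style FASTA names vs. a GENCODE-style (chr-prefixed) GTF.
    fasta = [">1 dna:chromosome", "ACGT", ">MT dna:chromosome", "ACGT"]
    gtf = [
        "#!genome-build GRCh38.p12",
        'chr1\tHAVANA\tgene\t1\t10\t.\t+\t.\tgene_id "g1";',
        'chrM\tENSEMBL\tgene\t1\t10\t.\t+\t.\tgene_id "g2";',
    ]
    print(missing_from_genome(gtf, fasta))  # ['chr1', 'chrM']
    # For real files (the paths here are placeholders):
    # with open_text("annotation.gtf.gz") as g, open_text("genome.fa") as f:
    #     print(missing_from_genome(g, f))
```

If the list is empty, the names line up; if it contains every chromosome, the two files almost certainly come from sources with different naming conventions.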


Always use the same version of the genome and the annotation.
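One way to follow this advice is to check the assembly and release of the GTF before aligning. The minimal sketch below assumes Ensembl's GTF file naming (`<Species>.<Assembly>.<release>.gtf`, as in `Homo_sapiens.GRCh38.92.gtf`) and the `#!genome-...` metadata lines at the top of Ensembl GTFs; it also assumes you know which Ensembl release directory the genome FASTA came from. GENCODE or UCSC files are named differently and would need a different pattern.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed Ensembl GTF file naming: <Species>.<Assembly>.<release>.gtf[.gz]
ENSEMBL_GTF_NAME = re.compile(
    r"^(?P<species>[A-Za-z_]+)\.(?P<assembly>[^.]+)\.(?P<release>\d+)\.gtf(?:\.gz)?$"
)


def parse_gtf_filename(filename: str) -> Optional[Dict[str, str]]:
    """Split an Ensembl-style GTF file name into species, assembly and release."""
    match = ENSEMBL_GTF_NAME.match(filename)
    return match.groupdict() if match else None


def parse_gtf_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect '#!key value' metadata lines; stop at the first non-comment line."""
    header: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break
        if line.startswith("#!"):
            key, _, value = line[2:].strip().partition(" ")
            header[key] = value.strip()
    return header


def version_problems(gtf_filename: str, header_lines: Iterable[str],
                     assembly: str, release: int) -> List[str]:
    """List every way the GTF disagrees with the expected assembly/release (empty = consistent)."""
    problems: List[str] = []
    name = parse_gtf_filename(gtf_filename)
    if name is None:
        problems.append(f"unrecognized file name: {gtf_filename}")
    else:
        if name["assembly"] != assembly:
            problems.append(f"assembly {name['assembly']} != {assembly}")
        if int(name["release"]) != release:
            problems.append(f"release {name['release']} != {release}")
    header_version = parse_gtf_header(header_lines).get("genome-version")
    if header_version is not None and header_version != assembly:
        problems.append(f"header genome-version {header_version} != {assembly}")
    return problems


if __name__ == "__main__":
    header = ["#!genome-build GRCh38.p12", "#!genome-version GRCh38", "1\tensembl\tgene"]
    # The older annotation from the question, checked against a release-93 genome:
    print(version_problems("Homo_sapiens.GRCh38.92.gtf", header, "GRCh38", 93))
    # ['release 92 != 93']
```

Returning a list of problems rather than a single boolean makes it clear whether the mismatch is in the assembly (a serious problem) or only in the annotation release.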

Written 14 months ago by Juke-34.

A simple suggestion: go for GENCODE. You can find the genome assembly and the GTF file for each release in the same place, with matching versions.

Written 14 months ago by krushnach80.

Usually HISAT2 reports warnings or errors for incompatibilities. But, as Juke-34 correctly pointed out, it is always advisable to use the same version of the genome and the annotation file to avoid further errors.

Written 14 months ago by toralmanvar.

The "tran" versions of the indexed genomes on the HISAT2 website contain the GTF annotation files. It's probably easier to use these. Link to download GRCh38 index + GTF.

Written 14 months ago by Carlo Yague.

They would be really easy to use, but aren't they outdated? (According to the date on the FTP site, they were released more than 2 years ago.)

Written 14 months ago by Batu.

There are now three options I'm considering for the reference genome, and I'm really confused:

  1. From the HISAT2 website: the "tran" version (contains the GTF), as @Carlo Yague mentioned. [It has not been updated; the last version is from March 2016, and I'd prefer an up-to-date one.]
  2. From the GENCODE website, as @krushnach80 mentioned: Release 28 (GRCh38.p12), with the GTF from the first section (gencode.v28.annotation.gtf) and the FASTA file containing all chromosomes (Genome sequence (GRCh38.p12): GRCh38.p12.genome.fa).
  3. From the Ensembl website: GTF and FASTA files from this link. GTF: release 93. FASTA: either download each chromosome separately (one of my friends does this for the mouse reference genome) or download the toplevel file (another friend was unable to index that one).

How should I evaluate these options? The HISAT2 option seems the most convenient, but I don't want to use an outdated one.
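For options 2 and 3, building the index yourself from one matched FASTA + GTF pair is what keeps the genome and annotation in sync. The HISAT2 manual describes building an index that, like the "tran" downloads, includes transcript information, using its `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` helper scripts together with the `--ss` and `--exon` options of `hisat2-build`. The Python sketch below only assembles those commands and runs them if you call `run_plan`; it assumes the HISAT2 tools are on `PATH` and that the FASTA and GTF are uncompressed. The GENCODE file names come from option 2, and the index prefix is a hypothetical name.

```python
import subprocess
from typing import List, Optional, Tuple

# Each step is (command, file that receives its stdout, or None).
Step = Tuple[List[str], Optional[str]]


def hisat2_tran_plan(genome_fa: str, gtf: str, prefix: str, threads: int = 4) -> List[Step]:
    """Steps for a splice-site/exon-aware HISAT2 index from one matched FASTA + GTF pair."""
    ss, exon = f"{prefix}.ss", f"{prefix}.exon"
    return [
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
        (["hisat2-build", "-p", str(threads), "--ss", ss, "--exon", exon, genome_fa, prefix], None),
    ]


def run_plan(plan: List[Step]) -> None:
    """Run each step in order, stopping with CalledProcessError if any command fails."""
    for cmd, stdout_file in plan:
        if stdout_file is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    plan = hisat2_tran_plan("GRCh38.p12.genome.fa", "gencode.v28.annotation.gtf",
                            "grch38_gencode28_tran", threads=8)
    for cmd, stdout_file in plan:
        print(" ".join(cmd) + (f" > {stdout_file}" if stdout_file else ""))
    # run_plan(plan)  # uncomment to build (needs HISAT2 installed and a lot of RAM)
```

Passing the FASTA and GTF into a single function call is a small guard against accidentally pairing files from different sources or releases.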

Written 14 months ago by Batu.

Hi Batu,

I am stuck in the same situation as you and surprised that no one else is reporting this incompatibility. To be safe, I am using the HISAT2 GRCh38 release 84 index together with the matching GRCh38.84 GTF from Ensembl. But that is very outdated, as the latest version is now 97! Please update the thread if you have found a solution.

Thank you.

Written 4 months ago by Rituriya.
Powered by Biostar version 2.3.0