Question: Compatibility between my reference genome and GTF file
8 months ago by Batu (150)

Hello,

I'm new to bioinformatics, and I have a question that may seem basic. For my RNA-seq analyses, I have been using HISAT2 with the pre-built Homo sapiens index downloaded from the HISAT2 website, together with a GTF downloaded from Ensembl (Homo_sapiens.GRCh38.92.gtf). After some runs, I started to think this might be wrong: the reference genome and the GTF come from different sources (the HISAT2 website does not provide a GTF), so they might not match. I've therefore decided to download a new reference genome from Ensembl, index it myself, and use the corresponding Ensembl GTF with the matching release number (v93). Was my previous GTF, Homo_sapiens.GRCh38.92.gtf, incompatible with the HISAT2 reference genome? How should I use them?


Always use the same version of the genome and the annotation.

modified 8 months ago • written 8 months ago by Juke-34 (2.2k)

A simple suggestion: go for GENCODE. There you can find the genome assembly and the GTF file for each corresponding version in the same place.

written 8 months ago by krushnach80 (500)

Usually, HISAT2 reports warnings or errors when there is an incompatibility. But as Juke-34 correctly pointed out, it is always advisable to use the same version of the genome and the annotation file to avoid errors further down the line.
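
Rather than relying on warnings alone, one quick sanity check is to compare the sequence names in the FASTA headers against column 1 of the GTF. The sketch below is only an illustration: the function names are hypothetical, and it assumes a standard FASTA (name is the first token after `>`) and a tab-separated GTF with `#` comment lines. Matching names are necessary but not sufficient, since two releases of different assemblies can share names while their coordinates differ.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_seqnames(fasta_path):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    with _open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.add(tokens[0])
    return names


def gtf_seqnames(gtf_path):
    """Collect the values of column 1 (seqname), skipping comments and blank lines."""
    names = set()
    with _open_text(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(fasta_path, gtf_path):
    """Report which sequence names are shared and which appear in only one file."""
    fasta = fasta_seqnames(fasta_path)
    gtf = gtf_seqnames(gtf_path)
    return {
        "shared": fasta & gtf,
        "gtf_only": gtf - fasta,    # annotated sequences missing from the genome
        "fasta_only": fasta - gtf,  # genome sequences without any annotation
    }
```

A large `gtf_only` set is the typical sign of a naming mismatch, for example when one file uses Ensembl-style names such as `1` and the other uses `chr1`, as GENCODE does.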

written 8 months ago by toralmanvar (760)

The "tran" versions of the indexed genomes on the HISAT2 website contain the GTF annotation files. It's probably easier to use these. Link to download GRCh38 index + GTF.

modified 8 months ago • written 8 months ago by Carlo Yague (4.5k)

They are really easy to use, but aren't they outdated? (According to the date on the FTP site, they were released more than 2 years ago.)

written 8 months ago by Batu (150)

Now there are three ways I am considering to obtain the reference genome, and I'm really confused.

  1. From the HISAT2 website: the "tran" version (contains the GTF), as @Carlo Yague mentioned. [It has not been updated; the last version is from March 2016, and I would prefer to use a current one.]
  2. From the GENCODE website, as @krushnach80 mentioned: release 28 (GRCh38.p12), with the GTF file from the first section (gencode.v28.annotation.gtf) and the FASTA file containing all chromosomes (Genome sequence (GRCh38.p12): GRCh38.p12.genome.fa).
  3. From the Ensembl website: GTF and FASTA files from this link. GTF: release 93. FASTA: either downloading all chromosomes separately (one of my friends does this for the mouse reference genome) or downloading the toplevel file (another friend of mine was unable to index this one).

How should I evaluate these options? The HISAT2 option seems the most convenient, but I don't want to use an outdated one.
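
For options 2 and 3, building the index myself would look roughly like the sketch below. It only assembles the commands; `hisat2_extract_splice_sites.py`, `hisat2_extract_exons.py`, and `hisat2-build` with `--ss` and `--exon` are the tools described in the HISAT2 manual, while the Python function names and the file names (`genome.fa`, `annotation.gtf`, `grch38_tran`) are placeholders of my own. I am assuming an uncompressed GTF and FASTA taken from the same source and release.

```python
import subprocess


def hisat2_index_steps(fasta, gtf, prefix, threads=4):
    """Return (argv, stdout_path) pairs for an annotation-aware HISAT2 index.

    The two extract scripts print to stdout, so their output is redirected
    to files that hisat2-build then reads via --ss and --exon.
    """
    ss = f"{prefix}.ss"
    exon = f"{prefix}.exon"
    return [
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
        (["hisat2-build", "-p", str(threads), "--ss", ss, "--exon", exon,
          fasta, prefix], None),
    ]


def run_steps(steps):
    """Run each step in order, redirecting stdout to a file where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder file names; print the plan instead of running it.
    for argv, stdout_path in hisat2_index_steps("genome.fa", "annotation.gtf", "grch38_tran"):
        print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

As far as I understand from the HISAT2 manual, building with `--ss` and `--exon` needs far more memory for a human genome than a plain index does, which may be part of why the pre-built "tran" index is attractive.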

modified 8 months ago • written 8 months ago by Batu (150)