Choosing human reference genome and indexing
3.0 years ago
Batu ▴ 220

I am having difficulty choosing a suitable source for the reference genome. I am considering three options, and I'm really confused.

1. From the HISAT2 website: the "tran" version (includes the GTF annotation), as @Carlo Yague mentioned in my previous post. [It is not up to date; the last version is from March 2016, and I would prefer a current one.]
2. From the GENCODE website, as @krushnach80 mentioned in my previous post: Release 28 (GRCh38.p12), with the GTF file from the first section (gencode.v28.annotation.gtf) and the FASTA file containing all chromosomes (Genome sequence (GRCh38.p12): GRCh38.p12.genome.fa).
3. From the Ensembl website: GTF and FASTA files from this link. GTF: release 93. FASTA: either download all chromosomes separately (a friend of mine does this for the mouse reference genome) or download the toplevel file (another friend was unable to index that one).

How should I evaluate these options? The HISAT2 option seems the most convenient, but I don't want to use an outdated index.

reference genome indexing RNA-Seq • 1.1k views
Hi Batu, what did you finally use then as the GTF annotation? I am still planning to stick to point 1.

3.0 years ago

The GENCODE and Ensembl GTF files are almost completely identical. Sometimes there's a difference in the chromosome naming system used, but that's largely it. Make sure you use the FASTA file from the same source as the GTF (other than possible differences in chromosome names, the Ensembl and GENCODE FASTA files will be the same). So options 2 and 3 are effectively the same. I guess Ensembl has a quicker release cycle, so they're a bit more likely to fix any issues in their GTF files.
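
If you want to confirm that a given FASTA and GTF pair use the same chromosome names before building an index, something like the following rough sketch could help. It only compares the sequence names in the FASTA headers with the first column of the GTF. The function names (open_text, fasta_names, gtf_names, compare_names) are made up for this example and are not part of HISAT2 or any other tool.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Collect sequence names from FASTA headers (text up to the first whitespace)."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect sequence names from the first column of a GTF, skipping comments."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report which sequence names are shared and which appear in only one file."""
    fasta = fasta_names(fasta_path)
    gtf = gtf_names(gtf_path)
    return {
        "shared": fasta & gtf,
        "gtf_only": gtf - fasta,
        "fasta_only": fasta - gtf,
    }


# Example call, using the GENCODE file names from the question:
# report = compare_names("GRCh38.p12.genome.fa", "gencode.v28.annotation.gtf")
# print(sorted(report["gtf_only"]))
```

If "gtf_only" is not empty, some annotated features sit on sequences that are missing from the FASTA, which usually means the two files use different naming schemes (for example "chr1" versus "1") or come from different sources.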


