Choosing a human reference genome and indexing it
Batu, 3.0 years ago

I'm having difficulty choosing a suitable source. There are three options I'm considering for the reference genome, and I'm really confused.

1. From the HISAT2 website: the "tran" version (which includes the GTF annotation), as @Carlo Yague mentioned in my previous post. [It is not up to date; the latest version is from March 2016, and I would prefer an up-to-date one.]
2. From the GENCODE website, as @krushnach80 mentioned in my previous post: Release 28 (GRCh38.p12), using the GTF file in the first section (gencode.v28.annotation.gtf) and the FASTA file containing all chromosomes (Genome sequence (GRCh38.p12): GRCh38.p12.genome.fa).
3. From the Ensembl website, GTF and FASTA files from this link. GTF: release 93. FASTA: either download all chromosomes separately (one of my friends does this for the mouse reference genome) or download the toplevel file (another friend was unable to index that one).

How should I evaluate these options? The HISAT2 option seems the most convenient, but I don't want to use an outdated one.


Hi Batu, which GTF annotation did you end up using? I'm still planning to stick with option 1.

3.0 years ago

The GENCODE and Ensembl GTF files are almost completely identical. Sometimes there's a difference in the chromosome naming system used (GENCODE uses names like chr1 and chrM, Ensembl uses 1 and MT), but that's largely it. Make sure you use the FASTA file from the same source (apart from possible differences in chromosome names, the Ensembl and GENCODE FASTA files will be the same). So options 2 and 3 are effectively the same. I guess Ensembl has a quicker release cycle, so they're a bit more likely to fix any issues in their GTF files.
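Because a GTF and FASTA from different sources can disagree on chromosome names, it may be worth checking that every sequence name used in the GTF actually exists in the FASTA before building an index. Below is a minimal sketch in plain Python (no third-party libraries); the function names are hypothetical, and it assumes the FASTA sequence name is the first word after `>` and the GTF sequence name is the first tab-separated column. Files ending in `.gz` are read as gzip.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed file (by .gz suffix) in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_seq_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:  # skip a bare '>' header with no name
                    names.add(parts[0])
    return names


def gtf_seq_names(path):
    """Return the set of seqnames (column 1) in a GTF file, skipping comments and blank lines."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def check_naming(fasta_path, gtf_path):
    """Return a sorted list of GTF seqnames that are missing from the FASTA."""
    return sorted(gtf_seq_names(gtf_path) - fasta_seq_names(fasta_path))


if __name__ == "__main__":
    # File names taken from the GENCODE option in the question; adjust to your downloads.
    missing = check_naming("GRCh38.p12.genome.fa", "gencode.v28.annotation.gtf")
    if missing:
        print("GTF seqnames not found in FASTA:", ", ".join(missing))
    else:
        print("All GTF seqnames are present in the FASTA.")
```

An empty result means the pair is consistent; a list such as `1, MT` alongside a FASTA with `chr1, chrM` would indicate that the GTF and FASTA came from different naming systems.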


Is this post incomplete? @Devon Ryan


No, I just like to end in the mid

Actually the ", but" should be a period :P


Okay... I'm off the anxiety meds then. Thanks.