Question

Building a new index for HISAT2 for RNA-Seq data analysis

2.4 years ago

Archit • 0

Hi, my project requires me to analyze the differential expression of mRNAs, lncRNAs, and miRNAs. To do that, I need to align the reads in my raw FASTQ files to a reference genome. HISAT2 provides its own pre-built indexes based on GRCh38 (Ensembl release 84 or something). That feels old by now, and I want to use Ensembl release 105, but I am unsure which file to use to build a new index. I was previously using the genome_tran index available at http://daehwankimlab.github.io/hisat2/download/. If I make my own index, which of these files from Ensembl release 105 (http://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/) should I use?

cdna/Homo_sapiens.GRCh38.cdna.abinitio.fa.gz
cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

Also, could someone provide the full commands required to build it?

Much appreciated, Thank you!

HISAT data analysis NGS RNASeq RNA-Seq • 1.6k views

score 1 · Answer 1 · 2021-12-24

2.4 years ago

GenoMax 142k

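GRCh38 is the current human genome build. It is not old. Ensembl releases have nothing to do with human genome builds, which are managed by the Genome Reference Consortium (GRC), not by Ensembl or GENCODE. A newer Ensembl release mainly updates the gene annotation on top of the same GRCh38 assembly.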

If you want to build the index yourself, don't use the toplevel genome file. See the thread "Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?" for the reason.
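For the "full commands" part of the question, here is a rough sketch of how a genome_tran-style index could be built. It is not taken from this thread and has not been run on real data. It assumes the release 105 primary assembly FASTA (dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) and the matching GTF (Homo_sapiens.GRCh38.105.gtf.gz from the release-105 gtf/homo_sapiens/ directory) have already been downloaded and decompressed, and that `hisat2-build`, `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` from the HISAT2 distribution are on the PATH. The index name `grch38_r105_tran`, the thread count and the helper function names are made up for illustration.

```python
import shlex
import subprocess

# Assumed inputs: file names and index name are illustrative, not from the thread.
GENOME_FA = "Homo_sapiens.GRCh38.dna.primary_assembly.fa"
ANNOTATION_GTF = "Homo_sapiens.GRCh38.105.gtf"
INDEX_BASE = "grch38_r105_tran"


def index_build_commands(genome_fa, gtf, index_base, threads=8):
    """Return (argv, stdout_file) pairs for a splice-aware HISAT2 index build."""
    ss_file = f"{index_base}.ss"      # known splice sites extracted from the GTF
    exon_file = f"{index_base}.exon"  # known exons extracted from the GTF
    return [
        # The two helper scripts write their results to stdout.
        (["hisat2_extract_splice_sites.py", gtf], ss_file),
        (["hisat2_extract_exons.py", gtf], exon_file),
        # --ss/--exon add the annotation to the index (the "_tran" flavour).
        (["hisat2-build", "-p", str(threads),
          "--ss", ss_file, "--exon", exon_file,
          genome_fa, index_base], None),
    ]


def as_shell(commands):
    """Render the commands as shell lines, e.g. to paste into a terminal."""
    lines = []
    for argv, stdout_file in commands:
        line = shlex.join(argv)
        if stdout_file is not None:
            line += " > " + shlex.quote(stdout_file)
        lines.append(line)
    return lines


def run_all(commands):
    """Run the commands in order, redirecting stdout to a file where requested."""
    for argv, stdout_file in commands:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    commands = index_build_commands(GENOME_FA, ANNOTATION_GTF, INDEX_BASE)
    print("\n".join(as_shell(commands)))
    # run_all(commands)  # uncomment to actually build the index
```

As written, the script only prints the three command lines, and the build itself runs once `run_all` is uncommented. Note that, per the HISAT2 manual, building with `--ss`/`--exon` needs a very large amount of RAM for a human-sized genome. If that is not available, leaving those two options out gives a plain genome index.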

So, would the default genome_tran index for HISAT2, which is based on Ensembl release 84, be good and relevant enough?

Reply 2.4 years ago by Archit • 0