Building a new index for HISAT2 for RNA-Seq data analysis
2.4 years ago
Archit • 0

Hi, my project requires me to analyze the differential expression of mRNAs, lncRNAs, and miRNAs, so I need to align my raw FASTQ reads to a reference genome. HISAT2 provides pre-built indexes based on GRCh38 (Ensembl release 84 or thereabouts). That seems old by now, and I would like to use Ensembl release 105 instead, but I am unsure which file to build the new index from. Until now I have been using the genome_tran index available at http://daehwankimlab.github.io/hisat2/download/. If I build my own index, which of these files from Ensembl release 105 (http://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/) should I use?

  1. cdna/Homo_sapiens.GRCh38.cdna.abinitio.fa.gz
  2. cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
  3. dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

Also, could someone provide the full commands required to build it?

Much appreciated, Thank you!

HISAT data analysis NGS RNASeq RNA-Seq • 1.6k views
2.4 years ago
GenoMax 142k

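GRCh38 is the current human genome build; it is not old. Ensembl releases have nothing to do with human genome builds, which are managed by the Genome Reference Consortium (GRC). A newer Ensembl release mainly updates the gene annotation; the primary assembly sequence stays the same.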

If you want to build the index yourself, don't use the toplevel genome file; the reason is covered in the thread "Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?"
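
For the build itself, here is a minimal sketch of the usual genome_tran-style recipe, written as a small Python helper that only prints the commands so you can check them before running anything. It relies on the `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` scripts and the `--ss` / `--exon` options of `hisat2-build`, all of which ship with HISAT2. Everything else is an assumption on my part: the helper name `hisat2_index_commands`, the output prefix `grch38_r105_tran`, the thread count, and the input file names (I am assuming the release 105 primary assembly FASTA from the `dna/` directory and the matching GTF, both already decompressed with `gunzip`).

```python
import shlex


def hisat2_index_commands(fasta, gtf, prefix, threads=8):
    """Return shell commands for a splice-aware (genome_tran-style) HISAT2 index.

    fasta  : decompressed genome FASTA (assumed: the primary assembly)
    gtf    : decompressed gene annotation GTF from the same release
    prefix : basename for the index files (hypothetical name, pick your own)
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    ss = f"{prefix}.ss"      # splice sites extracted from the GTF
    exon = f"{prefix}.exon"  # exons extracted from the GTF
    return [
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(ss)}",
        f"hisat2_extract_exons.py {q(gtf)} > {q(exon)}",
        f"hisat2-build -p {int(threads)} --ss {q(ss)} --exon {q(exon)} "
        f"{q(fasta)} {q(prefix)}",
    ]


if __name__ == "__main__":
    # File names below are assumptions; adjust them to what you downloaded.
    for cmd in hisat2_index_commands(
        "Homo_sapiens.GRCh38.dna.primary_assembly.fa",
        "Homo_sapiens.GRCh38.105.gtf",
        "grch38_r105_tran",
    ):
        print(cmd)
```

Be aware that building a human index with `--ss` and `--exon` needs a very large amount of memory (the HISAT2 manual cites roughly 200 GB, if I recall correctly). If you do not have access to such a machine, you can drop those two options and build a plain genome index, or keep using the pre-built genome_tran index.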


So, would the default genome_tran index for HISAT2, which is based on Ensembl release 84, be good and relevant enough?
