Question

HISAT2 indexing problem

7.2 years ago

modarzi

Hi, I want to use HISAT2 for alignment, so for the index I downloaded the "genome_tran" (4.2 GB) file from the indexes section of the HISAT2 website. After extracting it, the folder contains 10 files:

genome_tran.1.ht2, genome_tran.2.ht2, genome_tran.3.ht2, genome_tran.4.ht2, genome_tran.5.ht2, genome_tran.6.ht2, genome_tran.7.ht2, genome_tran.8.ht2, hg38_ucsc.annotated.gtf, make_h38_tran.sh.

Now, to align my 8 RNA-seq files, I have to use the following command from the HISAT2 manual:

hisat2 [options]* -x <hisat2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <hit>]
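As a rough illustration of that synopsis, the Python sketch below only builds the argument list for one sample; it does not run HISAT2. Single-end input uses `-U`, paired-end input uses `-1`/`-2`, and `-p` (number of threads) is an extra option not shown in the synopsis. The function name `hisat2_command` and the sample file names are hypothetical, and the value given to `-x` is the index base name, as the answers in this thread explain.

```python
import shlex
from typing import List, Optional


def hisat2_command(index_base: str, out_sam: str,
                   unpaired: Optional[str] = None,
                   mate1: Optional[str] = None,
                   mate2: Optional[str] = None,
                   threads: int = 1) -> List[str]:
    """Build (but do not run) a hisat2 argument list.

    Exactly one input mode is allowed: single-end (-U) or paired-end (-1/-2).
    """
    cmd = ["hisat2", "-p", str(threads), "-x", index_base]
    if unpaired is not None and mate1 is None and mate2 is None:
        cmd += ["-U", unpaired]
    elif unpaired is None and mate1 is not None and mate2 is not None:
        cmd += ["-1", mate1, "-2", mate2]
    else:
        raise ValueError("pass either unpaired=... or both mate1=... and mate2=...")
    cmd += ["-S", out_sam]
    return cmd


if __name__ == "__main__":
    # Hypothetical single-end samples; "genome_tran" is the index base name.
    for sample in ["sample1", "sample2"]:
        argv = hisat2_command("genome_tran", f"{sample}.sam",
                              unpaired=f"{sample}.fastq", threads=4)
        print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would then run the alignments one sample at a time; keeping the command as a list avoids shell-quoting problems with file names.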

My problem is that I don't know which file from "genome_tran" I should put in place of <hisat2-idx> in the hisat2 command: "genome_tran.1.ht2", "genome_tran.2.ht2", "genome_tran.3.ht2", or another one?

I would appreciate it if anybody could share their comments with me. Best regards, Mohammad

RNA-Seq alignment next-gen HISAT 2 Indexing

You need the base name, i.e. genome_tran.

Reply by ATpoint, 7.2 years ago

Devon Ryan · Answer 1 · 2018-05-04

You won't use any particular file, just genome_tran, which is the common prefix (base name) of all the file names. HISAT2 then appends .1.ht2 through .8.ht2 to it when it loads the index.
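To make the naming rule concrete, here is a minimal Python sketch that recovers the base name from a list of file names by stripping the `.N.ht2` suffix and keeping only bases that have all eight parts. The function names `index_bases` and `index_bases_in_dir` are hypothetical, and the sketch assumes the eight-file `.1.ht2` to `.8.ht2` layout listed in the question. If the index sits in another directory, the directory stays part of the base (for example `indexes/genome_tran`), and that whole prefix is what goes after `-x`.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Set

# An index part looks like <base>.<N>.ht2 with N from 1 to 8 (assumed layout).
_PART = re.compile(r"^(?P<base>.+)\.(?P<n>[1-8])\.ht2$")


def index_bases(paths: Iterable[str]) -> Set[str]:
    """Return every base name that has all eight .N.ht2 parts."""
    parts: Dict[str, Set[int]] = {}
    for p in paths:
        m = _PART.match(p)
        if m:
            parts.setdefault(m.group("base"), set()).add(int(m.group("n")))
    return {base for base, found in parts.items() if found == set(range(1, 9))}


def index_bases_in_dir(directory: str) -> Set[str]:
    """Same check on a real directory; results keep the directory prefix."""
    return index_bases(str(p) for p in Path(directory).iterdir())


if __name__ == "__main__":
    files = [f"genome_tran.{n}.ht2" for n in range(1, 9)]
    files += ["hg38_ucsc.annotated.gtf", "make_h38_tran.sh"]
    print(index_bases(files))  # {'genome_tran'}
```

The GTF and shell script are ignored because they do not match the `.N.ht2` pattern, and a base with a missing part is not reported, which helps catch an incomplete download or extraction.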

Thanks for your comment. I ran hisat2 on one of my samples (RNA-seq), but I received lots of warnings. Here is the output:

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because length (1) <= # seed mismatches (0)

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because it was < 2 characters long
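These warnings say HISAT2 skipped that read because its sequence was only 1 base long, even though its header says `length=49`. One way to find such reads in the input before aligning is the minimal Python sketch below. It assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function name `short_reads` and the demo records are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def short_reads(handle: TextIO, min_len: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (read header, sequence length) for FASTQ records shorter than min_len.

    Assumes plain 4-line records: @header, sequence, '+', quality.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        if len(seq) < min_len:
            yield header[1:].rstrip("\r\n"), len(seq)


if __name__ == "__main__":
    demo = io.StringIO(
        "@r1 length=49\nA\n+\nI\n"
        "@r2 length=8\nACGTACGT\n+\nIIIIIIII\n"
    )
    print(list(short_reads(demo)))  # [('r1 length=49', 1)]
```

For a real file, pass `open("sample.fastq")`, or `gzip.open("sample.fastq.gz", "rt")` for a gzipped one, in place of the in-memory demo.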

46943435 reads; of these:
  46943435 (100.00%) were unpaired; of these:
    9396529 (20.02%) aligned 0 times
    12967404 (27.62%) aligned exactly 1 time
    24579502 (52.36%) aligned >1 times
79.98% overall alignment rate.
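The percentages in this summary can be recomputed from the raw counts, which also gives the share of multimapping reads among aligned reads rather than among all reads. A minimal Python sketch, assuming the single-end (unpaired) summary layout shown above; the helper names `parse_unpaired_summary` and `rates` are hypothetical.

```python
import re
from typing import Dict

SUMMARY = """\
46943435 reads; of these:
  46943435 (100.00%) were unpaired; of these:
    9396529 (20.02%) aligned 0 times
    12967404 (27.62%) aligned exactly 1 time
    24579502 (52.36%) aligned >1 times
79.98% overall alignment rate.
"""

# One regex per count line of the single-end (unpaired) summary.
_PATTERNS = {
    "total": r"^(\d+) reads; of these:",
    "zero": r"^\s*(\d+) \([\d.]+%\) aligned 0 times",
    "unique": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
    "multi": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_unpaired_summary(text: str) -> Dict[str, int]:
    """Extract the read counts from a single-end HISAT2 summary."""
    counts = {}
    for key, pattern in _PATTERNS.items():
        m = re.search(pattern, text, flags=re.MULTILINE)
        if m is None:
            raise ValueError(f"summary line for {key!r} not found")
        counts[key] = int(m.group(1))
    if counts["zero"] + counts["unique"] + counts["multi"] != counts["total"]:
        raise ValueError("counts do not add up to the total")
    return counts


def rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentages relative to all reads, plus multimappers among aligned reads."""
    total = counts["total"]
    aligned = counts["unique"] + counts["multi"]
    return {
        "overall": 100 * aligned / total,
        "multi_of_all": 100 * counts["multi"] / total,
        "multi_of_aligned": 100 * counts["multi"] / aligned,
    }


if __name__ == "__main__":
    for name, pct in rates(parse_unpaired_summary(SUMMARY)).items():
        print(f"{name}: {pct:.2f}%")
```

With the counts from this run, 79.98% of reads align overall, and multimappers make up about 52.4% of all reads, or about 65.5% of the reads that aligned at all.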

I don't know whether this result is good or not. I used hg38_tran as the index. Would the result change if I used hg19 as the reference? My second problem is that I don't know which region of the genome this sample comes from. I would appreciate it if you could share your comments with me.

Best regards,

Mohammad

Reply by modarzi, 7.2 years ago

Roughly 50% multimapping reads seems very high, but perhaps these are ribo-depleted samples.

The sample covers the entire transcriptome, not one particular region of the genome.

Reply by Devon Ryan, 7.2 years ago