HISAT2 indexing problem
3.5 years ago
modarzi ▴ 140

Hi, I want to use HISAT2 for alignment. For the index, I downloaded the "genome_tran" file (4.2 GB) from the indexes section of the HISAT2 website. After extracting it, the folder contains 10 files:

genome_tran.1.ht2 , genome_tran.2.ht2, genome_tran.3.ht2, genome_tran.4.ht2, genome_tran.5.ht2, genome_tran.6.ht2, genome_tran.7.ht2, genome_tran.8.ht2, hg38_ucsc.annotated.gtf, make_h38_tran.sh.

Now, to align my 8 RNA-seq files, I have to use the following command from the HISAT2 manual:

hisat2 [options]* -x <hisat2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <hit>]

My problem is that I don't know which file from "genome_tran" (4.2 GB) to use in place of <hisat2-idx> in the hisat2 command: "genome_tran.1.ht2", "genome_tran.2.ht2", "genome_tran.3.ht2", or another one?

I would appreciate it if anybody could share a comment with me. Best Regards, Mohammad

RNA-Seq alignment next-gen HISAT 2 Indexing • 1.8k views

You need the base name, so genome_tran.

3.5 years ago

You won't use any particular file, just genome_tran, which is the base name shared by all of the index files. HISAT2 will then know to append .1.ht2 through .8.ht2 to it when it loads the index.
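
As a minimal sketch of what that looks like on the command line: the folder name `genome_tran`, the read file `sample.fastq`, and the output name `sample.sam` are assumptions for illustration, not names from the original post. The snippet derives the base name by stripping the `.1.ht2` suffix from the first index file and prints the resulting command instead of running it.

```shell
# Hypothetical locations: adjust to where the archive was extracted
# and where the reads actually live.
index_dir="genome_tran"                      # folder holding genome_tran.1.ht2 ... genome_tran.8.ht2
first_file="$index_dir/genome_tran.1.ht2"    # any one of the eight index files

# Strip the ".1.ht2" suffix to get the base name that -x expects.
index_base="${first_file%.1.ht2}"

reads="sample.fastq"                         # hypothetical single-end FASTQ file
output="sample.sam"                          # hypothetical output SAM file

# Build the command; -x takes the base name, never a single .ht2 file.
cmd="hisat2 -x $index_base -U $reads -S $output"
echo "$cmd"
```

Once the printed command looks right, it can be run directly. For paired-end data, `-U` would be replaced with `-1` and `-2`, as in the usage line from the manual.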


Thanks for your comment. I ran hisat2 on one of my samples (RNA-seq), but I received lots of warnings. Here is the result of the analysis:

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because length (1) <= # seed mismatches (0)

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because it was < 2 characters long

46943435 reads; of these:
  46943435 (100.00%) were unpaired; of these:
    9396529 (20.02%) aligned 0 times
    12967404 (27.62%) aligned exactly 1 time
    24579502 (52.36%) aligned >1 times
79.98% overall alignment rate.

I don't know whether this result is good or not. I used hg38_tran as the index. Would the result change if I used hg19 as the reference? My second problem is that I don't know which region of the genome this sample belongs to. I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad


50% multimapping seems very high, but perhaps these are ribo-depleted samples.

The sample is the entire transcriptome.
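
For reference, the percentages in the summary can be re-derived from the read counts. This is only an arithmetic check on the numbers posted above; the variable names are made up for the example, and `awk` is assumed to be available.

```shell
# Read counts copied from the HISAT2 summary in the post above.
total=46943435
unaligned=9396529
unique=12967404
multi=24579502

# Overall alignment rate: reads aligned at least once, as a share of all reads.
overall=$(awk -v u="$unique" -v m="$multi" -v t="$total" \
  'BEGIN { printf "%.2f", 100 * (u + m) / t }')

# Multimapping rate as a share of all reads (the figure HISAT2 reports).
multi_rate=$(awk -v m="$multi" -v t="$total" \
  'BEGIN { printf "%.2f", 100 * m / t }')

echo "overall alignment rate: ${overall}%"
echo "multimapped reads: ${multi_rate}% of all reads"
```

The overall rate is simply the uniquely aligned plus multimapped reads divided by the total, so a high multimapping share does not lower it.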

ADD REPLY
