Question: I got an extremely low alignment rate running HISAT2 and TopHat2
2.8 years ago, oghzzang wrote:

Hi. I'm trying to map paired-end RNA-seq reads to GRCm38 (mm10) using HISAT2 and TopHat2, but the mapping rate is only about 0-5%.

(Sequenced on a HiSeq 2500; the fragment size is 300 bp.)


1) fastqc summary

PASS Basic Statistics

PASS Per base sequence quality

PASS Per tile sequence quality

PASS Per sequence quality scores

FAIL Per base sequence content

PASS Per sequence GC content

PASS Per base N content

PASS Sequence Length Distribution

FAIL Sequence Duplication Levels

PASS Overrepresented sequences

PASS Adapter Content

2) read information

Measure                             Value
Filename                            sample_1.fastq.gz
File type                           Conventional base calls
Encoding                            Sanger / Illumina 1.9
Total Sequences                     44728504
Sequences flagged as poor quality   0
Sequence length                     101
%GC                                 50


1) HISAT2 command

-p 8 \
--rg-id=sample \
--rg SM:sample --rg LB:LB --rg PL:Illumina --rg PU:sample \
-x $Reference_dir/Mus_musculus/NCBI/hisatIndex/GRCm38 \
--dta \
--rna-strandness FR \
-1 $Fastq_dir/sample_1.fastq.gz \
-2 $Fastq_dir/sample_2.fastq.gz \
-S $Working_dir/Analysis/$Analysis_dir/NCBI/Pre_Tophat/sample_pe.sam 2

2) Result

44728504 reads; of these:
  44728504 (100.00%) were paired; of these:
    44358669 (99.17%) aligned concordantly 0 times
    331704 (0.74%) aligned concordantly exactly 1 time
    38131 (0.09%) aligned concordantly >1 times

    44358669 pairs aligned concordantly 0 times; of these:
      11328 (0.03%) aligned discordantly 1 time

    44347341 pairs aligned 0 times concordantly or discordantly; of these:
      88694682 mates make up the pairs; of these:
        87830960 (99.03%) aligned 0 times
        735195 (0.83%) aligned exactly 1 time
        128527 (0.14%) aligned >1 times
1.82% overall alignment rate
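As a sanity check on how the final percentage is derived, the sketch below recomputes it from the counts in the summary. It assumes the overall rate counts mates: every concordantly or discordantly aligned pair contributes two mates, plus the mates that aligned on their own, divided by twice the number of input pairs. The function name `overall_rate` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Recompute the "overall alignment rate" from the summary counts.
# Arguments (all taken from the summary above):
#   $1 total pairs          $2 concordant exactly 1   $3 concordant >1
#   $4 discordant 1 time    $5 mates aligned exactly 1   $6 mates aligned >1
overall_rate() {
  awk -v total="$1" -v c1="$2" -v cm="$3" -v d1="$4" -v m1="$5" -v mm="$6" 'BEGIN {
    # Assumption: each aligned pair counts as two aligned mates.
    aligned_mates = 2 * (c1 + cm + d1) + m1 + mm
    printf "%.2f\n", 100 * aligned_mates / (2 * total)
  }'
}

overall_rate 44728504 331704 38131 11328 735195 128527   # prints 1.82
```

The result agrees with the reported 1.82%, so the summary is internally consistent and the low rate is not a reporting artifact.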


1) TopHat2 command

--GTF $Reference_dir//Mus_musculus/UCSC/mm10/Annotation/Archives/archive-2015-07-17-14-33-26/Genes/genes.gtf \
--output-dir $Working_dir/Analysis/$Analysis_dir/Tophat \
--num-threads 1 \
$Reference_dir/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome \



2) result

Left reads:
      Input     :  44728504
       Mapped   :    355987 ( 0.8% of input)
        of these:      7756 ( 2.2%) have multiple alignments (0 have >20)
Right reads:
      Input     :  44728504
       Mapped   :    347193 ( 0.8% of input)
        of these:      7342 ( 2.1%) have multiple alignments (0 have >20)
0.8% overall read mapping rate.

Aligned pairs: 159136
 of these:      1209 ( 0.8%) have multiple alignments
                 218 ( 0.1%) are discordant alignments
0.4% concordant pair alignment rate.
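The two percentages in this summary can be recomputed from the raw counts. The sketch below assumes the overall rate is mapped left plus right reads over all input reads, and that the concordant rate is aligned pairs minus discordant pairs over input pairs. The second formula is an assumption that happens to reproduce the reported figure. The helper name `pct` is made up for this illustration.

```shell
#!/usr/bin/env bash
# pct NUMERATOR DENOMINATOR -> percentage with one decimal place
pct() {
  awk -v num="$1" -v den="$2" 'BEGIN { printf "%.1f\n", 100 * num / den }'
}

# Counts copied from the summary above.
input_pairs=44728504
left_mapped=355987
right_mapped=347193
aligned_pairs=159136
discordant_pairs=218

# Overall read mapping rate: mapped left + right reads over all input reads.
pct "$((left_mapped + right_mapped))" "$((2 * input_pairs))"      # prints 0.8

# Concordant pair rate (assumed formula): non-discordant pairs over input pairs.
pct "$((aligned_pairs - discordant_pairs))" "$input_pairs"        # prints 0.4
```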

  1. Other attempts

1) Trimmed the first 10 bp from the read 1 and read 2 FASTQ files.

--> But the alignment rate was still extremely low.

2) I've seen this comment:

Reference species diverse

rna-seq alignment mm10 • 2.0k views
modified 2.8 years ago by Carlo Yague • written 2.8 years ago by oghzzang

I have the same problem! Did you download the index from the HISAT2 site? I did, and even with mm9 I get the same alignment rate. I am using public NGS data :( which is supposed to be mouse!...

written 2.8 years ago by Buffo

Did you check your data source? I checked mine and found that it wasn't mouse sequence (thanks to Carlo Yague's comment).

After mapping my data to the human reference, I got a 95% mapping rate. I built the HISAT2 index with the following pipeline:

  1. Download the reference genome

  2. Build the hisat2 index

echo "2-1. Build Hisat2 index (Default Options)"
$AnacondaBin/hisat2-build \
  $Reference_dir/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/hisat2_index/mm10_genome.fa \
  mm10_genome

modified 2.8 years ago • written 2.8 years ago by oghzzang

Yes, of course. I have mapped the data to several related genomes, including human. In the end I will write to the corresponding author :).

written 2.8 years ago by Buffo
2.8 years ago, Carlo Yague wrote:

Given the extremely low mapping rate, my guess would be that your data is not mouse RNA. You can try manually picking a few reads and running BLAST on them.

By the way, where does your data come from?

It is also quite unusual to not have overrepresented sequences in RNA-seq data.
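One way to set up the BLAST spot-check suggested above is to convert a handful of reads to FASTA. This is a minimal sketch: the function name `fastq_head_to_fasta` and the output file name are made up for illustration, it assumes the standard four-lines-per-record FASTQ layout, and it takes the first reads in the file rather than a random sample.

```shell
#!/usr/bin/env bash
# Convert the first N reads of a gzipped FASTQ file to FASTA.
# Usage: fastq_head_to_fasta FILE.fastq.gz [N]   (N defaults to 5)
fastq_head_to_fasta() {
  local fastq="$1" n="${2:-5}"
  gzip -cd "$fastq" | head -n "$((n * 4))" |
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap leading "@" for ">"
         NR % 4 == 2 { print }'                    # sequence line; "+" and quality lines are dropped
}

# Example with the file name from the question:
# fastq_head_to_fasta sample_1.fastq.gz 5 > sample_1.subset.fa
```

The resulting FASTA can be pasted into a web BLAST search against a broad nucleotide database. If the top hits consistently come from another species, the reference genome is the problem rather than the aligner settings.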

modified 2.8 years ago • written 2.8 years ago by Carlo Yague

Hi Yague, thank you so much for getting back to me. Your comment definitely helps a lot! My RNA-seq file comes from an Illumina HiSeq 2500 platform and a human sample :).

modified 2.8 years ago • written 2.8 years ago by oghzzang