Question: I got a low overall alignment rate running HISAT2
3.6 years ago, sslee1015 wrote:

Hi, I'm fairly new to RNA-seq, and I don't really know how to explain these results.

Here is a generic sample of my HISAT2 command:

hisat2 -x mm10/genome -1 sample1_R1.fastq -2 sample1_R2.fastq -S sample1out.sam

The reference genome I used is mouse, mm10, and the directory contains the .ht2 index files. sample1_R1.fastq holds one mate of each read pair, and sample1_R2.fastq holds the other. For sample 1, I received 8 different FASTQ files, 4 of them R1 and the other 4 R2, so I concatenated the R1 files and the R2 files into the two FASTQ files I passed to hisat2. This was my hisat2 summary:

32832172 reads; of these:
  32832172 (100.00%) were paired; of these:
    32326312 (98.46%) aligned concordantly 0 times
    393332 (1.20%) aligned concordantly exactly 1 time
    112528 (0.34%) aligned concordantly >1 times
    32326312 pairs aligned concordantly 0 times; of these:
      6101 (0.02%) aligned discordantly 1 time
    32320211 pairs aligned 0 times concordantly or discordantly; of these:
      64640422 mates make up the pairs; of these:
        64313845 (99.49%) aligned 0 times
        208508 (0.32%) aligned exactly 1 time
        118069 (0.18%) aligned >1 times
2.06% overall alignment rate

2.06% seems really low. Did I do something wrong?
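
Since the inputs were built by concatenating four R1 and four R2 files, one quick thing to rule out is that the two concatenated files fell out of step, for example because the lanes were joined in a different order for R1 than for R2. The sketch below is only a sanity check, not something from the original post: it assumes standard 4-line FASTQ records, and the function names (count_reads, read_ids, check_pairing) are made up for illustration.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print the read ID from each header line: the first whitespace-delimited
# field, with a trailing /1 or /2 removed if present.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Report whether two FASTQ files list the same read IDs in the same order.
# Returns non-zero on a mismatch.
check_pairing() {
    if [ "$(read_ids "$1" | cksum)" = "$(read_ids "$2" | cksum)" ]; then
        echo "paired"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

With the file names from the question, this would be run as `count_reads sample1_R1.fastq`, `count_reads sample1_R2.fastq` and `check_pairing sample1_R1.fastq sample1_R2.fastq`. A pairing problem alone would not explain the summary above, because 99.49% of the individual mates also failed to align, but it is cheap to exclude.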

hisat2 rna-seq alignment hisat • 3.7k views

Something must be terribly wrong. 32 million read pairs and only 0.39 million mapped? Post the FastQC report.

modified 3.6 years ago • written 3.6 years ago by geek_y10k
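
For anyone reading the summary for the first time, the overall rate can be rebuilt from the individual counts. The sketch below assumes the usual convention that each aligned pair counts as two reads and each mate that aligned only on its own counts as one; the function name overall_rate is hypothetical, and the arguments are the numbers from the summary in the question.

```shell
# overall_rate PAIRS CONC_1 CONC_MULTI DISC_1 MATE_1 MATE_MULTI
# Aligned pairs (concordant or discordant) contribute two reads each;
# mates that aligned only individually contribute one read each.
overall_rate() {
    awk -v pairs="$1" -v c1="$2" -v cm="$3" -v d1="$4" -v m1="$5" -v mm="$6" '
        BEGIN {
            aligned = 2 * (c1 + cm + d1) + m1 + mm
            printf "%.2f\n", 100 * aligned / (2 * pairs)
        }'
}
```

Calling `overall_rate 32832172 393332 112528 6101 208508 118069` gives 2.06, matching the last line of the summary: 1,350,499 aligned reads out of 65,664,344.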

The sample might not be what you think it is, so you may be aligning to the wrong genome. Try blasting a few of the unmapped reads.

written 3.6 years ago by mastal5112.0k
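
To get a handful of reads into a form that BLAST accepts, the FASTQ records need converting to FASTA. A minimal sketch, assuming 4-line FASTQ records; the function name fastq_head_to_fasta and the output file name in the usage note are placeholders. Because about 98% of the pairs did not align, the first few reads of the raw file are very likely to be unaligned ones.

```shell
# fastq_head_to_fasta FILE N: print the first N reads of FILE as FASTA.
fastq_head_to_fasta() {
    awk -v n="$2" '
        NR % 4 == 1 { printf ">%s\n", substr($0, 2) }  # header line: drop the leading "@"
        NR % 4 == 2 { print }                          # sequence line, unchanged
        NR >= 4 * n { exit }                           # stop once N records are done
    ' "$1"
}
```

For example, `fastq_head_to_fasta sample1_R1.fastq 5 > sample1_first5.fa` writes five sequences that can be pasted into a blastn search against a broad nucleotide database.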

FastQ Screen should provide you with a quick and easy way of telling what genome a sequence file comes from.

written 3.6 years ago by i.sudbery7.3k

Here are screenshots of most of the FastQC graphs

written 3.6 years ago by sslee10150

You have some serious problems at the 5' end of your reads. The first 3 bases are 100% GC and 70% G. A disturbed GC profile is pretty normal at the start of an RNA-seq read, but I've never seen anything this extreme before. If you look at your enriched k-mers, you'll see a massive enrichment for all the different homopolymer runs, in particular homo-G at the start.
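
The composition at the start of the reads can also be checked straight from the FASTQ file, which makes it easy to compare individual lanes or samples. This is a rough sketch of that tally, not a replacement for FastQC; the function name base_composition is made up for illustration, and 4-line FASTQ records are assumed.

```shell
# base_composition FILE K: percent A/C/G/T/N at each of the first K positions.
base_composition() {
    awk -v k="$2" '
        NR % 4 == 2 {
            n++
            for (i = 1; i <= k && i <= length($0); i++)
                count[i, toupper(substr($0, i, 1))]++
        }
        END {
            for (i = 1; i <= k; i++) {
                printf "pos%d", i
                for (j = 1; j <= 5; j++) {
                    b = substr("ACGTN", j, 1)
                    printf " %s=%.1f", b, (n ? 100 * count[i, b] / n : 0)
                }
                printf "\n"
            }
        }' "$1"
}
```

Running `base_composition sample1_R1.fastq 3` prints one line per position. If the pattern described above holds, A and T should be at or near zero for the first three positions and G should be around 70%.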

You could try clipping off the first 10 bases or so of each read and see if that helps, but I'd be a bit nervous because you don't know the cause. There is definitely something wrong with either the libraries or the sequencing. I would contact your sequencing company and discuss it with them.

written 3.6 years ago by i.sudbery7.3k
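
Clipping the first 10 bases, as suggested above, can be tried without any extra tools. A minimal sketch, assuming 4-line FASTQ records; trim5 is a hypothetical helper name and the output file names in the usage note are placeholders. The sequence and quality lines must be trimmed by the same amount so that they stay the same length.

```shell
# trim5 FILE N: write FILE to stdout with the first N bases removed from
# every read, together with the matching first N quality characters.
trim5() {
    awk -v n="$2" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }  # sequence and quality lines
        { print }                                                     # header and "+" lines, unchanged
    ' "$1"
}
```

For example, `trim5 sample1_R1.fastq 10 > sample1_R1.trim10.fastq`, and the same for R2, before rerunning the alignment. The HISAT2 manual also lists a `-5/--trim5 <int>` option that trims the 5' end at alignment time without rewriting the files; check `hisat2 --help` for your version. Either way this is only a diagnostic, since it does not explain where the G-rich start comes from.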

Did you trim off adapters and poly-A tails? Which sequencer was used? What is your read length?

written 3.6 years ago by WouterDeCoster43k

I believe the company that gave us the raw reads did that for me already.

written 3.6 years ago by sslee10150