Question

I got a low overall alignment rate running HISAT2


7.7 years ago by sslee1015

Hi, I'm fairly new to RNA-seq, and I don't know how to explain these results.

Here is a representative example of my HISAT2 command:

hisat2 -x mm10/genome -1 sample1_R1.fastq -2 sample1_R2.fastq -S sample1out.sam

The reference genome is mouse mm10, and the index directory contains the .ht2 files. sample1_R1.fastq contains the first read of each pair and sample1_R2.fastq the second. For sample 1, I received 8 FASTQ files, 4 for R1 and 4 for R2, so I concatenated the R1 files into one file and the R2 files into another, and used those two files as the hisat2 inputs. This was my hisat2 summary:

32832172 reads; of these:
  32832172 (100.00%) were paired; of these:
    32326312 (98.46%) aligned concordantly 0 times
    393332 (1.20%) aligned concordantly exactly 1 time
    112528 (0.34%) aligned concordantly >1 times
    ----
    32326312 pairs aligned concordantly 0 times; of these:
      6101 (0.02%) aligned discordantly 1 time
    ----
    32320211 pairs aligned 0 times concordantly or discordantly; of these:
      64640422 mates make up the pairs; of these:
        64313845 (99.49%) aligned 0 times
        208508 (0.32%) aligned exactly 1 time
        118069 (0.18%) aligned >1 times
2.06% overall alignment rate

2.06% seems really low. Did I do something wrong?
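A note on the concatenation step described in the question: the R1 and R2 lane files must be joined in the same lane order, otherwise mate k in the R1 file no longer belongs to mate k in the R2 file. HISAT2 also accepts comma-separated file lists for `-1` and `-2`, which avoids the extra step. If you do concatenate, this minimal Python sketch joins plain-text (uncompressed) FASTQ files in matching order and checks that both outputs hold the same number of records. It assumes a hypothetical naming scheme in which each R2 file name is its R1 name with `_R1` replaced by `_R2`; `concat_fastqs` and `concat_pairs` are illustrative names, not part of any tool.

```python
from pathlib import Path


def concat_fastqs(inputs, output):
    """Concatenate plain-text FASTQ files in the given order.

    Returns the number of 4-line records written.
    """
    n_lines = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as fh:
                for line in fh:
                    out.write(line)
                    n_lines += 1
    if n_lines % 4:
        raise ValueError(f"{output}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def concat_pairs(r1_files, r2_files, out_r1, out_r2):
    """Concatenate lane-level R1 and R2 files so that mates stay in step."""
    r1_sorted = sorted(str(Path(p)) for p in r1_files)
    # Hypothetical naming scheme: an R2 file name is its R1 name with "_R1" -> "_R2".
    r2_matched = [
        str(Path(p).with_name(Path(p).name.replace("_R1", "_R2"))) for p in r1_sorted
    ]
    if sorted(str(Path(p)) for p in r2_files) != sorted(r2_matched):
        raise ValueError("R1 and R2 lane files do not pair up by name")
    n1 = concat_fastqs(r1_sorted, out_r1)
    n2 = concat_fastqs(r2_matched, out_r2)
    if n1 != n2:
        raise ValueError(f"mate counts differ: {n1} R1 vs {n2} R2 records")
    return n1
```

The R2 order is derived from the sorted R1 names rather than sorted independently, so a missing or oddly named lane file raises an error instead of silently shifting the pairing.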

Tags: RNA-Seq, hisat, hisat2, alignment



Something must be badly wrong: 32 million read pairs and only 0.39 million mapped? Post the FastQC report.
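To double-check the figures, both headline rates can be recomputed from the summary in the question. HISAT2's overall alignment rate counts mates: each of the 32,832,172 pairs contributes two mates, and any mate that aligned concordantly, discordantly or on its own counts as aligned, so the rate is (2 × 32,832,172 − 64,313,845) / (2 × 32,832,172) ≈ 2.06%. This Python sketch assumes the paired-end summary layout posted above; the regular expressions are written for that text only.

```python
import re


def _count(pattern, summary):
    """Return the integer captured by the first match of pattern."""
    return int(re.search(pattern, summary).group(1))


def overall_alignment_rate(summary: str) -> float:
    """Recompute HISAT2's 'overall alignment rate' for a paired-end run."""
    total_mates = 2 * _count(r"(\d+) reads; of these:", summary)
    # Only the mate-level line reads "(xx.xx%) aligned 0 times"; the pair-level
    # lines say "aligned concordantly 0 times" or "pairs aligned 0 times".
    unaligned_mates = _count(r"(\d+) \([\d.]+%\) aligned 0 times", summary)
    return 100.0 * (total_mates - unaligned_mates) / total_mates


def concordant_rate(summary: str) -> float:
    """Percentage of pairs that aligned concordantly at least once."""
    total_pairs = _count(r"(\d+) reads; of these:", summary)
    once = _count(r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time", summary)
    multi = _count(r"(\d+) \([\d.]+%\) aligned concordantly >1 times", summary)
    return 100.0 * (once + multi) / total_pairs
```

Applied to the summary in the question, `overall_alignment_rate` returns about 2.06 and `concordant_rate` about 1.54, so only about 506,000 of the 32.8 million pairs aligned concordantly.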

7.7 years ago by GouthamAtla


The sample might not be what you think it is, in which case you may be aligning to the wrong genome. Try BLASTing a few of the unmapped reads.
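One way to pull a few unmapped reads out of `sample1out.sam` for BLAST is to keep records whose SAM FLAG has bit 0x4 (segment unmapped) set, roughly what `samtools view -f 4` selects. This Python sketch returns them as FASTA strings that can be pasted into a BLAST search. The `/1` and `/2` name suffixes and the default of 20 reads are arbitrary choices, and the first reads in a file are not a random sample.

```python
def unmapped_to_fasta(sam_lines, max_reads=20):
    """Return up to max_reads unmapped reads from SAM text lines as FASTA strings."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & 0x4:  # 0x4: segment unmapped
            continue
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        name = fields[0]
        if flag & 0x40:  # first mate of the pair
            name += "/1"
        elif flag & 0x80:  # second mate of the pair
            name += "/2"
        records.append(f">{name}\n{fields[9]}")  # SEQ is the 10th SAM column
        if len(records) >= max_reads:
            break
    return records
```

For example, `with open("sample1out.sam") as fh: print("\n".join(unmapped_to_fasta(fh)))` prints the first 20 unmapped mates.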

7.7 years ago by mastal511


FastQ Screen is a quick and easy way to tell which genome the reads in a sequence file come from.

7.7 years ago by i.sudbery


Here are screenshots of most of the FastQC graphs.

7.7 years ago by sslee1015


You have some serious problems at the 5' end of your reads. The first 3 bases are 100% GC and 70% G. A disturbed GC profile is fairly normal at the start of an RNA-seq read, but I've never seen anything this extreme. If you look at your enriched k-mers, you'll see a massive enrichment for every kind of homopolymer run, in particular poly-G at the start.
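The per-position composition described here (FastQC's "Per base sequence content" plot) can also be checked directly on a FASTQ file. This Python sketch counts base fractions over the first few positions of each read; it assumes plain 4-line FASTQ records and is only a rough stand-in for what FastQC reports.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd line of every 4-line record) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def base_composition(seqs, n_positions=10):
    """Per-position base fractions over the first n_positions bases of each read.

    Returns one {base: fraction} dict per position (empty if no read reaches it).
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for i, base in enumerate(seq[:n_positions]):
            counts[i][base.upper()] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: c[b] / total for b in "ACGTN"} if total else {})
    return result
```

Adding the `"G"` and `"C"` fractions at positions 0 to 2 of `base_composition(fastq_sequences(open("sample1_R1.fastq")))` gives the GC content of the first three bases for comparison with the FastQC plot.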

You could try clipping off the first 10 or so bases of each read and see if that helps, but I'd be a bit nervous because you don't know the cause. Something is definitely wrong with either the libraries or the sequencing. I would contact your sequencing company and discuss it with them.
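Clipping the first bases does not require rewriting the files: HISAT2's `-5/--trim5 <int>` option trims that many bases from the 5' end of each read before alignment (for example `--trim5 10`). If trimmed files are wanted, say to rerun FastQC on them, a minimal Python sketch that drops the first `n_bases` from every sequence and quality line looks like this. It assumes 4-line FASTQ records; reads shorter than `n_bases` end up empty and would need filtering.

```python
def trim_5prime(lines, n_bases=10):
    """Yield FASTQ lines with the first n_bases removed from sequence and quality."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # 2nd (sequence) and 4th (quality) line of each record
            line = line[n_bases:]
        yield line + "\n"
```

Trimming the sequence and quality lines together keeps them the same length, which downstream tools require.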

7.7 years ago by i.sudbery


Did you trim off adapters and poly(A) tails? Which sequencer was used? What is your read length?
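Part of these questions can be answered from the FASTQ files themselves. This Python sketch tallies read lengths, counts reads containing an adapter sequence, and counts reads ending in a poly(A) run. The adapter string is an assumption (the 13-bp prefix `AGATCGGAAGAGC` shared by common Illumina adapters; substitute the one for your kit) and the 10-base poly(A) threshold is arbitrary. It only reports; a dedicated trimmer is the right tool for removing them.

```python
from collections import Counter

# Assumption: common Illumina adapter prefix; change for other library kits.
ADAPTER = "AGATCGGAAGAGC"


def read_stats(seqs, min_polya=10):
    """Summarise read lengths, adapter hits and 3' poly(A) tails.

    Returns (Counter of read lengths, reads containing ADAPTER,
             reads ending in at least min_polya A's).
    """
    lengths = Counter()
    adapter_hits = 0
    polya_hits = 0
    tail = "A" * min_polya
    for seq in seqs:
        seq = seq.strip().upper()
        lengths[len(seq)] += 1
        if ADAPTER in seq:
            adapter_hits += 1
        if seq.endswith(tail):
            polya_hits += 1
    return lengths, adapter_hits, polya_hits
```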