Question

High rate of "Aligned concordantly >1 times:"

0

3.8 years ago

j.m.fustin27 • 0

Hello!

I have looked for existing answers but did not find a satisfying one. I am running Hisat2 on the existing mouse genome (gencode.VM25) with paired reads and got these results:

HISAT2 summary stats:

Total pairs: 41539152
    Aligned concordantly or discordantly 0 time: 4746559 (11.43%)
    Aligned concordantly 1 time: 13453617 (32.39%)
    Aligned concordantly >1 times: 23240434 (55.95%)
    Aligned discordantly 1 time: 98542 (0.24%)
Total unpaired reads: 9493118
    Aligned 0 time: 6292703 (66.29%)
    Aligned 1 time: 990850 (10.44%)
    Aligned >1 times: 2209565 (23.28%)
Overall alignment rate: 92.43%
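For readers checking the numbers, here is a minimal sketch of how the percentages in this summary appear to fit together. It assumes (from the counts themselves, not from the HISAT2 source) that the overall rate is computed per mate, so the denominator is twice the number of pairs; the function names are illustrative only.

```python
def overall_alignment_rate(total_pairs: int, unpaired_unaligned: int) -> float:
    """Percent of individual mates that aligned at least once, in any mode.

    Assumption: every pair contributes two mates, and only the mates
    reported under 'Total unpaired reads / Aligned 0 time' count as unaligned.
    """
    total_mates = 2 * total_pairs
    return 100.0 * (1 - unpaired_unaligned / total_mates)


def percent(count: int, total: int) -> float:
    """Plain percentage helper."""
    return 100.0 * count / total


# Counts copied from the summary above.
total_pairs = 41539152
multi_mapped_pairs = 23240434
mates_never_aligned = 6292703

print(f"multi-mapped pairs: {percent(multi_mapped_pairs, total_pairs):.2f}%")
print(f"overall alignment rate: {overall_alignment_rate(total_pairs, mates_never_aligned):.2f}%")
```

The point of the arithmetic is that the overall alignment rate can look healthy while more than half of the pairs are multi-mapped, so the two figures need to be read separately.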

I am worried about the "Aligned concordantly >1 times: 23240434 (55.95%)" line, which seems awfully high. I get the same rate whether or not I trim with Trimmomatic. The size distribution of my reads (from FastQC) is:

Length Count

35 47812.0; 36 49054.0; 37 52457.0; 38 55554.0; 39 54966.0; 40 58943.0; 41 62925.0; 42 53991.0; 43 58224.0; 44 63050.0; 45 53349.0; 46 55182.0; 47 51612.0; 48 58391.0; 49 53000.0; 50 57727.0; 51 54120.0; 52 54592.0; 53 63569.0; 54 57556.0; 55 53580.0; 56 58251.0; 57 56248.0; 58 53003.0; 59 56622.0; 60 60135.0; 61 54936.0; 62 57950.0; 63 56381.0; 64 56827.0; 65 62153.0; 66 61679.0; 67 61212.0; 68 66007.0; 69 65395.0; 70 70038.0; 71 120537.0; 72 262637.0; 73 1021662.0; 74 3263828.0; 75 9960097.0; 76 2.48439E7;

Any recommendations? Should I drop all reads shorter than 75 or something?

Thank you very much in advance for any tips!

Jean-Michel Fustin

Hisat2 Trimmomatic • 3.9k views

2

Looks like your reads are multi-mapping. I assume this is RNAseq data? Have you checked to see if you have rRNA contamination in your reads?

ADD REPLY • link 3.8 years ago by GenoMax 142k

0

If you drop all short reads, you will drop a whole lot that mapped fine. Your data is what it is. You probably can't fix it, all you can do is understand it.

My lab is cheap, and we do single-end 50-bp runs on mouse RNA all the time, and I get 70-80% unique reads. So dropping 75-bp reads that have a paired mate is not going to fix anything.

ADD REPLY • link 3.8 years ago by swbarnes2 14k

score 1 · Answer 1 · 2020-07-17

1

3.8 years ago

Shalu Jhanwar ▴ 520

Have you checked the length distribution of the reads? It is probably better to drop very short reads (~20-30 bp). Also, which aligner are you using? Look at stricter parameters, such as end-to-end alignment and the number of mismatches allowed in the seed, which can reduce multi-mapping. You can also minimize the effect of multi-mappers after alignment by keeping only uniquely aligned reads.
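As a rough illustration of the last suggestion (keeping only uniquely aligned reads after alignment), here is a minimal sketch that filters plain-text SAM records. It assumes each alignment carries the optional `NH:i:` tag (number of reported alignments for the read), which HISAT2 writes; the function names and the toy records are made up for the example.

```python
def is_unique_alignment(sam_record: str) -> bool:
    """True for a mapped SAM record whose NH tag says it aligned exactly once."""
    fields = sam_record.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = segment unmapped
        return False
    # Optional tags start after the 11 mandatory SAM columns.
    return "NH:i:1" in fields[11:]


def keep_unique(sam_lines):
    """Yield header lines unchanged plus uniquely aligned records."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_alignment(line):
            yield line


# Toy records for illustration only (not from the thread).
sam_lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNH:i:1",
    "r2\t99\tchr1\t300\t1\t5M\t=\t400\t105\tACGTA\tIIIII\tNH:i:3",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\tYT:Z:UP",
]

for line in keep_unique(sam_lines):
    print(line.split("\t")[0])
```

This filters record by record, so for paired-end data it is worth checking afterwards that both mates of a pair survived. It also only changes which reads are kept; it does not explain why they multi-mapped in the first place.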

0

Dropping short reads might improve the metric of uniquely aligned reads, but it's not going to actually improve the data.

ADD REPLY • link 3.8 years ago by swbarnes2 14k

score 0 · Answer 2 · 2020-07-17

Apologies, folks. It turns out I was using the wrong FASTA file as the genome to align to (it was the GENCODE file containing only transcript sequences). With the right genome, the results are now much better:

HISAT2 summary stats:

Total pairs: 25453457
    Aligned concordantly or discordantly 0 time: 1167561 (4.59%)
    Aligned concordantly 1 time: 20316643 (79.82%)
    Aligned concordantly >1 times: 3644554 (14.32%)
    Aligned discordantly 1 time: 324699 (1.28%)
Total unpaired reads: 2335122
    Aligned 0 time: 1146074 (49.08%)
    Aligned 1 time: 934697 (40.03%)
    Aligned >1 times: 254351 (10.89%)
Overall alignment rate: 97.75%

Thank you for your help everybody.
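For anyone hitting the same problem, a quick sanity check on the reference FASTA before building the index might have caught this. The sketch below is only a heuristic: it assumes that GENCODE transcript FASTA headers are pipe-delimited (transcript ID, gene ID, and so on), while genome FASTA headers are plain chromosome names. The function names and toy sequences are hypothetical.

```python
import io


def summarize_fasta(handle):
    """Count records, bases, and pipe-delimited headers in a FASTA stream."""
    records = 0
    total_bases = 0
    piped_headers = 0
    for line in handle:
        if line.startswith(">"):
            records += 1
            if "|" in line:
                piped_headers += 1
        else:
            total_bases += len(line.strip())
    return {"records": records, "total_bases": total_bases, "piped_headers": piped_headers}


def looks_like_transcriptome(summary) -> bool:
    """Heuristic (assumption): every header is pipe-delimited, GENCODE-transcript style."""
    return summary["records"] > 0 and summary["piped_headers"] == summary["records"]


# Toy inputs for illustration; real files would be opened with open(path).
toy_transcripts = io.StringIO(
    ">ENSMUST00000000001.1|ENSMUSG00000000001.1|toy\nACGTACGT\n"
    ">ENSMUST00000000002.1|ENSMUSG00000000001.1|toy\nACGTAC\n"
)
toy_genome = io.StringIO(">chr1 1\nACGTACGTACGT\n>chr2 2\nACGT\n")

print(looks_like_transcriptome(summarize_fasta(toy_transcripts)))
print(looks_like_transcriptome(summarize_fasta(toy_genome)))
```

A transcript file also tends to contain far more records than a genome has chromosomes. Because isoforms of the same gene share exons, a read from a shared exon aligns equally well to several transcript sequences, which would explain a high ">1 times" rate like the one in the original post.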