Low mapping rate for human NGS PE reads to hs37d5 genome
2.8 years ago
Ginsea Chen ▴ 130

Dear all.

I sequenced DNA samples from a human subject using NGS and mapped the reads (length 90 bp) to the human reference genome (version: hs37d5). The mapping rate I get is low: 88%, whereas a normal sample gives more than 99%. I collected all unmapped reads (243,118 reads; BAM flag 0) and tried to find their origin, but I can't find any hits in the NCBI nr database; only 2,430 reads contain index sequences and only 210 reads contain adapter sequences.

So my question is: what should I do to find out what causes this low mapping rate? If you have any suggestions, please tell me.
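
A side note on the flag mentioned above: in the SAM specification a read is marked as unmapped by bit 0x4 of the FLAG field, so unmapped reads are normally collected by testing that bit rather than by looking for a FLAG of 0. The following is only a minimal sketch, assuming plain SAM text lines as input; the function names `is_unmapped` and `mapping_rate` are made up for illustration and are not part of any tool.

```python
# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4         # the read itself is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def is_unmapped(flag: int) -> bool:
    """Return True if the 0x4 (read unmapped) bit is set."""
    return bool(flag & UNMAPPED)


def mapping_rate(sam_lines):
    """Fraction of primary records that are mapped.

    Header lines (starting with '@') and blank lines are skipped;
    secondary and supplementary records are ignored so that each
    read is counted once.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not is_unmapped(flag):
            mapped += 1
    return mapped / total if total else 0.0
```

With this definition, a FLAG of 0 describes a mapped, unpaired, forward-strand read, which may be worth double-checking against how the unmapped set was extracted.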

Thanks.

NGS human reads hs37d5 Low mapping rate • 1.6k views

Did you run FastQC to check whether you might have carryover of adapters or other overrepresented sequences?

I have used cutadapt to trim adapter sequences and our in-house script to filter low-quality reads. I have never used FastQC. Thanks for your suggestion; I will try it.

I ran FastQC on all unmapped reads. The per-base sequence quality looks like this: [image not available] and the per-base sequence content looks like this: [image not available]

This unmapped data does not appear to be of great quality (median values around Q24). As others have said, 88% is not a bad alignment rate by any means. You may want to take some of the unmapped reads and BLAST them to see if they are contaminants.
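
One way to take "some of the unmapped reads" is to draw a random subset and write it as FASTA, which can then be submitted to BLAST. The sketch below is illustrative only: it assumes well-formed four-line FASTQ records, and the helper names `read_fastq` and `sample_to_fasta` are hypothetical.

```python
import random


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes well-formed records of exactly four lines each.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records / at EOF
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def sample_to_fasta(records, k, seed=0):
    """Reservoir-sample up to k records and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in reservoir)
```

Reservoir sampling is used here so that the whole file never has to be held in memory; a fixed seed keeps the subset reproducible.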

I have searched all unmapped reads against the NCBI nr database and did not find any matching records.

There is not much you can do in that case. These could simply be sequencing artifacts.

Note: Did you do a translated BLAST search, since you mention nr? How about a blastn search against nt?

Agreed. Given that base quality, the results appear to be fine; I've seen worse mapping rates. I suggest you proceed with downstream analysis and see if it goes through without issues. If so, don't worry about the mapping rate.

Thanks to @ATpoint and @genomax. The samples with low mapping rates have been analyzed, and we observed that samples with a mapping rate lower than 95% always contained some abnormal SNP/indel variants surrounded by many soft-clipped bases, as in the following: [image not available]

I am not sure whether there is a direct relationship between low mapping rates and the many soft-clipped reads around SNP/indel variants, but I always observe many SNP/indel variants next to soft-clipped bases in samples whose mapping rate is lower than 95%.
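
To put a number on the soft clipping rather than judging it by eye, the soft-clipped bases can be counted from the CIGAR strings of the alignments (the `S` operation in the SAM specification). This is a minimal sketch under the assumption that CIGAR strings have already been extracted from the BAM; the function names are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (query), per the SAM spec
QUERY_CONSUMING = set("MIS=X")


def soft_clipped_bases(cigar: str) -> int:
    """Number of soft-clipped bases in a CIGAR string ('*' counts as 0)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def soft_clip_fraction(cigar: str) -> float:
    """Soft-clipped bases divided by the read length implied by the CIGAR."""
    ops = CIGAR_OP.findall(cigar)
    read_len = sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
    return soft_clipped_bases(cigar) / read_len if read_len else 0.0
```

Comparing the distribution of this fraction between a sample above 95% and one below it would show whether the heavy soft clipping is confined to the low-mapping-rate samples.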

I would say 88% is not actually very low but within acceptable limits; we usually get 85-95%. In any case, you might also try running FastQC on your raw data to see if you have any adapters or overrepresented sequences.
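
For a quick look at the same idea without FastQC, exact duplicate sequences and adapter-containing reads can be tallied directly. This is a rough sketch, not FastQC's actual algorithm: the 0.1% default threshold and the function names are assumptions for illustration, and the adapter string has to be replaced with the one used in your library preparation.

```python
from collections import Counter


def overrepresented(seqs, min_fraction=0.001, top=10):
    """Return (sequence, count, fraction) for the most frequent sequences.

    Only sequences making up at least `min_fraction` of all reads are kept.
    """
    counts = Counter(seqs)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common(top)
        if n / total >= min_fraction
    ]


def count_with_adapter(seqs, adapter):
    """Number of reads containing the adapter as an exact substring."""
    return sum(adapter in seq for seq in seqs)
```

Exact substring matching will miss adapters with sequencing errors or partial adapters at read ends, so a low count from this check does not rule out adapter carryover.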

For samples whose mapping rate is lower than 95%, we observe many soft-clipped reads around SNP or indel variants, as in the supplementary figure (https://photos.app.goo.gl/FTxpyvn2qZJDGhnC8). So we think the unknown cause of the low mapping rate may affect the accuracy of variant detection in the target samples.

The link is not functional. Please upload the image to a public image host such as ImgBB and then paste the full link, including the file extension (e.g. .png), into the image field.

Please try again; I have fixed it.