Low mapping rate, how can I handle it?
6 weeks ago
sskoldas • 0

Hello,

I am new to this field. I am doing metagenome analysis with shotgun reads. All reads are single-end. The DNA was obtained from human airways. I just want to find taxon abundances in the samples. Then I will estimate diversity and identify the core microbes.

My mapping results are terrible. How can I handle bad mappings? Or should I change the tools I used for the analysis? Which tools are more accurate or sensitive for microbiome analysis? Any suggestions would be appreciated!

I followed this pipeline:

1. Assembly was done using Megahit
2. Short contigs (<200 bp) were removed using prinseq
3. Read mapping against contigs was performed using BWA
4. Similarity searches for GenBank, KEGG, eggNOG were done using Diamond
5. Binning was done using MaxBin2

These are my mapping results:

# Sample    Total reads   Mapped reads   Mapping %   Total bases
samples13   21380728      17881628       83.63       1618006383
samples14   109599        22051          20.12       7606328
samples15   258752        119090         46.02       18803788
samples16   340586        147490         43.30       24935657
samples12   7342679       6205921        84.52       524794709
samples11   7741157       6283578        81.17       554721680
samples17   17108901      15213361       88.92       1294292384
samples18   4012626       2850684        71.04       302834087

My mapping results are terrible.

What makes you say that? I mostly see samples with >70% alignment rates, which is fine.

Friederike, each sample belongs to a different person. The mapping percentages of samples 14, 15 and 16 are under 50%. That is really bad for me, especially sample 14. I don't get any useful information from those samples.

Have you actually looked at the results? What type of information are you missing from the results? What exactly is throwing you off? (Not saying that these samples didn't fail, but in order to get a sense of why they may have failed, we need more information)

Friederike, in a nutshell, I need to compare samples in pairs: (11-12), (13-14), (15-16), (17-18).
For example, in the image (assume the samples are ordered 11, 12, 13, ...), we expect equivalent levels in samples 13 and 14, but sample 14 has very low abundance compared to sample 13. I'm not sure, but this depends on the sample amount, right? If so, is it possible to normalize these results and calculate diversity and core microbes? I need relative abundance for this, right?

I need relative abundance for this, right?

yes
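To illustrate why relative abundance helps here, below is a minimal sketch in Python. The taxon names and read counts are made up for illustration (they are not from the samples in this thread), and `relative_abundance` and `shannon_index` are hypothetical helper names. It shows that two samples with the same composition but a 100-fold difference in depth give the same proportions and the same Shannon diversity.

```python
import math


def relative_abundance(counts):
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(proportions):
    """Shannon diversity H = -sum(p * ln p), skipping taxa with p == 0."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Hypothetical counts: same community, very different sequencing depth.
deep_sample = {"taxon_A": 8000, "taxon_B": 1500, "taxon_C": 500}
shallow_sample = {"taxon_A": 80, "taxon_B": 15, "taxon_C": 5}

deep_rel = relative_abundance(deep_sample)
shallow_rel = relative_abundance(shallow_sample)

print(deep_rel)
print(shallow_rel)
print(shannon_index(deep_rel.values()), shannon_index(shallow_rel.values()))
```

Keep in mind that normalizing only removes the depth difference from the proportions. A very shallow sample can still miss rare taxa entirely, so its diversity estimate may be biased downward.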

6 weeks ago
Mensur Dlakic ★ 15k

I think your results are mostly fine. As already mentioned, most of the samples have a high mapping rate. In the samples where the mapping rate is low, the total number of reads is also low. That most likely reflects low sequencing depth or low abundance, which leads to fragmented assemblies. I suggest you look at your assembly statistics and eliminate the samples where most of the contigs are short, and probably also those with low sequencing depth. That should leave you with good samples.
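As a rough sketch of that filtering step, the Python snippet below recomputes the mapping percentage from the numbers posted in the question and flags samples that are shallow or poorly mapped. The two thresholds (`MIN_TOTAL_READS`, `MIN_MAPPING_PERCENT`) are arbitrary assumptions for illustration, not recommended values, and `flag_samples` is a hypothetical helper name. Pick cutoffs that make sense for your own data and assembly statistics.

```python
# (sample, total_reads, mapped_reads), copied from the table in the question.
SAMPLES = [
    ("samples13", 21380728, 17881628),
    ("samples14", 109599, 22051),
    ("samples15", 258752, 119090),
    ("samples16", 340586, 147490),
    ("samples12", 7342679, 6205921),
    ("samples11", 7741157, 6283578),
    ("samples17", 17108901, 15213361),
    ("samples18", 4012626, 2850684),
]

# Assumed cutoffs, chosen only for this illustration.
MIN_TOTAL_READS = 1_000_000
MIN_MAPPING_PERCENT = 70.0


def flag_samples(samples, min_reads, min_percent):
    """Return names of samples below either the depth or the mapping cutoff."""
    flagged = []
    for name, total, mapped in samples:
        percent = 100.0 * mapped / total
        if total < min_reads or percent < min_percent:
            flagged.append(name)
    return flagged


flagged = flag_samples(SAMPLES, MIN_TOTAL_READS, MIN_MAPPING_PERCENT)
print(flagged)
```

With these assumed cutoffs, the flagged samples are the same three that have both few reads and a low mapping rate in the table.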

If I can make another suggestion: try to go with more informative titles such as "Low mapping rate", as they will attract more people to read and potentially answer. I think the idea is to tell us something specific about the problem so that people who are interested in that subject matter will comment.

