Reason for multiple mapping of reads?
1
0
7.0 years ago
bioinfo_ga ▴ 70

I have 20.77 million 75x2 (paired-end) reads for an Acinetobacter baumannii sample. I aligned the reads to the reference genome (Acinetobacter baumannii, downloaded from NCBI, which has a chromosome and two plasmid sequences) using bowtie2 with the end-to-end option, and 86.58% of the reads aligned concordantly >1 times. PCR duplicates make up 73% of the sample. I also aligned only the non-duplicate ("PCR good") reads to the reference, and 70.91% of those aligned concordantly >1 times. I checked the genome for repeats and saw few repeats in the dot plot, so what could be the reason for the multiple mapping?

rna-seq • 2.3k views
2
7.0 years ago
Titus ▴ 910

Hi, I think the reason is homologous genes/pseudogenes or conserved domains, isn't it? Best

0

Moved to an answer since this is very likely the reason.

0

Thanks. English is not my native language, so it's difficult to be sure I understand a question. I have a question for you (I will remove this post after your answer): what can I do, while respecting the community rules, to bump one of my questions? (I posted it on a Friday, which is probably why I got no answer.)

1

If you have relevant new information for that question you can edit your question, which will also bump it to the top of the list and get attention again. Just don't abuse this feature.

0

I am under the impression that pseudogenes are rare in bacteria, since they are under high pressure to keep a small genome...

86% of reads with multiple alignments seems incredibly high to be explained by pseudogenes/repeats/etc; I've never seen that in bacteria. I think there's a different mechanism at work here. Perhaps you could post your insert size distribution and coverage distribution? If you had super-short inserts so that most of the reads were only 14bp after adapter-trimming, that would explain the issue since you simply don't have enough information per read to map them correctly. Alternatively, if your coverage distribution indicates that most of your reads cover a small fraction of the genome (which might be repetitive), which is something that could theoretically happen with (for example) MDA-amplified single cells, then pseudogenes/repeats could be the correct explanation. However, I doubt that this is the correct explanation if you are using a randomly-fragmented library from an unamplified isolate.
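As a rough sketch of how those distributions could be pulled from the alignment output, assuming the bowtie2 results are available as SAM text, the following tallies read lengths and template lengths (TLEN, column 9) for properly paired reads. The function name `summarize_sam` and the example records are hypothetical; dedicated tools such as samtools would normally be used for this, and coverage is not computed here.

```python
from collections import Counter


def summarize_sam(lines):
    """Tally read lengths and insert sizes from SAM text lines."""
    read_lengths = Counter()
    insert_sizes = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        seq = fields[9]
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if seq != "*":
            read_lengths[len(seq)] += 1
        # Count each properly paired template once, via the mate with positive TLEN.
        if flag & 0x2 and tlen > 0:
            insert_sizes[tlen] += 1
    return read_lengths, insert_sizes


# Hypothetical records: one proper pair of 10 bp reads spanning a 60 bp template.
example = [
    "@HD\tVN:1.6",
    "\t".join(["r1", "99", "chr", "100", "42", "10M", "=", "150", "60",
               "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r1", "147", "chr", "150", "42", "10M", "=", "100", "-60",
               "ACGTACGTAC", "IIIIIIIIII"]),
]
lengths, inserts = summarize_sam(example)
print(dict(lengths), dict(inserts))  # {10: 2} {60: 1}
```

If most reads turn out to be very short after trimming, the read-length tally would show it directly.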

It is also helpful to post the percentage of reads that map.
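For reference, a minimal sketch of computing that percentage from SAM text, assuming the standard FLAG bits (0x4 for unmapped, 0x100 and 0x800 for secondary and supplementary records). The function name `mapping_rate` and the example records are hypothetical; the bowtie2 alignment summary or samtools reports the same kind of figure.

```python
def mapping_rate(sam_lines):
    """Return the percentage of primary read records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # count each read once: ignore secondary/supplementary records
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


# Hypothetical records: three mapped reads and one unmapped read.
example_records = [
    "\t".join(["r1", "0", "chr", "100", "42", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "16", "chr", "200", "42", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r3", "0", "chr", "300", "1", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r4", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"]),
]
print(mapping_rate(example_records))  # 75.0
```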

0

I checked for pseudogenes (homologous genes) but did not find any. Could there be any other possible reason for it?

1

How did you do that? Did you check the second mapping site for a read that maps 2 times?
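One way to look at the second site, sketched under the assumption that the reads were re-aligned with bowtie2's `-k 2` option so that up to two alignments per read appear in the SAM output, is to group the records by read name and mate and list the positions. The function names and the reference names `chromosome` and `plasmid1` are hypothetical.

```python
from collections import defaultdict


def alignment_sites(sam_lines):
    """Map (read name, mate number) -> list of (reference, position)."""
    sites = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, ref, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # unmapped records have no site to report
        # 0x40 marks the first mate, 0x80 the second; 0 means unpaired.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        sites[(name, mate)].append((ref, pos))
    return dict(sites)


def multi_mapped(sites):
    """Keep only the reads that were reported at more than one site."""
    return {read: locs for read, locs in sites.items() if len(locs) > 1}


# Hypothetical records: r1 is reported at two sites, r2 at one.
example_alignments = [
    "\t".join(["r1", "65", "chromosome", "100", "1", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r1", "321", "plasmid1", "500", "1", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "65", "chromosome", "300", "42", "4M", "*", "0", "0", "ACGT", "IIII"]),
]
print(multi_mapped(alignment_sites(example_alignments)))
# {('r1', 1): [('chromosome', 100), ('plasmid1', 500)]}
```

Looking at where the extra sites fall (for example, whether they cluster in a few regions or on the plasmids) would show whether repeats or shared sequence explain the multiple mapping.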
