Question

Reason for multiple mapping of reads?

0

8.2 years ago

bioinfo_ga ▴ 70

I have 20.77 million paired-end reads (2 x 75 bp) for an Acinetobacter baumannii sample. I aligned the reads to the reference genome (Acinetobacter baumannii, downloaded from NCBI, which has a chromosome and two plasmid sequences) with bowtie2 in end-to-end mode, and 86.58% of the reads aligned concordantly >1 times. PCR duplicates make up 73% of the sample. I also aligned only the reads left after removing PCR duplicates, and 70.91% of those aligned concordantly >1 times. I checked the genome for repeats and saw few repeats in the dot plot, so what could be the reason for the multiple mapping?

rna-seq • 3.0k views

score 2 · Answer 1 · 2017-04-27

2

8.2 years ago

Titus ▴ 910

Hi, I think the reason is homologous genes/pseudogenes or conserved domains, isn't it? Best

0

Moved to an answer since this is very likely the reason.

8.2 years ago by Devon Ryan 105k

0

Thanks. English is not my native language, so it is difficult to be sure I have understood a question. I have a question for you (I will remove this post after your answer): what can I do, while respecting the community rules, to bring one of my questions back up? (I posted it on a Friday, which I suppose is why I got no answer.)

8.2 years ago by Titus ▴ 910

1

If you have relevant new information for that question you can edit your question, which will also bump it to the top of the list and get attention again. Just don't abuse this feature.

8.2 years ago by WouterDeCoster 48k

0

I am under the impression that pseudogenes are rare in bacteria, since they are under high pressure to keep a small genome...

86% of reads with multiple alignments seems incredibly high to be explained by pseudogenes/repeats/etc; I've never seen that in bacteria. I think there's a different mechanism at work here. Perhaps you could post your insert size distribution and coverage distribution? If you had super-short inserts so that most of the reads were only 14bp after adapter-trimming, that would explain the issue since you simply don't have enough information per read to map them correctly. Alternatively, if your coverage distribution indicates that most of your reads cover a small fraction of the genome (which might be repetitive), which is something that could theoretically happen with (for example) MDA-amplified single cells, then pseudogenes/repeats could be the correct explanation. However, I doubt that this is the correct explanation if you are using a randomly-fragmented library from an unamplified isolate.

It is also helpful to post the percentage of reads that map.
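In case it helps, here is a minimal Python sketch of how those numbers (read lengths after trimming, insert sizes, and the percentage of reads that map) could be pulled out of a SAM file. It only relies on the standard SAM columns (FLAG, TLEN, SEQ); the function names and the tiny in-memory example are made up for illustration and are not taken from any tool mentioned in this thread.

```python
import io
from collections import Counter


def counter_median(counts):
    """Lower median of a Counter that maps value -> number of occurrences."""
    n = sum(counts.values())
    if n == 0:
        return None
    midpoint = (n + 1) // 2
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= midpoint:
            return value


def summarize_sam(handle):
    """Summarize read lengths, insert sizes and mapping rate from SAM text."""
    read_lengths = Counter()
    insert_sizes = Counter()
    total = 0
    mapped = 0
    for line in handle:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        total += 1
        if not flag & 0x4:                # 0x4 = read unmapped
            mapped += 1
        if fields[9] != "*":
            read_lengths[len(fields[9])] += 1
        tlen = int(fields[8])
        # Count each properly paired fragment once, via the mate with TLEN > 0.
        if flag & 0x2 and tlen > 0:
            insert_sizes[tlen] += 1
    return {
        "total": total,
        "mapped": mapped,
        "percent_mapped": 100.0 * mapped / total if total else 0.0,
        "read_lengths": read_lengths,
        "insert_sizes": insert_sizes,
    }


# Hypothetical four-record example: one proper pair and one unmapped pair.
DEMO_SAM = (
    "@SQ\tSN:chr\tLN:1000\n"
    "r1\t99\tchr\t100\t42\t10M\t=\t150\t60\tACGTACGTAC\tIIIIIIIIII\n"
    "r1\t147\tchr\t150\t42\t10M\t=\t100\t-60\tACGTACGTAC\tIIIIIIIIII\n"
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTAC\tIIIIII\n"
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGTAC\tIIIIII\n"
)

stats = summarize_sam(io.StringIO(DEMO_SAM))
print("percent mapped:", stats["percent_mapped"])
print("median read length:", counter_median(stats["read_lengths"]))
print("median insert size:", counter_median(stats["insert_sizes"]))
```

For a real run you would pass an open SAM file handle instead of the demo string (a BAM file would first need converting, for example with `samtools view -h`). A read-length histogram piled up at very small values would point to the short-insert explanation above.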

8.2 years ago by Brian Bushnell 20k

0

I checked for pseudogenes (homologous genes) but did not find any. Could there be any other possible reason for it?

8.2 years ago by bioinfo_ga ▴ 70

1

How did you do that? Did you check the second mapping site for a read that maps 2 times?
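One way to look at the second site, sketched in Python: by default bowtie2 reports a single alignment per read, so this assumes the alignment was rerun with `-k 2` so that up to two alignments per read end up in the SAM output. The function names and the example records are hypothetical; the parsing uses only the standard SAM columns and the `AS:i` alignment-score tag.

```python
from collections import defaultdict


def alignment_sites(sam_lines):
    """Group reported alignments by (read name, mate number)."""
    sites = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # unmapped read, no site to record
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        strand = "-" if flag & 0x10 else "+"
        score = None
        for tag in fields[11:]:
            if tag.startswith("AS:i:"):   # alignment score of this record
                score = int(tag[5:])
        sites[(fields[0], mate)].append((fields[2], int(fields[3]), strand, score))
    return sites


def multi_mapped(sites):
    """Keep only reads that were reported at more than one location."""
    return {read: locs for read, locs in sites.items() if len(locs) > 1}


# Hypothetical records: r1/1 is reported on the chromosome and on a plasmid,
# r2/1 is reported only once.
DEMO_LINES = [
    "r1\t65\tchr\t100\t1\t10M\t=\t300\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0",
    "r1\t321\tplasmid1\t500\t1\t10M\t=\t700\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0",
    "r2\t65\tchr\t900\t42\t10M\t=\t1100\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0",
]

for read, locs in multi_mapped(alignment_sites(DEMO_LINES)).items():
    print(read, locs)
```

If the listed sites for many reads fall in the same few regions (for example rRNA operons, insertion sequences, or sequence shared between the chromosome and the plasmids), that would narrow down what is being hit more than once.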

8.2 years ago by Titus ▴ 910