DNA-Seq Multiple Mapped Tags
11.4 years ago
KCC ★ 4.1k

During a typical DNA-seq experiment, tags that map to multiple positions are filtered out. What are the pluses and minuses of instead taking each multiple-mapped tag, counting the number of positions it maps to, and adding 1/(number of mappings) to every position where the tag matches? So, if the tag maps to two positions, one adds 0.5 to each instead of the usual +1 to the mapped position.

Thus, I am showing the probability that the tag mapped to that position. Has anybody ever tried this?

The reason I care about this is that I want to examine the behavior of my tags in hard-to-map regions such as repeats. If I follow the normal procedure, I get close to no information about these areas. I only need a resolution of 500-1000 bp, so the placement does not have to be very precise.
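
To make the proposal concrete, here is a minimal sketch of the 1/(number of mappings) weighting, accumulated into fixed-size bins. The input layout (a dict from read ID to a list of `(chrom, pos)` hits), the function name `fractional_bin_counts`, and the 500 bp default are assumptions made for illustration; they do not come from any particular aligner or tool.

```python
from collections import defaultdict


def fractional_bin_counts(alignments, bin_size=500):
    """Spread each read's single count evenly over all of its hits.

    alignments: dict mapping a read ID to a list of (chrom, pos) hits
                (hypothetical layout, assumed for this sketch).
    Returns a dict mapping (chrom, bin_index) to a fractional count.
    """
    counts = defaultdict(float)
    for hits in alignments.values():
        if not hits:
            continue  # unmapped read: contributes nothing
        weight = 1.0 / len(hits)  # 1 / (number of mappings)
        for chrom, pos in hits:
            counts[(chrom, pos // bin_size)] += weight
    return dict(counts)


example = {
    "read1": [("chr1", 120)],                  # uniquely mapped
    "read2": [("chr1", 130), ("chr2", 2600)],  # maps to two positions
}
counts = fractional_bin_counts(example, bin_size=500)
print(counts)
```

Each read still contributes a total of 1 across the genome, so the sum of all bins equals the number of mapped reads.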

11.4 years ago
JC 13k

What are the pluses and minuses of taking each multiple-mapped tag?

You need to be confident about where your sequences come from before doing statistical analysis. Of course, you can try that method and check for variation in your datasets; just be careful with your statistics.

But with technologies pushing toward longer, high-quality reads, the proportion of multi-mapped reads will soon be irrelevant.

Thus, I am showing the probability that the tagged mapped to that position. Has anybody ever tried this?

Yes, check the literature; this strategy has been used in DNA-seq and RNA-seq (sorry, but I don't have the references in my head right now). Some tools even use this method as an initial step before redistributing the values in iterative steps; check the Cufflinks-related papers.
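
As a rough illustration of "uniform first, then redistribute", the sketch below starts every multi-mapped read at 1/(number of mappings) and then repeatedly reweights its hits in proportion to the current coverage of the bins they fall in. This is an assumed, simplified scheme written for this thread, not the algorithm of Cufflinks or any other published tool; the function name `redistribute_weights`, the input layout, and the bin size are all hypothetical.

```python
from collections import defaultdict


def redistribute_weights(alignments, bin_size=500, iterations=5):
    """Iteratively reweight multi-mapped reads by local coverage.

    alignments: dict mapping a read ID to a list of (chrom, pos) hits
                (hypothetical layout, assumed for this sketch).
    Returns a dict mapping each read ID to a list of weights, one per hit,
    summing to 1 for every read.
    """
    # Initial step: uniform 1/n weight per hit.
    weights = {
        read_id: [1.0 / len(hits)] * len(hits)
        for read_id, hits in alignments.items()
        if hits
    }
    for _ in range(iterations):
        # Coverage of each bin under the current weights.
        coverage = defaultdict(float)
        for read_id, w_list in weights.items():
            for (chrom, pos), w in zip(alignments[read_id], w_list):
                coverage[(chrom, pos // bin_size)] += w
        # Reweight each read's hits in proportion to that coverage.
        for read_id in weights:
            cov = [coverage[(c, p // bin_size)] for c, p in alignments[read_id]]
            total = sum(cov)  # > 0, because the read itself contributes
            weights[read_id] = [c / total for c in cov]
    return weights


example = {
    "read1": [("chr1", 120)],
    "read2": [("chr1", 130), ("chr2", 2600)],
}
weights = redistribute_weights(example, iterations=2)
print(weights)
```

Note that a scheme like this pulls ambiguous reads toward bins that already have support from other reads, so a repeat copy with no unique coverage nearby tends to lose weight over the iterations.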


If you have the time, I'd really appreciate a reference or a more specific hint about what to search for. I can follow this stuff up myself if I know roughly where to look. There are so many papers on DNA-seq and RNA-seq.

11.4 years ago
Fidel ★ 2.0k

What Salzberg endorses in this paper (http://www.nature.com/nrg/journal/v13/n1/full/nrg3117.html) is to map a multi-read randomly to one of the mapping positions. This has the following advantages:

  • It is faster than reporting every position the read may map to.
  • The output mapping file needs no extra processing beyond the usual pipelines.
  • It is built into some alignment software.
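
For comparison with the fractional approach, random assignment can be sketched in a few lines. This only illustrates the idea on an already-parsed list of hits; it is not how any specific aligner implements it, and the function name `assign_randomly`, the input layout, and the `seed` parameter are assumptions made for the example.

```python
import random


def assign_randomly(alignments, seed=0):
    """Pick one hit uniformly at random for every mapped read.

    alignments: dict mapping a read ID to a list of (chrom, pos) hits
                (hypothetical layout, assumed for this sketch).
    seed: fixed so that repeated runs give the same assignment.
    """
    rng = random.Random(seed)
    return {
        read_id: rng.choice(hits)
        for read_id, hits in alignments.items()
        if hits  # skip unmapped reads
    }


example = {
    "read1": [("chr1", 120)],
    "read2": [("chr1", 130), ("chr2", 2600)],
}
assigned = assign_randomly(example, seed=42)
print(assigned)
```

Every read ends up with exactly one position, so downstream counting works as it would for uniquely mapped reads. Averaged over many reads, the expected count at each position matches the 1/(number of mappings) weighting.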

Regarding your proposed method, I have mostly seen it applied in the context of RNA-seq. Here is a paper on the topic: http://bioinformatics.oxfordjournals.org/content/26/4/493.long


Great. This was really helpful.
