Question: Keeping Tags That Are Mapped To The Same Place By Two Aligners
1
gravatar for KCC
7.0 years ago by
KCC4.0k
Cambridge, MA
KCC4.0k wrote:

I have been thinking for a while about the advantages of doing something like aligning with bowtie2 and bwa and then keeping only the tags while are aligned to the same positions by both programs.

One reservation I have about this idea is I don't think I've ever seen it anywhere and since it's a somewhat simple thing to do, I was wondering if anybody has tried it and what are the pluses and minuses to an approach like this?

alignment bwa bowtie • 1.4k views
ADD COMMENTlink modified 2.5 years ago by Biostar ♦♦ 20 • written 7.0 years ago by KCC4.0k
2
gravatar for Gabriel R.
7.0 years ago by
Gabriel R.2.6k
Danmarks Tekniske Universitet
Gabriel R.2.6k wrote:

My guess is that you would downsample your set towards reads with low divergence to the reference. Those reads are probably those will little edit distance. Try the following:

1) Keep 3 sets a) BWA intersect with Bowtie b) BWA minus Bowtie c) Bowtie minus BWA 2) Measure the average edit distance for each (using the NM field).

Post the results for fun :-)

ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by Gabriel R.2.6k

Good suggestion, I'd be interested in seeing the results as well

ADD REPLYlink written 7.0 years ago by Mikael Huss4.7k
0
gravatar for John
2.5 years ago by
John12k
Germany
John12k wrote:

That is a thing, although not to the read-level. I've seen it used in bisulphite mapping where Bismark, PASH, and one other (but i forget the name) where used to map reads and the combined methylation calls were used.

The problem is that it doesn't always statistically make sense. It depends on the power/sensitivity of the tools, which is often unknown, and the size of the data. To help explain this a bit better i wrote you a little tool - although it might have a bug. i dont know. it's a sunday and i seem to write the most bugs on sundays.

gist.github.com/JohnLonginotto/bd1a846544b31ebe6f96190c1cd84a83

And you provide it arguments like:

python ./should_i_intersect_it.py --true_positives 100 --true_negatives 60000 --true_positives_A 34 --false_positives_A 10 --true_positives_B 40 --false_positives_B 13

And then it will tell you True Positive, True Negative, False Positive, and False Negative for each tool, and the intersect/union:

                 Tool A                                   Tool B                                 Intersect                                  Union
       TP      TN      FP      FN               TP      TN      FP      FN               TP      TN      FP      FN               TP      TN      FP      FN
      34.0    59990.  10.0    66.0             40.0    59987.  13.0    60.0             13.62   59977.  0.0     39.62            60.38   60000.  23.0    86.38
      34.0    59990.  10.0    66.0             40.0    59987.  13.0    60.0             13.16   59977.  0.0     39.16            60.84   60000.  23.0    86.84
      34.0    59990.  10.0    66.0             40.0    59987.  13.0    60.0             13.74   59977.  0.0     39.74            60.26   60000.  23.0    86.26
      34.0    59990.  10.0    66.0             40.0    59987.  13.0    60.0             13.1    59977.  0.0     39.1             60.9    60000.  23.0    86.9
      34.0    59990.  10.0    66.0             40.0    59987.  13.0    60.0             13.82   59977.  0.02    39.82            60.18   59999.  22.98   86.18

The above example values i chose because that sort-of reflects RNA-seq differentially expressed genes. Where you have 100 truly DE genes, 60000 genes that don't actually change, and then two tools, A and B, where A gets 34 right but 10 wrong, and B gets 40 right but 13 wrong.

The result is that neither the intersect nor the union look particularly appealing. It depends on what you're after really. There are values where an intersect or union look very appealing, but meh. Play with the values to figure it out :)

Also run it in pypy because i didn't optimise anything and it's real slow.

EDIT:

GAH Biostars bot you **er, i just burnt 2-3 hours writing a stupid tool for a 4.5 year old question D':

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by John12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1149 users visited in the last hour