How Do I Detect Multiple Mappings For A Single Read?
1
1
Entering edit mode
11.5 years ago
KCC ★ 4.1k

I have found the following ways to determine if a tag can be mapped to more than one place using a SAM file produced by bwa produced from single-read sequencing data:

  1. the presence of the characters "XT:A:R"

  2. the presence of the characters "XA:Z:"

  3. mapping quality (MAPQ) of 0.

For my data, I find these definitions all give close, but not identical counts. Why is that and which is the right definition?

sam • 6.5k views
ADD COMMENT
0
Entering edit mode

MAPQ==0 means a multiple hit

ADD REPLY
0
Entering edit mode

Thanks. It has been corrected in my question.

ADD REPLY
2
Entering edit mode
11.5 years ago

Actually, there is a Tag in the sam format for exactly this problem: "NH:i:xx". But, as far as I known, there is only one mapping tool, that uses it: segemehl

One simple (but not very efficient) way to get all multiply mapped reads is the following:

samtools view -S file.sam | cut -f1 | sort | uniq -c | perl -ane 'BEGIN{$c=0;}if($F[0]>1){$c++;}END{print "$c\n";}

(only works with single-end reads)

ADD COMMENT
0
Entering edit mode

Looking at your code you seem to be grabbing the name of the tag and counting how many lines in the SAM start with this name. Perhaps I'm mistaken, but it seems as if bwa does not give multiple lines for the same read. I don't think this would work for my data. I see bwa using the XA tag to report multiple positions.

ADD REPLY
0
Entering edit mode

Pretty sure your mistaken. BWA reports the same read multiple times. Look at the reads that have the 256 flag set. You will find the same read name before it.

ADD REPLY
1
Entering edit mode

my understanding was one read per line, use the xa2multi.pl script to split into multiple lines

ADD REPLY
0
Entering edit mode

Jeremy, you are correct by default it is one line per read. I have set BWA parameters in the past to get multiple lines. Similar to: How To Force 'Bwa Samse' To Output Multiple Hits In .Sam Format? Also Bowtie 2 -k setting reports over multiple lines. I stand corrected.

ADD REPLY
0
Entering edit mode

He/she could still use the flag column to figure out how many non-primary alignment he/she has.

ADD REPLY
1
Entering edit mode

I have to admit, that you're right. My command line script does not work for default BWA output. I don't use BWA that often and thus forgot that they us this strange output format.

ADD REPLY
0
Entering edit mode

Takes much less space to keep multiple hits on one line.

ADD REPLY
0
Entering edit mode

I think, that this format is much more complicated, if you actually want to use/visualize all multiple hits. I am also not sure, if downstream tools, like BEDtools, use this TAG, resulting in wrong expression values.

If the hit of interest is stored within the TAG.... will intersectBED, or multicov find it????

But at the end of the day, it's just important to know, that BWA stores multiple hits like that and just assure, that downstream analysis handles them, or not. ;)

ADD REPLY
0
Entering edit mode

It would be good to find somewhere where all this is written down. On the other hand, I just want to know how to properly count the duplicates. I have found tags with the "XT:A:R" flag that don't have any alternative hits listed in the read in the "XA:Z" format. Why is that I wonder? Are those particular tags repeated?

ADD REPLY

Login before adding your answer.

Traffic: 2706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6