A Question About The SAM Entry For A Query Aligned To Multiple Places
12.1 years ago
Tianyang Li ▴ 500

Hi,

For SAM files, if a query is aligned to multiple positions would I have multiple entries for the same query or would I get multiple alignment positions in the same entry for the query?

Thanks!

Tags: format, alignment, sam, bowtie, multiple

12.1 years ago
Johan ▴ 890

A SAM file will contain one line for each alignment. So if a read aligned to more than one position, it would show up multiple times in the SAM file.
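
For aligners that do write one line per alignment, a minimal Python sketch like the one below (standard library only) can group the lines by read name (QNAME, column 1) and use the FLAG bits (0x100 secondary, 0x800 supplementary) to tell the primary line from the extra ones. The records are hypothetical toy data, and whether secondary lines are emitted at all depends on the aligner and its settings.

```python
from collections import defaultdict

UNMAPPED = 0x4         # FLAG bit 0x4: segment unmapped
SECONDARY = 0x100      # FLAG bit 0x100: secondary alignment
SUPPLEMENTARY = 0x800  # FLAG bit 0x800: supplementary alignment


def alignments_per_read(sam_lines):
    """Group the alignment lines of a SAM file by read name (QNAME).

    Returns {qname: [(rname, pos, kind), ...]} with one tuple per mapped line.
    """
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & UNMAPPED:
            continue  # unmapped records carry no alignment position
        if flag & SECONDARY:
            kind = "secondary"
        elif flag & SUPPLEMENTARY:
            kind = "supplementary"
        else:
            kind = "primary"
        hits[qname].append((rname, pos, kind))
    return dict(hits)


# Hypothetical toy records: read1 is reported on two lines, read2 on one.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read1\t256\tchr1\t500\t0\t10M\t*\t0\t0\t*\t*",
    "read2\t16\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
]

for qname, hits in alignments_per_read(toy_sam).items():
    print(qname, len(hits), hits)
```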


(unless it doesn't)


I stand corrected. I hadn't seen multiple alignments being presented as an optional tag before. :)

12.1 years ago

BWA reports only one hit per read. In the SAM output, the XA tag contains the alternative hits (format: (chr,pos,CIGAR,NM;)*).
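
A small Python sketch for unpacking such an XA value into tuples. It assumes BWA's convention of prefixing each position with the strand ('+' or '-'); the read name, chromosomes, and coordinates in the example are made up.

```python
def parse_xa(value):
    """Split an XA value like 'chr2,+1500,36M,1;...' into (chrom, strand, pos, cigar, nm)."""
    hits = []
    for entry in value.split(";"):
        if not entry:  # the value ends with ';', which leaves an empty last piece
            continue
        chrom, signed_pos, cigar, nm = entry.split(",")
        # Assumption: the position carries a strand sign; an unsigned one is treated as '+'.
        strand = "-" if signed_pos.startswith("-") else "+"
        hits.append((chrom, strand, abs(int(signed_pos)), cigar, int(nm)))
    return hits


def xa_hits(sam_line):
    """Return the parsed XA hits of one SAM line, or [] if it has no XA tag."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:  # optional fields start at column 12
        if field.startswith("XA:Z:"):
            return parse_xa(field[len("XA:Z:"):])
    return []


# Hypothetical record: one primary hit plus two alternatives listed in XA.
line = "read1\t0\tchr1\t100\t0\t36M\t*\t0\t0\t*\t*\tNM:i:0\tXA:Z:chr2,+1500,36M,1;chr5,-8870,36M,2;"
for hit in xa_hits(line):
    print(hit)
```

Note that the alternative hits live on the same line as the primary one, so counting lines alone would miss them.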

12.1 years ago
Vikas Bansal ★ 2.4k

I think it depends on your mapping tool and its parameters. If your mapping tool selects one random position when a read maps to several positions in the reference genome, you will have one entry (one row for that read) in the SAM file. But if your mapping tool reports the other positions in a tag, as BWA does, then, as Pierre said, you will get an XA field (containing the alternative hits) in the same entry (no extra row) in the SAM file.

Also note that if your read file contains two identical reads with different identifiers, they will be reported as two different entries (rows) in the SAM file. This case is different from the same read being mapped to multiple positions in the reference genome.
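
To tell the two situations apart in practice, one option (a sketch, not a standard tool) is to count rows per QNAME and, separately, group primary records by their original read sequence. SAM stores SEQ reverse-complemented for reverse-strand alignments (FLAG 0x10), so the sketch undoes that before comparing. Record names and sequences are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(seq, flag):
    """Undo SAM's reverse-complementing of SEQ for reverse-strand records (FLAG 0x10)."""
    return seq.translate(COMPLEMENT)[::-1] if flag & 0x10 else seq


def classify_repeats(sam_lines):
    """Return (reads on several rows, groups of distinct reads with identical sequence)."""
    rows_per_name = defaultdict(int)
    names_per_seq = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        rows_per_name[qname] += 1
        # Compare sequences on primary records only (neither 0x100 nor 0x800 set);
        # secondary records may store SEQ as '*'.
        if not flag & 0x900 and seq != "*":
            names_per_seq[original_read(seq, flag)].add(qname)
    multi_row = sorted(name for name, n in rows_per_name.items() if n > 1)
    identical = sorted(sorted(names) for names in names_per_seq.values() if len(names) > 1)
    return multi_row, identical


# Hypothetical records: readA is one read reported twice;
# readB and readC are two different reads with the same sequence.
toy_sam = [
    "readA\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\t*",
    "readA\t256\tchr1\t900\t0\t4M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t300\t60\t6M\t*\t0\t0\tGGATTC\t*",
    "readC\t0\tchr1\t300\t60\t6M\t*\t0\t0\tGGATTC\t*",
]
print(classify_repeats(toy_sam))
```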


Just note that the XA field isn't a tag you can rely on universally, as it is specific to the aligner. Cf. section 1.5 of the SAM spec: any tag that starts with 'X', 'Y', or 'Z' is for "local use only" and "will not be formally defined in any future version of this specification."
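
A tiny Python check for whether a tag name falls in that reserved namespace. Besides the X/Y/Z rule quoted above, the current SAM spec also reserves tags containing lowercase letters, so the sketch includes that too. It only looks at the name; it says nothing about what a given aligner puts in the tag.

```python
import re

# SAM spec: a tag name is two characters matching [A-Za-z][A-Za-z0-9].
TAG_RE = re.compile(r"[A-Za-z][A-Za-z0-9]")


def is_local_use_tag(tag):
    """True if the tag name is in the namespace reserved for local use."""
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid SAM tag name: {tag!r}")
    # Reserved: first character X, Y or Z, or a lowercase letter in either position.
    return tag[0] in "XYZ" or any(c.islower() for c in tag)


for tag in ["XA", "ZS", "NH", "HI", "xy"]:
    print(tag, "local use" if is_local_use_tag(tag) else "not reserved for local use")
```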

12.1 years ago
Pascal ★ 1.5k

Ah, that's funny, because my colleagues and I just calculated something similar. Have a look at the ratio a/b calculated on your SAM file:

a: samtools view [?] | cut -f 1 | sort | uniq | wc -l
b: samtools view [?] | cut -f 1 | wc -l
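
The same kind of ratio in Python, assuming the intent is distinct read names (QNAME) divided by alignment lines, with header lines skipped as samtools view does by default. A value of 1.0 means every read occupies a single line; lower values mean some reads are reported on several lines.

```python
def name_to_line_ratio(sam_lines):
    """Distinct read names (QNAME) divided by alignment lines, header excluded."""
    names = set()
    n_lines = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # samtools view omits the header by default
            continue
        names.add(line.split("\t", 1)[0])
        n_lines += 1
    return len(names) / n_lines if n_lines else float("nan")


# Hypothetical input: r1 appears on two lines, r2 and r3 on one each.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\t*",
    "r1\t256\tchr1\t900\t0\t4M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t300\t60\t4M\t*\t0\t0\tTTGA\t*",
    "r3\t16\tchr2\t50\t60\t4M\t*\t0\t0\tCCAT\t*",
]
print(name_to_line_ratio(toy_sam))
```

On a real file, passing an open file handle works the same way, since iterating over it yields lines. For paired-end data both mates share a QNAME, so the ratio drops to 0.5 even without any multi-mapping.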

12.1 years ago

Some aligners return multiple hits on multiple lines. This is novoalign SAM output using default parameters, for example:

W39CP:1373:2520 256 hsa-let-7a-1    6   1   23M *   0   0   TGAGGTAGTAGGTTGTATAGTTA >>>>>>=======9;;>===>== PG:Z:novoalign  AS:i:30 UQ:i:30 NM:i:1  MD:Z:22T0   CC:Z:hsa-let-7a-2   CP:i:5  ZS:Z:R  ZN:i:3  NH:i:3  HI:i:2  IH:i:3
W39CP:1373:2520 256 hsa-let-7a-2    5   1   23M *   0   0   TGAGGTAGTAGGTTGTATAGTTA >>>>>>=======9;;>===>== PG:Z:novoalign  AS:i:30 UQ:i:30 NM:i:1  MD:Z:22T0   ZS:Z:R  ZN:i:3  NH:i:3  HI:i:3  IH:i:3
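
A short Python sketch that reads the NH/HI tags (number of reported alignments for the query, and which one this line is) and the CC/CP "next hit" pointer from the two lines above. Both lines carry FLAG 256 (secondary); with NH:i:3 and only hits 2 and 3 shown, the HI:i:1 line is presumably elsewhere in the file. The tab-separated layout is an assumption, since the forum rendering replaced the tabs with spaces.

```python
def parse_optional_fields(fields):
    """Turn TAG:TYPE:VALUE strings into a dict; integer ('i') values become ints."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)  # maxsplit=2 keeps ':' inside Z values intact
        tags[tag] = int(value) if typ == "i" else value
    return tags


def summarize(sam_line):
    """Pull position, secondary flag, NH/HI and the CC/CP next-hit pointer from one line."""
    cols = sam_line.rstrip("\n").split("\t")
    tags = parse_optional_fields(cols[11:])
    return {
        "rname": cols[2],
        "pos": int(cols[3]),
        "secondary": bool(int(cols[1]) & 0x100),
        "NH": tags.get("NH"),  # number of reported alignments for this query
        "HI": tags.get("HI"),  # which of those alignments this line is
        "next_hit": (tags["CC"], tags["CP"]) if "CC" in tags else None,
    }


# The two novoalign lines from the post, re-joined with tabs (assumed original layout).
novo_lines = [
    "\t".join(["W39CP:1373:2520", "256", "hsa-let-7a-1", "6", "1", "23M", "*", "0", "0",
               "TGAGGTAGTAGGTTGTATAGTTA", ">>>>>>=======9;;>===>==", "PG:Z:novoalign",
               "AS:i:30", "UQ:i:30", "NM:i:1", "MD:Z:22T0", "CC:Z:hsa-let-7a-2", "CP:i:5",
               "ZS:Z:R", "ZN:i:3", "NH:i:3", "HI:i:2", "IH:i:3"]),
    "\t".join(["W39CP:1373:2520", "256", "hsa-let-7a-2", "5", "1", "23M", "*", "0", "0",
               "TGAGGTAGTAGGTTGTATAGTTA", ">>>>>>=======9;;>===>==", "PG:Z:novoalign",
               "AS:i:30", "UQ:i:30", "NM:i:1", "MD:Z:22T0",
               "ZS:Z:R", "ZN:i:3", "NH:i:3", "HI:i:3", "IH:i:3"]),
]
for line in novo_lines:
    print(summarize(line))
```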

(add this to the list of stuff Picard does not like)

11.8 years ago
Xianjun ▴ 310

This is a great question with great answers. I have written up my learning notes on the SAM format, which might be helpful to others. Here they are:

http://onetipperday.blogspot.com/2012/07/deeply-understanding-sam-tags.html
