A Question About The SAM Entry For A Query Aligned To Multiple Places
6
4
10.4 years ago
Tianyang Li ▴ 500

Hi,

For SAM files, if a query is aligned to multiple positions, would I get multiple entries for the same query, or multiple alignment positions in a single entry for that query?

Thanks!

format alignment sam bowtie multiple • 8.1k views
3
10.4 years ago
Johan ▴ 880

A SAM file will contain one line for each alignment. So if a read aligned to more than one position, it would show up multiple times in the SAM file.
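As a rough illustration of the "one line per alignment" case, here is a minimal Python sketch that counts how many alignment lines each query name has. The read names and coordinates in `example` are made up for illustration, and the function name `count_alignment_lines` is hypothetical.

```python
from collections import Counter


def count_alignment_lines(sam_lines):
    """Count alignment lines per query name (QNAME, the first SAM column)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        counts[qname] += 1
    return counts


# Made-up records: read1 has two alignment lines, read2 has one.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t200\t30\t4M\t*\t0\t0\tTTTT\tIIII",
]

counts = count_alignment_lines(example)
multi = {name: n for name, n in counts.items() if n > 1}
print(multi)
```

Note that with paired-end data both mates share a query name, so a count above 1 does not by itself prove that a read mapped to several places.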

1

(unless it doesn't)

0

I stand corrected. I hadn't seen multiple alignments being presented as an optional tag before. :)

3
10.4 years ago

By default, BWA returns only one hit per read. In the SAM output, the XA field contains the alternative hits (format: (chr,pos,CIGAR,NM;)*).
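To make the XA format concrete, here is a small Python sketch that splits an XA field into its individual hits. It assumes the field follows the (chr,pos,CIGAR,NM;)* layout described above and that the position carries a leading sign for the strand, as in BWA output. The function name `parse_xa` and the example values are hypothetical.

```python
def parse_xa(tag_field):
    """Parse an 'XA:Z:chr,pos,CIGAR,NM;...' field into a list of dicts."""
    prefix = "XA:Z:"
    if not tag_field.startswith(prefix):
        raise ValueError("not an XA tag: %r" % tag_field)
    hits = []
    for entry in tag_field[len(prefix):].split(";"):
        if not entry:
            # The field ends with ';', so the last split item is empty.
            continue
        chrom, pos, cigar, nm = entry.split(",")
        hits.append({
            "chrom": chrom,
            # Assumption: a leading '-' marks the reverse strand.
            "strand": "-" if pos.startswith("-") else "+",
            "pos": abs(int(pos)),
            "cigar": cigar,
            "nm": int(nm),
        })
    return hits


# Made-up XA value with two alternative hits.
hits = parse_xa("XA:Z:chr2,+500,36M,1;chr7,-1200,36M,2;")
for hit in hits:
    print(hit["chrom"], hit["strand"], hit["pos"], hit["cigar"], hit["nm"])
```

This sketch does not handle reference names that themselves contain commas.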

2
10.4 years ago
Vikas Bansal ★ 2.4k

I think it depends on your mapping tool and its parameters. If your mapping tool selects one random position when a read maps to several positions in the reference genome, then you will have one entry (one row for that read) in the SAM file. But if your mapping tool reports all the different positions, then, as Pierre said, you will get an XA field (containing the alternative hits) in the same entry (no extra row) in the SAM file.

Also note that if your read file contains two identical reads with different identifiers, they will be reported as two different entries (rows) in the SAM file. This case is different from the same read mapping to multiple positions in the reference genome.

0

Just note that the XA field isn't a tag you can rely on universally, as it is specific to the aligner. See section 1.5 of the SAM spec: any tag that starts with 'X', 'Y', or 'Z' is for "local use only" and "will not be formally defined in any future version of this specification."
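A quick way to apply that rule when inspecting a file is to flag tags by their first letter. The Python sketch below implements only the 'X', 'Y', 'Z' rule quoted above. The function name `is_local_use_tag` and the tag list are chosen for illustration.

```python
def is_local_use_tag(tag):
    """Return True if a two-character SAM tag starts with X, Y, or Z."""
    if len(tag) != 2:
        raise ValueError("SAM tags are two characters long: %r" % tag)
    return tag[0] in "XYZ"


# Illustrative tag names; XA and ZS are aligner-specific, the rest are not.
tags = ["NM", "XA", "AS", "ZS", "NH"]
local_only = [t for t in tags if is_local_use_tag(t)]
print(local_only)
```

Anything this flags should be looked up in the aligner's own documentation rather than in the SAM spec.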

1
10.4 years ago
Pascal ★ 1.5k

Ah, that's funny, because my colleagues and I just calculated something similar. Have a look at the ratio a/b calculated on your SAM file:

a: samtools view [?] | sort | uniq | wc -l
b: samtools view [?] | cut -f 1 | wc -l

1
10.4 years ago

Some aligners return multiple hits on multiple lines. This is Novoalign SAM output using default parameters, for example:

W39CP:1373:2520 256 hsa-let-7a-1    6   1   23M *   0   0   TGAGGTAGTAGGTTGTATAGTTA >>>>>>=======9;;>===>== PG:Z:novoalign  AS:i:30 UQ:i:30 NM:i:1  MD:Z:22T0   CC:Z:hsa-let-7a-2   CP:i:5  ZS:Z:R  ZN:i:3  NH:i:3  HI:i:2  IH:i:3
W39CP:1373:2520 256 hsa-let-7a-2    5   1   23M *   0   0   TGAGGTAGTAGGTTGTATAGTTA >>>>>>=======9;;>===>== PG:Z:novoalign  AS:i:30 UQ:i:30 NM:i:1  MD:Z:22T0   ZS:Z:R  ZN:i:3  NH:i:3  HI:i:3  IH:i:3
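To read those two records programmatically, the Python sketch below pulls out the fields that indicate multiple hits: the secondary-alignment bit (0x100, which is set in flag 256), NH (number of reported alignments) and HI (hit index). The helper name `summarize` is hypothetical. The records are re-joined with tabs here on the assumption that the spaces in the pasted text stand in for the tab delimiters of a real SAM file.

```python
SECONDARY = 0x100  # SAM flag bit for a secondary alignment


def summarize(line):
    """Extract name, target, position, and multi-hit tags from one SAM line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    tags = {}
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return {
        "qname": fields[0],
        "rname": fields[2],
        "pos": int(fields[3]),
        "secondary": bool(flag & SECONDARY),
        "NH": tags.get("NH"),
        "HI": tags.get("HI"),
    }


raw = [
    "W39CP:1373:2520 256 hsa-let-7a-1 6 1 23M * 0 0 TGAGGTAGTAGGTTGTATAGTTA "
    ">>>>>>=======9;;>===>== PG:Z:novoalign AS:i:30 UQ:i:30 NM:i:1 MD:Z:22T0 "
    "CC:Z:hsa-let-7a-2 CP:i:5 ZS:Z:R ZN:i:3 NH:i:3 HI:i:2 IH:i:3",
    "W39CP:1373:2520 256 hsa-let-7a-2 5 1 23M * 0 0 TGAGGTAGTAGGTTGTATAGTTA "
    ">>>>>>=======9;;>===>== PG:Z:novoalign AS:i:30 UQ:i:30 NM:i:1 MD:Z:22T0 "
    "ZS:Z:R ZN:i:3 NH:i:3 HI:i:3 IH:i:3",
]
# Convert the space-separated paste back into tab-delimited SAM lines.
records = [summarize("\t".join(r.split())) for r in raw]
for rec in records:
    print(rec)
```

Both lines carry the same read name with NH set to 3, and HI values of 2 and 3, which suggests a third line for the same read (hit 1) exists elsewhere in the file.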

0

(add this to the list of stuff Picard does not like)


1
10.1 years ago
Xianjun ▴ 300

This is a great question with great answers. I have written up my study notes on the SAM format, which might be helpful to others. Here they are:

http://onetipperday.blogspot.com/2012/07/deeply-understanding-sam-tags.html