SAM file: same reads ID, slightly different sequence
1
1
Entering edit mode
7.4 years ago

After aligning with STAR data coming from RNAseq to a bacterial genome, I notice something (for me) unexpected in the SAM output file. As it can be seen from the lines below, some reads having the same ID (first column) have alightly different sequences (10th column) . Why so ?

HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 16  gi|15674250|ref|NC_002737.1|    1578076 0   45M *   0   0   CCCCAACTACATCAGGCGTTCTAGGGCTTAACTGCTGTGTTCGGC   IEGIGHEGFGGBJJJJJJJIIJJJIHGHEHIHCJJJJIHFHHHFD   NH:i:6  HI:i:1  AS:i:44 nM:i:0
HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 256 gi|15674250|ref|NC_002737.1|    269351  0   45M *   0   0   GCCGAACACAGCAGTTAAGCCCTAGAACGCCTGATGTAGTTGGGG   DFHHHFHIJJJJCHIHEHGHIJJJIIJJJJJJJBGGFGEHGIGEI   NH:i:6  HI:i:2  AS:i:44 nM:i:0
HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 272 gi|15674250|ref|NC_002737.1|    1331147 0   45M *   0   0   CCCCAACTACATCAGGCGTTCTAGGGCTTAACTGCTGTGTTCGGC   IEGIGHEGFGGBJJJJJJJIIJJJIHGHEHIHCJJJJIHFHHHFD   NH:i:6  HI:i:3  AS:i:44 nM:i:0
HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 256 gi|15674250|ref|NC_002737.1|    22063   0   45M *   0   0   GCCGAACACAGCAGTTAAGCCCTAGAACGCCTGATGTAGTTGGGG   DFHHHFHIJJJJCHIHEHGHIJJJIIJJJJJJJBGGFGEHGIGEI   NH:i:6  HI:i:4  AS:i:44 nM:i:0
HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 256 gi|15674250|ref|NC_002737.1|    28064   0   45M *   0   0   GCCGAACACAGCAGTTAAGCCCTAGAACGCCTGATGTAGTTGGGG   DFHHHFHIJJJJCHIHEHGHIJJJIIJJJJJJJBGGFGEHGIGEI   NH:i:6  HI:i:5  AS:i:44 nM:i:0
HWI-ST1299:109:C3ELBACXX:7:1104:21137:23785 256 gi|15674250|ref|NC_002737.1|    84283   0   45M *   0   0   GCCGAACACAGCAGTTAAGCCCTAGAACGCCTGATGTAGTTGGGG   DFHHHFHIJJJJCHIHEHGHIJJJIIJJJJJJJBGGFGEHGIGEI   NH:i:6  HI:i:6  AS:i:44 nM:i:0
RNA-Seq alignment SAM format • 2.5k views
ADD COMMENT
0
Entering edit mode

Thanks for your answer Pierre. My problem is that, if I am not wrong, the 10th column should report the sequence of the read and, thus, I'm expecting that given the same ID in the first column I should have the same read in the 10th column. This case is confusing me, because, apparently, all the reads, even though multi-mapped (as indicated by the SAM flags) look like to map equally well (MAPQ =0, AS=44 for all of them, nM=0 for all of them). In such a case, according to the STAR developers, the primary alignment flag is assigned randomly. Thus, I'm wondering how could it be that reads with (slightly) different sequences:

1) have the same ID 2) map equally well

ADD REPLY
0
Entering edit mode

I should have the same read in the 10th column.

wrong: the sequence has been reverse-complemented because the flag says it's on the reverse strand.

map equally well

yes, and a random read is chosen as the 'primary' alignment.

ADD REPLY
0
Entering edit mode

Thanks you indeed for your prompt response. Probably I am misunderstanding something: what does the 10th column exactly report ? The sequence of the "sequenced read" or the sequence to which the read has been aligned ?

ADD REPLY
3
Entering edit mode

It's the bases in the sequenced read, presented as they would appear on the reference's positive strand. Thus if two reads have been mapped to the same position, they will look the same (modulo sequencing errors etc) in the SAM file, regardless of which strands actually went into the sequencing machine.

[Equivalently to Pierre's reply, but differently phrased.]

ADD REPLY
0
Entering edit mode

The sequence of the "sequenced read" or the sequence to which the read has been aligned ?

its 'the sequence of the read, rev-compl is the read was mapped on the reverse strand, truncated if the read is hard-clipped (see CIGAR string).

ADD REPLY
0
Entering edit mode

Thank you indeed for the clarifications !

ADD REPLY
2
Entering edit mode
7.4 years ago

see http://broadinstitute.github.io/picard/explain-flags.html

reads with flag = 256 are secondary alignments, 272= secondary alignments+ reverse strand.

ADD COMMENT

Login before adding your answer.

Traffic: 1801 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6