Question: Is there a more optimal SAM orientation parsing method?
1
3.7 years ago by
Daniel3.7k
Cardiff University
Daniel3.7k wrote:

I have been working on a script for identifying frequent 'end points' of paired end reads from a SAM file on a genome for determining uneven cut frequencies, which I need to integrate the orientation and 1st/2nd read information into. The 2nd column of a SAM file gives the detail on the orientation with a number (as described here and here).

I hadn't used a bioperl import method because all I was doing was pulling the genomic start position of the read and building a hash by frequency, but I need to now tune it for directionality and mate pair.

My question is whether there is a better way to do this than:

```## Numbers are random examples, not correctly done at the moment.
if (\$line[1] == 99){
\$coordinate = \$start
}elsif(\$line[1] == 147){
\$coordinate = \$start - \$readlength
}elsif(\$line[1] == 83){
\$coordinate = \$start + \$readlength
}.........etc```

Given that a picture paints a thousand words, basically what I want is to pull out the coordinates of the red circles from my SAM file. It's part of a bigger perl script, so I'd prefer to avoid other tools, but if they exist I should look at them.

sam perl • 1.3k views
modified 3.7 years ago by Damian Kao15k • written 3.7 years ago by Daniel3.7k

what is -/+36 the read length ? it 's not as simple for the negative-strand read because you'll have to walk to the cigar string (indel and insertion)...

Sorry, those numbers would vary based upon the read length. I jut know that in my data here, the reads are all trimmed to 36bp.

From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? I considered mapping them together but as I'm interested in the end points, it's not essential that everything has a pair and I wanted to avoid throwing any more out.

"From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? " : no if there is an indel in the read (cigar string contains 'I' or 'D' ) or even clipping you can't just add the values.

Ah, ok I see. I was only thinking of indels between the paired reads. I can include that in the calculation, but I'm starting to think that the bioperl SAM module will be better suited for this.

1
3.7 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

You can parse the bitwise flags with the AND operator in perl. For example, the 5th bit in the bitwise flag denotes orientation:

```if (\$flag & 16) {
return 'negative strand';
} else {
return 'positive strand';
} ```

The 5th bit is 2^4 = 16. 1st bit is 2^0, 2nd bit is 2^1...