Question: Is there a more optimal SAM orientation parsing method?
1
gravatar for Daniel
2.4 years ago by
Daniel3.5k
Cardiff University
Daniel3.5k wrote:

I have been working on a script for identifying frequent 'end points' of paired end reads from a SAM file on a genome for determining uneven cut frequencies, which I need to integrate the orientation and 1st/2nd read information into. The 2nd column of a SAM file gives the detail on the orientation with a number (as described here and here).

I hadn't used a bioperl import method because all I was doing was pulling the genomic start position of the read and building a hash by frequency, but I need to now tune it for directionality and mate pair.

My question is whether there is a better way to do this than: 

## Numbers are random examples, not correctly done at the moment.
if ($line[1] == 99){
    $coordinate = $start
}elsif($line[1] == 147){
    $coordinate = $start - $readlength
}elsif($line[1] == 83){
    $coordinate = $start + $readlength
}.........etc

 

Given that a picture paints a thousand words, basically what I want is to pull out the coordinates of the red circles from my SAM file. It's part of a bigger perl script, so I'd prefer to avoid other tools, but if they exist I should look at them.

sam perl • 795 views
ADD COMMENTlink modified 2.4 years ago by Damian Kao14k • written 2.4 years ago by Daniel3.5k

what is -/+36 the read length ? it 's not as simple for the negative-strand read because you'll have to walk to the cigar string (indel and insertion)...

ADD REPLYlink written 2.4 years ago by Pierre Lindenbaum98k

Sorry, those numbers would vary based upon the read length. I jut know that in my data here, the reads are all trimmed to 36bp.

From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? I considered mapping them together but as I'm interested in the end points, it's not essential that everything has a pair and I wanted to avoid throwing any more out.

ADD REPLYlink written 2.4 years ago by Daniel3.5k

"From my bowtie mapping, the forward and reverse reads are independent, not pair-matched, so I think they can be independently looked at without needing to worry about indels, right? " : no if there is an indel in the read (cigar string contains 'I' or 'D' ) or even clipping you can't just add the values.

ADD REPLYlink written 2.4 years ago by Pierre Lindenbaum98k

Ah, ok I see. I was only thinking of indels between the paired reads. I can include that in the calculation, but I'm starting to think that the bioperl SAM module will be better suited for this.

ADD REPLYlink written 2.4 years ago by Daniel3.5k
1
gravatar for Damian Kao
2.4 years ago by
Damian Kao14k
UK
Damian Kao14k wrote:

You can parse the bitwise flags with the AND operator in perl. For example, the 5th bit in the bitwise flag denotes orientation:

if ($flag & 16) { 
   return 'negative strand';
} else {
   return 'positive strand';
} 

The 5th bit is 2^4 = 16. 1st bit is 2^0, 2nd bit is 2^1...

ADD COMMENTlink written 2.4 years ago by Damian Kao14k

I've just had a crash-course in bitwise flags, I see how this works now, thanks.

ADD REPLYlink written 2.4 years ago by Daniel3.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1477 users visited in the last hour