Question: Samtools Indels -- Filtering Only Hits With Insertion In The Reference In The Middle Of The Sequence Hit
7.8 years ago, 2184687-1231-83- (4.9k) wrote:

Can anyone suggest how to use samtools to keep only the hits where an insertion in the reference splits the sequence hit roughly in the middle?

My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".

I don't expect perfect hits, so mismatches and small indels can occur at both ends of the hit. What I am looking for is an insertion in the reference near the middle of the hit that is at least 10x larger than any of the small indels at the ends.

I don't have paired ends.

indel samtools • 1.9k views
modified 7.8 years ago by Istvan Albert • written 7.8 years ago by 2184687-1231-83-

So you are assuming that only one read aligns around a given indel? If so, I think you might need to write some code. If, on the other hand, your depth is more than that, it might be useful to call the indels using something like Dindel rather than relying on ad hoc processing.

written 7.8 years ago by Sean Davis

What kind of coverage are you talking about here? What data processing do you use to produce the alignments?

written 7.8 years ago by Sean Davis

@Sean Davis: added comment - My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".

written 7.8 years ago by 2184687-1231-83-
7.8 years ago, Istvan Albert (University Park, USA) wrote:

I think your best bet is to parse the CIGAR string and make a decision based on that. The definition of 'roughly by the middle' is fuzzy enough that such functionality is unlikely to be implemented by default. A possible starting point in Python:

import re

# match every match (M), deletion (D) and insertion (I) operation in the CIGAR
patt  = re.compile(r'\d+M|\d+D|\d+I')
cigar = "1I12M1D12M"
vals  = patt.findall(cigar)

print(vals)

# decide here how roughly the middle is defined

prints:

['1I', '12M', '1D', '12M']
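
One way the "roughly the middle" decision could be made is sketched below. It is only an illustration, not a samtools feature. It assumes that an insertion in the reference shows up as a D operation in the CIGAR, that hard-clipped bases (H) can be ignored, and that "middle" means the largest D falls between 25% and 75% of the way through the read. The function names and both thresholds are made up for this example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (hard clips are ignored here).
READ_OPS = "MIS=X"


def parse_cigar(cigar):
    """Return a list of (length, op) tuples, e.g. [(12, 'M'), (1, 'D')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def largest_deletion_position(cigar):
    """Return (length, fraction) for the largest D operation.

    fraction is the share of read bases (0.0 to 1.0) that precede the
    deletion. Returns None when the CIGAR contains no D operation.
    """
    ops = parse_cigar(cigar)
    read_len = sum(n for n, op in ops if op in READ_OPS)
    best = None
    consumed = 0
    for n, op in ops:
        if op == "D":
            if best is None or n > best[0]:
                best = (n, consumed / read_len if read_len else 0.0)
        elif op in READ_OPS:
            consumed += n
    return best


def splits_near_middle(cigar, lo=0.25, hi=0.75):
    """True if the largest deletion lies within [lo, hi] of the read."""
    best = largest_deletion_position(cigar)
    return best is not None and lo <= best[1] <= hi


print(splits_near_middle("50M200D50M"))   # deletion halfway through the read
print(splits_near_middle("10M100D90M"))   # deletion close to the start
```

The window (lo, hi) is the fuzzy part and would need tuning for reads between 100 and 1000 bp.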

I changed the description a bit: I want to allow \d+D operations at both ends, but the one in the middle has to be 10x+ bigger than the ones at both ends.
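
The 10x requirement could be checked along these lines. This is a rough sketch under stated assumptions: the largest D must be at least `ratio` times as long as every other I or D in the same CIGAR, and the input is plain SAM text in which the CIGAR is the sixth tab-separated column. The names `dominant_deletion` and `filter_sam_lines` are hypothetical helpers, not part of samtools.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def dominant_deletion(cigar, ratio=10):
    """True if the largest D is >= ratio times every other indel (I or D)."""
    indels = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]
    deletions = [n for n, op in indels if op == "D"]
    if not deletions:
        return False
    biggest = max(deletions)
    others = [n for n, _ in indels]
    others.remove(biggest)  # drop a single entry of that length
    return all(biggest >= ratio * n for n in others)


def filter_sam_lines(lines, ratio=10):
    """Yield SAM alignment lines whose CIGAR passes dominant_deletion."""
    for line in lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and dominant_deletion(fields[5], ratio):
            yield line


print(dominant_deletion("40M2D10M300D45M1I5M"))  # 300 vs. 2 and 1
print(dominant_deletion("40M2D10M15D45M"))       # 15 vs. 2
```

To apply it to real data, `filter_sam_lines` can be given `sys.stdin` while the output of `samtools view` is piped into the script. This check says nothing about where the deletion sits in the read, so the position test still has to be applied separately.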

ADD REPLYlink written 7.8 years ago by 2184687-1231-83-4.9k