Question: Extracting only the matched (M) part of the read sequence using the CIGAR string
Varun Gupta (United States) wrote, 3.9 years ago:

Hi everyone, I am trying to extract the sequences for my contig from a BAM file. The CIGAR strings can take many different forms, a few of them being:

10S20M
20M10S
10S20M2S
20M1D10M
6M1D15M2D20M

and many more with insertions, deletions and, of course, N (for introns). I built my own custom genome and mapped reads to it. In the BAM file, each record holds the full read sequence, including the soft-clipped bases. But looking at them in IGV, I only see the matched part of the CIGAR string. Is there a tool that gives me only the matched part (M) of the CIGAR string and not the soft-clipped sequences?

Thanks
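
For reference, here is a small sketch (not part of the original post) of how the example CIGAR strings above break down. Per the SAM specification, M, I, S, = and X consume bases of the read, while M, D, N, = and X consume bases of the reference. The function name `summarize_cigar` is made up for illustration.

```python
import re

# Which CIGAR operations consume read bases (SEQ) and which consume
# reference bases, following the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return (read_length, reference_span, m_bases) for a CIGAR string."""
    read_len = ref_span = m_bases = 0
    for length, op in CIGAR_TOKEN.findall(cigar):
        n = int(length)
        if op in CONSUMES_QUERY:
            read_len += n
        if op in CONSUMES_REFERENCE:
            ref_span += n
        if op == "M":
            m_bases += n
    return read_len, ref_span, m_bases


for cigar in ["10S20M", "20M10S", "10S20M2S", "20M1D10M", "6M1D15M2D20M"]:
    print(cigar, summarize_cigar(cigar))
```

For 10S20M2S, for instance, the stored read is 32 bases long but only 20 of them are in the M block, which is the part IGV displays by default.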


Have you tried coding something in Python with pysam? That would let you do this and write the output in whatever format you need.

written 3.9 years ago by Devon Ryan
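
A minimal sketch of what such a pysam script might look like (an illustration, not code posted in this thread). It walks pysam's `cigartuples`, a list of `(operation, length)` pairs, and keeps only the read bases under aligned blocks. The helper names `matched_segments` and `print_matched` are hypothetical, and treating `=` and `X` the same as `M` is an assumption about what "matched" should mean for aligners that emit extended CIGARs.

```python
# Numeric CIGAR operation codes from the SAM/BAM specification,
# as they appear in pysam's `cigartuples`.
BAM_CMATCH, BAM_CINS, BAM_CDEL, BAM_CREF_SKIP, BAM_CSOFT_CLIP = 0, 1, 2, 3, 4
BAM_CEQUAL, BAM_CDIFF = 7, 8

# Assumption: '=' and 'X' are counted as aligned, like 'M'.
ALIGNED_OPS = (BAM_CMATCH, BAM_CEQUAL, BAM_CDIFF)


def matched_segments(query_sequence, cigartuples):
    """Return the read bases under aligned blocks, one string per block."""
    segments = []
    pos = 0  # current offset into the read sequence
    for op, length in cigartuples:
        if op in ALIGNED_OPS:
            segments.append(query_sequence[pos:pos + length])
            pos += length
        elif op in (BAM_CINS, BAM_CSOFT_CLIP):
            pos += length  # consumes read bases, but is not an aligned block
        # D, N, H and P do not consume read bases, so the offset stays put
    return segments


def print_matched(bam_path):
    """Hypothetical driver: print each mapped read's name and aligned bases."""
    import pysam  # third-party; imported here so the helper above stays dependency-free

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:
            if read.is_unmapped or read.query_sequence is None or read.cigartuples is None:
                continue
            segments = matched_segments(read.query_sequence, read.cigartuples)
            print(read.query_name, "".join(segments), sep="\t")
```

If the goal is only to drop the soft-clipped ends while keeping inserted bases, pysam's `read.query_alignment_sequence` already returns the read without its soft-clipped flanks.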

Hi Devon, I looked at pysam but did not find a ready-made tool for this. Maybe I will have to look at the syntax and try coding it in Python, but I thought this would be readily available.

written 3.9 years ago by Varun Gupta

"But looking at them in IGV, I only see the matched part of the CIGAR string. Is there a tool that gives me only the matched part (M) of the CIGAR string and not the soft-clipped sequences?"

I don't understand what you really need here: why discard sequences that you already don't see in IGV?

written 3.9 years ago by Pierre Lindenbaum