Extracting only the matched (M) part of a read sequence from the CIGAR string
8.1 years ago
Varun Gupta ★ 1.3k

Hi everyone, I am trying to extract the sequences for my contig from the BAM file. The CIGAR strings fall into several cases, a few of them being:

10S20M
20M10S
10S20M2S
20M1D10M
6M1D15M2D20M

and many more, including insertions, deletions and, of course, N (introns). I built my own custom genome and mapped reads to it. The BAM file contains the full read sequence, soft-clipped bases included. But when I look at the reads in IGV I only see the matched part of the CIGAR string. Is there a tool that gives me only the matched (M) part of the read sequence, without the soft-clipped bases?
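A minimal plain-Python sketch, assuming you already have the SEQ and CIGAR fields of a SAM record as strings (for example from `samtools view`). It walks the CIGAR operations and keeps only bases covered by M (or the more specific `=`/`X`): soft clips (S) and insertions (I) advance through the read but are dropped, while D and N consume only reference positions and are skipped. The function names are hypothetical. Bases on either side of a D or N are simply concatenated, so the result is not a contiguous stretch of the reference.

```python
import re

# A well-formed CIGAR string is one or more <length><operation> pairs.
CIGAR_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (SEQ), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")
# Operations whose read bases are aligned to the reference.
ALIGNED = set("M=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '10S20M2S' into (length, op) pairs."""
    if not CIGAR_FULL_RE.fullmatch(cigar):
        # Also rejects '*', which SAM uses when no CIGAR is available.
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]


def matched_sequence(seq: str, cigar: str) -> str:
    """Return only the read bases covered by M, = or X operations."""
    pieces = []
    pos = 0  # current offset into the read sequence
    for length, op in parse_cigar(cigar):
        if op in ALIGNED:
            pieces.append(seq[pos:pos + length])
        if op in QUERY_CONSUMING:
            pos += length
        # D, N, H and P consume no read bases, so pos is left unchanged.
    return "".join(pieces)
```

For example, `matched_sequence("GGGGACGTACGTAA", "4S8M2S")` returns `"ACGTACGT"`.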

Thanks

bam • 2.7k views

Have you tried coding something in Python with pysam? That would let you do this and output the result in whatever format you need.
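In the spirit of the pysam suggestion, a minimal sketch using pysam's `AlignedSegment` API, assuming a BAM file path you supply; the helper names `matched_bases` and `extract_matched` are hypothetical. `get_aligned_pairs(matches_only=True)` returns (query position, reference position) pairs only for bases aligned to the reference, which leaves out soft clips, insertions, deletions and N skips. `query_alignment_sequence` is the read with only the soft clips removed (insertions kept), which may be closer to what the question literally asks for.

```python
def matched_bases(read) -> str:
    """Concatenate the read bases aligned (M/=/X) to the reference.

    `read` is a pysam.AlignedSegment. With matches_only=True,
    get_aligned_pairs() yields (query_pos, ref_pos) only for aligned bases,
    so soft clips, insertions, deletions and N skips are all excluded.
    """
    seq = read.query_sequence
    return "".join(seq[qpos] for qpos, _ in read.get_aligned_pairs(matches_only=True))


def extract_matched(bam_path, contig=None):
    """Yield (read name, matched bases, soft-clip-free bases) per mapped read."""
    import pysam  # lazy import: matched_bases() itself needs no pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # fetch(contig) requires a BAM index (.bai); until_eof=True reads
        # the whole file sequentially without one.
        reads = bam.fetch(contig) if contig else bam.fetch(until_eof=True)
        for read in reads:
            # Skip unmapped reads and records stored without SEQ ('*').
            if read.is_unmapped or read.query_sequence is None:
                continue
            yield (
                read.query_name,
                matched_bases(read),
                read.query_alignment_sequence,  # soft clips removed, insertions kept
            )
```

With a hypothetical file `reads.bam` and contig `contig_1`, looping over `extract_matched("reads.bam", "contig_1")` and printing `f">{name}\n{matched}"` for each record would write the matched bases as FASTA.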


Hi Devon, I looked at pysam but did not find a ready-made tool for this. Maybe I will have to learn the syntax and code it in Python myself, but I thought this would be readily available.


But when I look at the reads in IGV I only see the matched part of the CIGAR string. Is there a tool that gives me only the matched (M) part of the read sequence, without the soft-clipped bases?

I don't understand what you actually need here: why discard sequences that you already don't see in IGV?
