8.1 years ago
Varun Gupta
Hi everyone, I am trying to extract the sequences for my contig from the BAM file. Looking at the CIGAR strings, there are several different cases, a few of them being:
10S20M
20M10S
10S20M2S
20M1D10M
6M1D15M2D20M
and many more with insertions, deletions and, of course, N (for introns). I made my own custom genome and mapped reads to it. The BAM file contains the full read sequence, including the soft-clipped bases, but when I look at the reads in IGV I only see the aligned part of the CIGAR string. Is there a tool that gives me only the matched (M) part of each read, without the soft-clipped sequence?
Thanks
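For reference, here is a minimal pure-Python sketch of what this could look like. It assumes you already have each read's SEQ and CIGAR string as Python strings; the function names are hypothetical, not an existing tool. `strip_soft_clips` removes only the flanking S bases (insertions stay), which should match what pysam's `AlignedSegment.query_alignment_sequence` returns. `matched_bases_only` keeps strictly the bases under M/=/X operations, so insertions are dropped as well. D and N consume reference but not read bases, so they need no special handling here.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples: '10S20M' -> [(10, 'S'), (20, 'M')]."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def strip_soft_clips(seq, cigar):
    """Drop leading/trailing soft-clipped (S) bases but keep insertions (I)."""
    # Hard clips (H) are not stored in SEQ, so ignore them when looking for S.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if ops[-1][1] == "S" else 0
    return seq[left:len(seq) - right]


def matched_bases_only(seq, cigar):
    """Keep only read bases aligned to a reference base (M, = and X ops).

    Soft clips (S) and insertions (I) are dropped; D and N consume no read bases.
    """
    out = []
    pos = 0  # current offset into seq
    for n, op in parse_cigar(cigar):
        if op in "M=X":
            out.append(seq[pos:pos + n])
        if op in "MIS=X":  # these ops consume read bases
            pos += n
    if pos != len(seq):
        raise ValueError("CIGAR does not match the read length")
    return "".join(out)
```

For example, with the read `TTTACGTAGGCATGT` and CIGAR `3S5M2I4M1S`, `strip_soft_clips` gives `ACGTAGGCATG` and `matched_bases_only` gives `ACGTACATG`. Which of the two you want depends on whether inserted bases count as part of "the matched part" for your purpose.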
Have you tried coding something in Python with pysam? That would let you do this and output the result in whatever format you need.
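A rough sketch of the pysam route, writing FASTA as one possible output format. It relies on pysam's `AlignmentFile` and on the `AlignedSegment` attributes `query_name`, `is_unmapped`, `is_secondary` and `query_alignment_sequence` (the read without its soft-clipped ends). The function names and the choice to skip secondary alignments are my own assumptions; adjust the filters to your data.

```python
import sys


def write_clipped_fasta(reads, out):
    """Write each mapped read's soft-clip-free sequence as a FASTA record.

    `reads` may be pysam AlignedSegment objects or any objects with the same
    attributes (query_name, is_unmapped, is_secondary, query_alignment_sequence).
    Returns the number of records written.
    """
    written = 0
    for read in reads:
        # Skip unmapped and secondary alignments.
        if read.is_unmapped or read.is_secondary:
            continue
        seq = read.query_alignment_sequence
        if not seq:  # None when the record has no stored SEQ ('*')
            continue
        out.write(f">{read.query_name}\n{seq}\n")
        written += 1
    return written


def main(bam_path, contig=None):
    """Stream reads from a BAM file (fetching a single contig needs a .bai index)."""
    import pysam  # imported here so write_clipped_fasta() is usable without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        reads = bam.fetch(contig) if contig else bam.fetch(until_eof=True)
        write_clipped_fasta(reads, sys.stdout)
```

Something like `main("aligned.bam", "my_contig")` would then print the clipped reads for that contig (the file and contig names are placeholders). If you need only the M bases, without insertions, you could instead work from `read.query_sequence` and `read.cigartuples`.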
Hi Devon, I looked at pysam but did not find an existing tool for this. Maybe I will have to learn the syntax and write it in Python myself, but I thought this would be readily available.
I don't understand what you really need here: why discard sequences that you already don't see in IGV?
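To make the comparison with IGV concrete, here is a hedged sketch that lays a read out in reference coordinates, roughly the way a genome browser row looks with soft clips hidden: S and I bases are omitted, deletions are shown as `-`, and N skips as `.`. Both placeholder characters are arbitrary choices, and `reference_projection` is a hypothetical helper, not an IGV or pysam function. Note that long introns will produce long runs of `.`.

```python
import re


def reference_projection(seq, cigar, del_char="-", skip_char="."):
    """Lay a read out in reference coordinates.

    Soft clips (S) and insertions (I) are hidden, deletions (D) become del_char and
    reference skips (N, e.g. introns) become skip_char, so the result is as long as
    the stretch of reference the alignment spans.
    """
    out = []
    pos = 0  # offset into seq
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op in "M=X":
            out.append(seq[pos:pos + n])
            pos += n
        elif op in "IS":
            pos += n  # read bases with no reference position are skipped
        elif op == "D":
            out.append(del_char * n)
        elif op == "N":
            out.append(skip_char * n)
        # H and P consume no bases in SEQ and add nothing to the reference row
    return "".join(out)
```

For instance, the read `TTACGTCCAGG` with CIGAR `2S4M1D3M3N2M` becomes `ACGT-CCA...GG`, 13 characters for 13 reference bases.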