Remove Soft Clipped Bases
7.6 years ago
Clare

I want to run some computations with a Python script directly on some BWA-aligned BAM files, and to do this I need to remove the soft-clipped bases. For example, if the CIGAR string and read are 2S8M and CCTGGAGAAT, I want to clip them so they become 8M and TGGAGAAT.
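A minimal sketch of the transformation described above, assuming you already have the CIGAR and SEQ fields as plain strings. The function name strip_soft_clips is hypothetical and is not part of any library. It drops the S operations from the CIGAR and trims the same number of bases from each end of the sequence. Hard clips (H) are kept because their bases are already absent from SEQ.

```python
from __future__ import annotations

import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_soft_clips(cigar: str, seq: str) -> tuple[str, str]:
    """Remove soft-clipped (S) bases from both ends of a read.

    Returns the new CIGAR string and the trimmed sequence.
    Hypothetical helper, written for illustration only.
    """
    if cigar == "*":
        return cigar, seq  # unaligned read: nothing to clip
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    left = right = 0
    kept = []
    for idx, (n, op) in enumerate(ops):
        if op == "S":
            # A leading S can only be preceded by H operations;
            # any other S must be the trailing clip.
            if all(prev_op == "H" for _, prev_op in ops[:idx]):
                left += int(n)
            else:
                right += int(n)
        else:
            kept.append(n + op)
    trimmed = seq if seq == "*" else seq[left:len(seq) - right]
    return "".join(kept), trimmed


print(strip_soft_clips("2S8M", "CCTGGAGAAT"))  # ('8M', 'TGGAGAAT')
```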

I tried to do this with ClipReads in GATK, but the hard-clip option throws errors and is unsupported.

Is there any way to remove the soft-clipped bases with another piece of software?

Because my coverage is not very high, I don't want to disable soft clipping in BWA, as I would lose a lot of coverage.

Thanks


Can't you just parse the CIGAR string in your script and take care of it there?
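Following that suggestion, here is a minimal sketch that works on SAM text records. It assumes you stream SAM lines (for example from samtools view -h) into the script, and it trims QUAL together with SEQ. The function name unclip_sam_line is hypothetical. POS is left unchanged because it already points at the first aligned base, and soft-clipped bases do not consume the reference. Optional tags are not modified.

```python
import re
import sys

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclip_sam_line(line: str) -> str:
    """Strip soft-clipped bases from CIGAR, SEQ and QUAL of one SAM line.

    Hypothetical helper for illustration; POS (column 4) is not changed.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    cigar, seq, qual = fields[5], fields[9], fields[10]
    if cigar == "*":
        return line  # no alignment, nothing to clip
    ops = CIGAR_OP.findall(cigar)
    # With hard clips ignored, soft clips can only sit at the two ends.
    core = [(int(n), op) for n, op in ops if op != "H"]
    if not core:
        return line
    left = core[0][0] if core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    end = len(seq) - right
    if seq != "*":
        fields[9] = seq[left:end]
    if qual != "*":
        fields[10] = qual[left:end]
    fields[5] = "".join(n + op for n, op in ops if op != "S")
    return "\t".join(fields)


if __name__ == "__main__":
    # Filter SAM text from stdin to stdout.
    for sam_line in sys.stdin:
        print(unclip_sam_line(sam_line))
```

Saved as, say, unclip_sam.py (hypothetical name), it could be used in a pipe such as: samtools view -h in.bam | python unclip_sam.py | samtools view -b -o out.bam -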

7.4 years ago

I just wrote a program that removes the clipped bases and their qualities.

Example:

$ java -jar dist/biostar84452.jar samtools-0.1.18/examples/toy.sam > out.sam
@HD    VN:1.4    SO:unsorted
@SQ    SN:ref    LN:45
@SQ    SN:ref2    LN:40
@PG    ID:0    PN:com.github.lindenb.jvarkit.tools.biostar.Biostar84452    VN:b5ebf67dd2926d8a6afadb4d1e36a4959508057f    CL:samtools-0.1.18/examples/toy.sam
(...)
r002    0    ref    9    0    2I6M1P1I1P1I4M2I    *    0    0    AAAGATAAGGGATAAA    *
(...)
$ grep r002 samtools-0.1.18/examples/toy.sam
r002    0    ref    9    30    1S2I6M1P1I1P1I4M2I    *    0    0    AAAAGATAAGGGATAAA    *
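A quick way to sanity-check a clipped record like r002 is to confirm that SEQ still matches the CIGAR. Per the SAM specification, only M, I, S, = and X consume read bases, and D, N, H and P do not. The sketch below sums those operations. The function name query_length is hypothetical, and the two records are copied from the example above.

```python
import re

# CIGAR operations that consume bases of the read (SAM specification).
QUERY_CONSUMING = set("MIS=X")


def query_length(cigar: str) -> int:
    """Number of read bases that a CIGAR string accounts for."""
    return sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in QUERY_CONSUMING
    )


# r002 before and after removing the 1S clip, taken from the example.
records = [
    ("1S2I6M1P1I1P1I4M2I", "AAAAGATAAGGGATAAA"),
    ("2I6M1P1I1P1I4M2I", "AAAGATAAGGGATAAA"),
]
for cigar, seq in records:
    # Both records should be self-consistent: CIGAR length == SEQ length.
    print(cigar, query_length(cigar), len(seq), query_length(cigar) == len(seq))
```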

3.4 years ago
opplatek

bamutils removeclipping also works very well.

P.S. I know this post is very old, but this might be handy for anyone reading it in the future.


I'm afraid your link is broken. Are you referring to https://github.com/statgen/bamUtil? That doesn't seem to have a removeclipping tool.


I have edited the answer, and the link should work now. The team stopped actively supporting the tools, and they have migrated here. However, I don't see the clipping utility there. The 'old' one still works, though. bamUtil is a completely different package.