filtering SAM/BAM to remove hits spanning short combined alignment lengths and low counts
12 weeks ago
Bryan ▴ 10

hi folks,

apologies if this has been answered elsewhere. I'm using read mapping to quantify the abundance of viral metagenome-assembled genomes (MAGs) across samples, and I'd like to do a bit of data cleaning that's proving trickier than anticipated. I'm mapping reads from several samples against a set of MAGs that I've cross-assembled from those samples. I use BBMap for this and then convert the SAM output into sorted BAM files, which I feed to CoverM to produce a counts per million (CPM) sample matrix. CoverM is great, but it only enables abundance filtering based on the fraction of a contig that is covered by reads in that sample (e.g., 10%). This gets tricky when contig lengths span an order of magnitude or more, so I'd like to do some abundance filtering based on absolute read counts and total alignment length as well.

Is there a way, via samtools or otherwise, to filter SAM/BAM files to remove (or set the counts to zero for) alignments that are shorter than a combined length x and include fewer than y reads? The intended effect would be to keep single-pair alignments, and alignments that include several reads but only span 300 bp or so, out of the abundance calculations.

thanks! bryan

align bbmap metagenomics • 255 views
12 weeks ago
Tao ▴ 30

Use the Python package pysam (see the pysam documentation).
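
As a rough sketch of how that could look (not a tested pipeline): the idea is to make one pass over the BAM to work out, per contig, how many primary alignments it has and how many bases they cover once overlapping alignments are merged, then a second pass that writes out only the reads on contigs passing both thresholds. The function names, the thresholds, and the choice to require both thresholds and to count each mate as its own read are assumptions for illustration, so adjust them to whatever x and y mean for your data. Only the pysam calls (`AlignmentFile`, `fetch(until_eof=True)`, and the read attributes) come from the library itself.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Bases covered by the union of 0-based, half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def contigs_passing(alignments, min_covered_bp, min_reads):
    """Return contigs with >= min_reads alignments covering >= min_covered_bp.

    alignments: iterable of (contig, start, end) tuples.
    """
    per_contig = defaultdict(list)
    for contig, start, end in alignments:
        per_contig[contig].append((start, end))
    return {
        contig
        for contig, ivs in per_contig.items()
        if len(ivs) >= min_reads and covered_bases(ivs) >= min_covered_bp
    }


def is_primary(read):
    """True for mapped reads that are neither secondary nor supplementary."""
    return not (read.is_unmapped or read.is_secondary or read.is_supplementary)


def filter_bam(in_path, out_path, min_covered_bp=1000, min_reads=10):
    """Hypothetical helper: write only primary reads on contigs that pass."""
    import pysam  # third-party; imported here so the helpers above need no install

    # Pass 1: collect alignment intervals and decide which contigs to keep.
    with pysam.AlignmentFile(in_path, "rb") as bam:
        keep = contigs_passing(
            (
                (r.reference_name, r.reference_start, r.reference_end)
                for r in bam.fetch(until_eof=True)
                if is_primary(r)
            ),
            min_covered_bp,
            min_reads,
        )

    # Pass 2: copy reads on the kept contigs; everything else is dropped.
    with pysam.AlignmentFile(in_path, "rb") as bam, \
            pysam.AlignmentFile(out_path, "wb", template=bam) as out:
        for r in bam.fetch(until_eof=True):
            if is_primary(r) and r.reference_name in keep:
                out.write(r)
    return keep
```

Something like `filter_bam("sample.sorted.bam", "sample.filtered.bam", min_covered_bp=300, min_reads=3)` (file names are placeholders) would then give a BAM you can pass to CoverM as before. Contigs that fail simply have no reads left, which should amount to setting their counts to zero. Note that unmapped, secondary, and supplementary records are also dropped in this sketch, and a mate whose partner sits on a removed contig is kept with its flags unchanged.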

