Filter bam files based on length of contig
2
0
Entering edit mode
3.9 years ago

Hi all,

I would like to get the number of reads mapped to contigs that are greater than 1000 bp in my assembly. I ran samtools flagstat to get some of the metrics. However, is there an efficient way to filter the bam file, so that the stats I get are from contigs larger than 1000 bp (1kb) only?

Thank you for your help!

Assembly map bam sam metagenomics • 1.4k views
ADD COMMENT
0
Entering edit mode
3.9 years ago
Gabriel R. ★ 2.9k
samtools view -b input.sorted.bam `samtools view -H input.sorted.bam |grep "^@SQ" |cut -f 2,3  |sed "s/.N://g" |awk '{if($2>1000){print $1}}' |tr "\n" " "` > input.longcontig.sorted.bam
ADD COMMENT
0
Entering edit mode
3.9 years ago
ATpoint 82k

Check the output of samtools idxstats. It gives you in a tab-delimited format the name of the chromosome, chromosome length and the number of reads mapping to it. It requires an indexed BAM file. The output is easily parsable and it is fast since it relies on the index.

ADD COMMENT

Login before adding your answer.

Traffic: 1787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6