Counting gaps that are less than a cutoff in a BAM file
2
0
Entering edit mode
21 months ago
Manuel ▴ 40

From a BAM file I have counted the depth of the bases with samtools depth. So now I have a file like this

 1       127282  1
1       127283  1
1       127284  1
1       127285  1
1       127286  1
1       127287  1
1       127288  1
1       127289  1
1       127290  1
1       127291  1
1       127292  1
1       127293  1
1       127294  1
1       127295  1
1       127296  1
1       127297  1
1       127298  1
1       127299  1
1       127300  1

I want to get the number of gaps in my depth that are less than 20. In other words, I want to count the number of sets of consecutive lines where all of the numbers in the 3rd column are less than 20.

Any idea how to do this?

BAM • 660 views
ADD COMMENT
2
Entering edit mode
21 months ago

I want to count the number of sets of consecutive lines where all of the numbers in the 3rd column are less than 20.

samtools depth -a in.bam | awk '{printf("%s\n",($3<20?"+":"-"));}' | uniq | sort | uniq -c | grep -w -F '-'
    105 -

(with a little arctefact when the chromosome name is changing)

ADD COMMENT
0
Entering edit mode
21 months ago
Manuel ▴ 40

Same result using this:

awk '$3<20&&(p>=20||p==""){c++}{p=$3}END{print c}' file
ADD COMMENT

Login before adding your answer.

Traffic: 1515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6