Make a bed file to exclude 1Mb from start and end of contigs
4.1 years ago
Rubal ▴ 350

I have lists of contig coordinates for several assemblies and would like to create a BED-format mask to exclude variants in the first and last 1 Mb of each contig.

Example lines from the contig files:

Chr1 1 123000000
Chr2 1 11435255
AEG1.2 1 2335

I could do something simple using awk, like this:

awk 'BEGIN{OFS="\t"} {print $1, $2+1000000, $3-1000000}' contig.bed > filter_ends.bed

This would be a positive mask of regions to keep, whereas I'd prefer a negative mask (though that's not essential). It also would not behave properly for contigs shorter than 2,000,000 bp: it would return nonexistent or negative coordinates.
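One way the short-contig problem might be avoided for the positive mask is to test the contig length inside the same awk call. This is only a sketch: the function name `make_keep_mask` and its optional flank argument are made up for illustration, and it assumes the input columns are name, 1-based start and inclusive end, as in the example lines.

```shell
# Hypothetical helper: positive (keep) mask in BED coordinates.
# Assumption: input columns are name, 1-based start, inclusive end.
# Usage: make_keep_mask contig_file [flank_bp]
make_keep_mask() {
    awk -v flank="${2:-1000000}" 'BEGIN { OFS = "\t" }
    # Only contigs longer than two flanks have an interior left to keep
    NF >= 3 && ($3 - $2 + 1) > 2 * flank {
        # 0-based half-open interval with flank bp trimmed from each end
        print $1, $2 - 1 + flank, $3 - flank
    }' "$1"
}
```

With the three example contigs, only Chr1 and Chr2 would produce a line; AEG1.2 is skipped because trimming would leave nothing.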

Effectively I will be excluding those contigs anyway, because the regions filtered from both ends will overlap. I could do this in two steps, but as I have many assemblies to run over, does anyone know a good approach for this? For example, one could first remove the contigs shorter than 2,000,000 bp and then run the awk command.
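For the negative mask, a single pass could emit the two end intervals per contig and fall back to masking the whole contig when the two flanks would meet or overlap. Again this is a sketch under assumptions: `make_end_mask` is a hypothetical name, the input is taken to be name, 1-based start and inclusive end, and the output is written as 0-based half-open BED.

```shell
# Hypothetical helper: negative (exclude) mask covering both contig ends.
# Assumption: input columns are name, 1-based start, inclusive end.
# Usage: make_end_mask contig_file [flank_bp]
make_end_mask() {
    awk -v flank="${2:-1000000}" 'BEGIN { OFS = "\t" }
    NF >= 3 {
        s = $2 - 1    # 1-based start converted to 0-based BED start
        e = $3        # inclusive 1-based end equals the half-open BED end
        if (e - s <= 2 * flank) {
            # Flanks meet or overlap: mask the whole contig
            print $1, s, e
        } else {
            print $1, s, s + flank      # first flank
            print $1, e - flank, e      # last flank
        }
    }' "$1"
}
```

Because it reads one file and writes to standard output, it could be looped over many assemblies, for example `make_end_mask contig.bed > exclude_ends.bed`. The resulting file could then be passed to a tool that drops overlapping records, such as `bedtools intersect -v`.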

Thanks in advance for your suggestions.

genome filter bed mask • 834 views