span of GFF file
9.0 years ago

I am looking for a tool that calculates the span of a GFF file (for example, the total number of bp covered by repeats or genes, given that these are described in the GFF file). I wrote an awk script to do that (it sums the lengths of all features), but since that approach does no validation of the input, I would prefer a tool similar to bedtools for this.
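For reference, the calculation described above can be sketched in Python with some basic input checks. This is only an illustration under stated assumptions, not the awk script from the question: the function name `gff_span` is hypothetical, the input is assumed to be tab-delimited GFF with 9 columns and 1-based inclusive coordinates, and no embedded `##FASTA` section is expected. Unlike a plain sum of feature lengths, it merges overlapping features so that each base is counted once.

```python
from collections import defaultdict


def gff_span(lines, feature_type=None):
    """Return {seqid: number of bases covered by at least one feature}.

    Hypothetical helper: assumes 9-column, tab-delimited GFF lines with
    1-based inclusive start/end coordinates and no ##FASTA section.
    """
    intervals = defaultdict(list)
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"line {n}: expected 9 fields, got {len(fields)}")
        seqid, ftype = fields[0], fields[2]
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            raise ValueError(f"line {n}: start/end are not integers") from None
        if start < 1 or end < start:
            raise ValueError(f"line {n}: invalid coordinates {start}-{end}")
        if feature_type is None or ftype == feature_type:
            intervals[seqid].append((start, end))

    span = {}
    for seqid, ivs in intervals.items():
        ivs.sort()
        total = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent feature: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current block and start a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        span[seqid] = total
    return span
```

Called as `gff_span(open("file.gff"), "gene")`, it would return the merged span of gene features per sequence; summing the values gives the genome-wide total.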

GFF span • 2.0k views
9.0 years ago
Kamil ★ 2.3k

It seems to me that the tool you're looking for is bedtools genomecov.

You'll probably want to use the option "-max 1" if you're only interested in how many bases are spanned by a given type of feature, since it combines all positions with a depth of 1 or more into a single bin.

For example, you might do something like this (hg19.genome is a tab-delimited file listing chromosome names and their sizes):

bedtools genomecov -max 1 -i <(awk '$3 == "start_codon" {print}' file.gff) -g hg19.genome > start_codon.txt

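The output file is a coverage histogram. For each chromosome, the row with depth 1 tells you how many bases are spanned by start codons, and the rows labelled "genome" give the same figure for the whole genome. Note that genomecov expects its input to be grouped by chromosome, so sort the GFF first if it is not.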
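If you want the numbers in a script, the histogram can be read with a few lines of Python. This is a rough sketch, assuming the default genomecov histogram layout of five tab-separated columns (name, depth, number of bases at that depth, chromosome size, fraction); check that against your bedtools version. The function name `spanned_bases` is made up for the example.

```python
def spanned_bases(hist_lines):
    """Map each chromosome (and 'genome') to its number of bases with depth >= 1.

    Assumes five tab-separated columns per line:
    name, depth, bases at that depth, chromosome size, fraction.
    """
    covered = {}
    for line in hist_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # ignore anything that is not a histogram row
        name, depth, n_bases = fields[0], int(fields[1]), int(fields[2])
        covered.setdefault(name, 0)  # keep chromosomes with no coverage at 0
        if depth >= 1:
            covered[name] += n_bases
    return covered
```

For example, `spanned_bases(open("start_codon.txt"))["genome"]` would give the genome-wide span.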
