How to find overlapping coordinates in a GTF file
4 weeks ago
Apex92 ▴ 50

I am trying to list the entries of a GTF file (gencode.vM25.annotation.gtf) that overlap one another. I have been looking around for a tool to do this. Is there one? Thanks

A good place to start is the bedtools suite. It has many features for working with coordinates, e.g. cluster, merge, etc., and the documentation is really easy to follow.

This is definitely helpful - thank you

There are some tools for interval overlap; bedtools and R/Bioconductor (rtracklayer + GenomicRanges) come to mind. If you want to do it on the command line, use the former; inside R, use Bioconductor.

4 weeks ago

I assume you mean features in a GFF file that overlap other features in that same GFF file?

If so, have a look at AGAT; it certainly has some sub-programs that can do this.

Yes, exactly: I want to find the overlaps between features in a GFF file and other features in that same file. Thank you for your input.

4 weeks ago

You could use BEDOPS gtf2bed and bedmap to map entries to themselves, filtering out any that are disjoint with awk and cut:

$ gtf2bed < annotations.gtf \
    | bedmap --count --echo --echo-map --delim '\t' - \
    | awk -v FS="\t" -v OFS="\t" '($1 > 1)' \
    | cut -f2-
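
If it helps to see the idea without installing anything, here is a rough sketch of the same "map the file to itself and keep what overlaps something else" logic using only sort and awk. The function name gtf_self_overlaps is made up for illustration and is not part of BEDOPS or bedtools. It assumes a tab-separated GTF with 1-based inclusive coordinates in columns 4 and 5, ignores strand and feature type, and holds the whole file in memory.

```shell
# Hypothetical helper (not part of BEDOPS or bedtools): print every GTF
# feature that overlaps at least one other feature on the same chromosome.
# Assumes 1-based, inclusive start/end in columns 4 and 5; strand is ignored.
gtf_self_overlaps() {
    grep -v '^#' "$1" \
        | sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n \
        | awk -F '\t' '
            {
                line[NR] = $0
                if ($1 != chrom) {
                    # New chromosome: reset the sweep state.
                    chrom = $1; max_end = -1; holder = 0
                }
                if (holder && ($4 + 0) <= max_end) {
                    # This feature starts inside the earlier feature that
                    # reaches furthest right, so both are overlapping.
                    hit[NR] = 1
                    hit[holder] = 1
                }
                if (($5 + 0) > max_end) { max_end = $5 + 0; holder = NR }
            }
            END {
                # Print flagged lines in sorted (chromosome, start) order.
                for (i = 1; i <= NR; i++) if (i in hit) print line[i]
            }
        '
}
```

You would call it as gtf_self_overlaps gencode.vM25.annotation.gtf > overlapping.gtf (the output file name is just an example). Note that a GENCODE GTF nests gene, transcript and exon lines inside each other, so nearly every line will be reported unless you first restrict the input to one feature type, for example with awk -F '\t' '$3 == "gene"'.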


