I am trying to list the entries of a GTF file (gencode.vM25.annotation.gtf) that overlap one another. I have been looking around for a tool that does this. Is there one? Thanks
A good place to start is the bedtools suite. It has many subcommands for working with coordinates, e.g. cluster and merge, and the documentation is easy to follow.
This is definitely helpful - thank you
Several tools can compute interval overlaps; bedtools and R/Bioconductor (rtracklayer + GenomicRanges) come to mind. If you want to work on the command line, use the former; inside R, use Bioconductor.
I assume you mean features in a GFF file that overlap other features in that same GFF file?
If so, have a look at AGAT (Another Gff/Gtf Analysis Toolkit); it has some sub-programs that can do this.
Yes, exactly: I want to find overlaps between features in a GFF file and other features in that same file. Thank you for your input.
You could use BEDOPS gtf2bed and bedmap to map entries to themselves, filtering out any that are disjoint with awk and cut:
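# --delim '\t' puts tabs (instead of the default '|') between the count, the
# entry, and its overlaps, so awk can test the count in column 1.
$ gtf2bed < annotations.gtf \
    | bedmap --count --echo --echo-map --delim '\t' - \
    | awk -v FS="\t" -v OFS="\t" '($1 > 1)' \
    | cut -f2-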
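For intuition about what the count filter keeps, here is a minimal sketch of the same idea using only sort and awk on a toy file. It is not how BEDOPS or bedtools work internally. The file names toy.gtf and overlapping.gtf and the four records are made up for illustration, and the sketch assumes tab-separated GTF with 1-based inclusive coordinates, ignores strand, and treats every feature type alike. On a real GENCODE file, gene, transcript and exon records of the same gene overlap each other trivially, so you would probably want to select one feature type (column 3) first.

```shell
# Toy GTF with made-up records: A and B overlap, C and D stand alone.
printf 'chr1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "A";\n'  > toy.gtf
printf 'chr1\ttoy\tgene\t150\t300\t.\t-\t.\tgene_id "B";\n' >> toy.gtf
printf 'chr1\ttoy\tgene\t500\t600\t.\t+\t.\tgene_id "C";\n' >> toy.gtf
printf 'chr2\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "D";\n' >> toy.gtf

# Drop comment lines, sort by chromosome then start, and let awk collect
# runs of overlapping features. A run is printed only if it holds more
# than one feature, i.e. every printed entry overlaps at least one other.
grep -v '^#' toy.gtf \
  | sort -t "$(printf '\t')" -k1,1 -k4,4n \
  | awk -F'\t' '
      # Print the buffered run if it has more than one member, then reset it.
      function emit(   i) { if (n > 1) for (i = 1; i <= n; i++) print buf[i]; n = 0 }
      # New chromosome, or start beyond the furthest end seen so far: new run.
      $1 != chrom || $4 + 0 > maxend { emit(); chrom = $1; maxend = $5 + 0 }
      # Add the feature to the current run and extend the run end if needed.
      { buf[++n] = $0; if ($5 + 0 > maxend) maxend = $5 + 0 }
      END { emit() }
    ' > overlapping.gtf
```

The result is still GTF, so it can be fed to whichever downstream tool expects the original format.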