How to find overlapping coordinates of a gtf file
4 weeks ago
Apex92 ▴ 50

I am trying to list the entries of a GTF file (gencode.vM25.annotation.gtf) that overlap one another. I have been looking around for a tool that does this. Is there one? Thanks

gtf-file sequencing genome annotation RNA-Seq

You should start with the bedtools suite. It has many features for working with coordinates (e.g. cluster, merge, etc.), and the documentation is really easy to follow.
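To give a feel for what clustering overlapping features means, here is a rough sketch using only sort and awk. It is not bedtools and does not reproduce its output format. The toy GTF, its coordinates, and the helper name `cluster_gtf` are all made up for illustration. It ignores strand and treats GTF coordinates as 1-based and inclusive.

```shell
# Toy GTF (tab-separated, made-up coordinates) used only for illustration.
toy_gtf=$(mktemp)
printf 'chr1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "A";\n'  > "$toy_gtf"
printf 'chr1\ttoy\tgene\t150\t300\t.\t+\t.\tgene_id "B";\n' >> "$toy_gtf"
printf 'chr1\ttoy\tgene\t400\t500\t.\t+\t.\tgene_id "C";\n' >> "$toy_gtf"
printf 'chr2\ttoy\tgene\t100\t200\t.\t-\t.\tgene_id "D";\n' >> "$toy_gtf"

# cluster_gtf FILE: hypothetical helper that prepends a cluster ID to each
# feature line; features sharing an ID form one chain of overlaps.
cluster_gtf() {
    grep -v '^#' "$1" \
      | sort -t "$(printf '\t')" -k1,1 -k4,4n \
      | awk -F '\t' -v OFS='\t' '
          # Start a new cluster on a new chromosome, or when the feature
          # begins after the furthest end seen so far in the current cluster.
          $1 != chrom || $4 + 0 > max_end { id++; chrom = $1; max_end = $5 + 0 }
          # Extend the cluster if this feature reaches further right.
          $5 + 0 > max_end { max_end = $5 + 0 }
          { print id, $0 }'
}

# Show cluster ID, chromosome, start and end for the toy features.
cluster_gtf "$toy_gtf" | cut -f1,2,5,6
```

On the toy file, A and B share a cluster while C and D each get their own. Keep in mind that a GENCODE GTF also contains transcript and exon rows that overlap their parent gene, so you would probably want to restrict the input to one feature type first.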


This is definitely helpful - thank you


There are some tools for interval overlap; bedtools and R/Bioconductor (rtracklayer + GenomicRanges) come to mind. If you want to do it on the command line, use the former; inside R, use Bioconductor.

4 weeks ago

I assume you mean features in a GFF file that overlap other features in that same GFF file?

If so, have a look at AGAT; it certainly has some sub-programs that can do this.


Yes, exactly: I want to find the overlaps between features in a GFF file and other features in that same file. Thank you for your input.

4 weeks ago

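You could use BEDOPS gtf2bed and bedmap to map entries to themselves, then use awk to filter out any that are disjoint and cut to drop the count column. Each entry always overlaps itself, so a count greater than 1 means it overlaps at least one other entry. The --delim option makes bedmap separate its output columns with tabs instead of the default pipe character, so that awk can read the count as the first field:

$ gtf2bed < annotations.gtf \
    | bedmap --count --echo --echo-map --delim '\t' - \
    | awk -v FS="\t" -v OFS="\t" '($1 > 1)' \
    | cut -f2-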
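If you want to sanity-check the idea on a small file without installing anything, here is a rough sketch with only sort and awk that lists each overlapping pair once. This is a swapped-in illustration, not BEDOPS, and its output format is different. The toy GTF and the helper name `list_overlaps` are made up. It ignores strand, treats GTF coordinates as inclusive, and compares every feature with all earlier ones on the same chromosome, so it will be slow on a full annotation.

```shell
# Toy GTF (tab-separated, made-up coordinates) used only for illustration.
toy_gtf=$(mktemp)
printf 'chr1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "A";\n'  > "$toy_gtf"
printf 'chr1\ttoy\tgene\t150\t300\t.\t+\t.\tgene_id "B";\n' >> "$toy_gtf"
printf 'chr1\ttoy\tgene\t400\t500\t.\t+\t.\tgene_id "C";\n' >> "$toy_gtf"
printf 'chr2\ttoy\tgene\t100\t200\t.\t-\t.\tgene_id "D";\n' >> "$toy_gtf"

# list_overlaps FILE: hypothetical helper that prints one line per pair of
# overlapping features: chrom, start/end/attributes of the earlier feature,
# then start/end/attributes of the later one.
list_overlaps() {
    grep -v '^#' "$1" \
      | sort -t "$(printf '\t')" -k1,1 -k4,4n \
      | awk -F '\t' -v OFS='\t' '
          # On a new chromosome, forget the features collected so far.
          $1 != chrom { chrom = $1; n = 0 }
          {
              # Input is sorted by start, so an earlier feature overlaps
              # this one exactly when it ends at or after this start.
              for (i = 1; i <= n; i++)
                  if (fend[i] >= $4 + 0)
                      print chrom, fstart[i], fend[i], fattr[i], $4, $5, $9
              n++; fstart[n] = $4; fend[n] = $5 + 0; fattr[n] = $9
          }'
}

list_overlaps "$toy_gtf"
```

On the toy file only the A/B pair is reported. With a real GENCODE file you would probably filter to a single feature type first (for example rows whose third column is gene), since transcripts and exons always overlap their own gene.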