Getting the overlap between two GTF files
7 months ago
feather-W • 0

Hello,

I have two GTF files containing transcript information, and I want to find the transcripts that overlap between the two files. Can anyone give me some advice?

Thanks!

RNA-seq GTF • 1.2k views

OK, thank you all very much for your help! I will try it.

6 months ago
rfran010 ▴ 900

Maybe somebody knows something I don't, but I feel like bedtools should be able to handle your gtf files directly.

If it's not done already, you can filter each file down to transcript records only, then use bedtools:

awk '$3 == "transcript"' file1.gtf > file1.txOnly.gtf
awk '$3 == "transcript"' file2.gtf > file2.txOnly.gtf

bedtools intersect -u -a file1.txOnly.gtf -b file2.txOnly.gtf > file1_tx_overlapping_file2_tx.gtf

Depending on which overlaps you want reported, you can swap the -a and -b files or adjust the options.
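For a quick sanity check on small files, the same idea can be sketched in plain awk without bedtools. This is only an illustration under stated assumptions: the toy file names and records are made up, the function name overlapping_tx is hypothetical, "overlap" means sharing at least one base on the same chromosome (strand is ignored, as bedtools intersect does unless -s is given), and the nested loop is quadratic, so it is not a substitute for bedtools on real annotations.

```shell
# Toy inputs (hypothetical data, tab-separated GTF columns)
printf 'chr1\tsrc\ttranscript\t100\t200\t.\t+\t.\ttranscript_id "A1";\n' >  toy1.gtf
printf 'chr1\tsrc\texon\t100\t150\t.\t+\t.\ttranscript_id "A1";\n'       >> toy1.gtf
printf 'chr1\tsrc\ttranscript\t500\t600\t.\t+\t.\ttranscript_id "A2";\n' >> toy1.gtf
printf 'chr1\tsrc\ttranscript\t150\t300\t.\t-\t.\ttranscript_id "B1";\n' >  toy2.gtf
printf 'chr2\tsrc\ttranscript\t500\t600\t.\t+\t.\ttranscript_id "B2";\n' >> toy2.gtf

# Print each transcript of file 1 that overlaps at least one transcript
# of file 2, once per transcript (similar in spirit to "intersect -u").
overlapping_tx() {
  awk -F '\t' '
    $3 != "transcript" { next }                  # keep transcript records only
    NR == FNR {                                  # first file read is file 2
      n++; chr[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0
      next
    }
    {                                            # second file read is file 1
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && s[i] <= $5 + 0 && e[i] >= $4 + 0) { print; break }
    }
  ' "$2" "$1"
}

overlapping_tx toy1.gtf toy2.gtf
```

With the toy data, only the A1 transcript line is printed: it shares bases 150-200 with B1, while A2 has no partner on chr1 in that range and B2 sits on a different chromosome.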

7 months ago
# Prefix each GTF record with BED-style chrom, start (0-based) and end, sort, then intersect
bedtools intersect \
    -a  <(awk -F '\t' '/^[^#]/ {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$0);}' file1.gtf  | sort -t $'\t' -k1,1 -k2,2n ) \
    -b  <(awk -F '\t' '/^[^#]/ {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$0);}' file2.gtf  | sort -t $'\t' -k1,1 -k2,2n )
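The awk step in this command deals with the coordinate conventions: GTF positions are 1-based and inclusive, while BED positions are 0-based and half-open, so the start is decremented by one and the end is left as it is. A minimal sketch on a single made-up record (the function name gtf_to_bed_prefix is hypothetical, and the record is invented for illustration):

```shell
# Prepend BED-style chrom/start/end to each non-comment GTF line
gtf_to_bed_prefix() {
  awk -F '\t' '/^[^#]/ { printf("%s\t%d\t%s\t%s\n", $1, int($4) - 1, $5, $0) }' "$@"
}

# One header comment plus one transcript spanning bases 100..200 (1-based, inclusive)
printf '#comment\nchr1\tsrc\ttranscript\t100\t200\t.\t+\t.\ttranscript_id "A1";\n' \
  | gtf_to_bed_prefix \
  | cut -f 1-3
```

The comment line is skipped, and the first three columns come out as chr1, 99, 200, which describes the same 101 bases in BED terms.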
7 months ago
GenoMax 141k

Something in AGAT should work: https://agat.readthedocs.io/en/latest/?badge=latest


What would that be?


The script agat_sp_compare_two_annotations.pl or agat_sp_sensitivity_specificity.pl.
But if you are interested in the ranges, then bedtools, bedops, or awk will be your friend.


It would be helpful if recommendations for toolkits included actual solutions.


Since I had not recommended a specific script in AGAT, I posted this as a comment so the OP can check the documentation on their own. If Juke34 (author of AGAT) wants to post a stand-alone answer, we can delete my original comment.

7 months ago

Using bedops --intersect and gtf2bed will get their common genomic space:

bedops --intersect <(gtf2bed < file1.gtf) <(gtf2bed < file2.gtf) > answer.bed

If you want to know what transcripts overlap other transcripts, specifically, you could use bedmap --echo --echo-map:

bedmap --echo --echo-map <(gtf2bed < file1.gtf) <(gtf2bed < file2.gtf) > answer.bed

More information at: https://bedops.readthedocs.io/
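To make the difference between the two approaches concrete: an intersection reports the shared coordinates themselves, whereas a mapping reports which elements touch each other. For a single pair of half-open BED intervals on the same chromosome, the shared region runs from the larger start to the smaller end. A rough awk sketch of that arithmetic (the helper name interval_intersection and the numbers are made up for illustration; this is not how bedops is implemented):

```shell
# Intersection of two half-open intervals [s1, e1) and [s2, e2)
# on the same chromosome; prints nothing when they do not overlap.
interval_intersection() {
  awk -v s1="$1" -v e1="$2" -v s2="$3" -v e2="$4" 'BEGIN {
    s = (s1 + 0 > s2 + 0) ? s1 + 0 : s2 + 0    # larger of the two starts
    e = (e1 + 0 < e2 + 0) ? e1 + 0 : e2 + 0    # smaller of the two ends
    if (s < e) print s "\t" e
  }'
}

interval_intersection 99 200 149 300    # overlapping pair
interval_intersection 99 200 499 600    # disjoint pair, prints nothing
```

The first call prints 149 and 200 separated by a tab, the region both intervals cover.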
