How To Merge Gff Files
11.0 years ago

Hi, I was wondering if anyone knows how to merge two or more GFF files into a consensus GFF file. I have a GTF file from the TopHat/Cufflinks pipeline and a GFF file from a Velvet assembly, and I would like to merge these two with an existing annotated GFF file. The idea is to have one GFF file instead of three, so that I can load it as a single track in my genome browser. Thanks in advance.

gff merge • 15k views

Any reason to not simply load three separate files? Alternatively, you can simply concatenate them and sort by chromosome location.
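
A minimal sketch of the concatenate-and-sort suggestion, using only standard Unix tools. The function name `merge_gff_sorted` is hypothetical, and the sketch assumes tab-delimited, 9-column GFF/GTF input. It only stacks the records into one sorted track; it does not reconcile overlapping models into a consensus, and mixing GTF and GFF3 attribute styles in one file may confuse some genome browsers.

```shell
# Hypothetical helper: concatenate GFF/GTF files and sort the records
# by sequence name (column 1), then by start coordinate (column 4).
merge_gff_sorted() {
    TAB=$(printf '\t')
    # Skip comment/header lines (starting with '#') and blank lines,
    # then sort with a literal tab as the field separator.
    cat "$@" | awk '!/^#/ && NF' | sort -t "$TAB" -k1,1 -k4,4n
}

# Example call (file names are placeholders):
#   merge_gff_sorted annotation.gff velvet.gff > combined.gff
```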

No particular reason. I want to load all three files separately, along with a fourth file containing the consensus annotation, which users might find more convenient to work with.

Programs such as MAKER combine evidence for gene models.

11.0 years ago

One way to do this is to start by converting them to sorted UCSC BED with BEDOPS gtf2bed and sort-bed. Here's a quick way to convert a bunch of files if you use a bash shell:

$ for i in *.gtf; do
    gtf2bed < "$i" > "$i.converted.bed"
  done

Then do a multiset union with BEDOPS bedops to make a single BED file called answer.bed that can be loaded into your genome browser instance:

$ bedops --everything *.converted.bed | cut -f1-6 - > answer.bed
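
To illustrate what the conversion involves, here is a rough awk sketch of turning GFF/GTF records into six-column BED. It is not a substitute for gtf2bed and makes no claim to reproduce its exact output: the function name `gff_to_bed6` is hypothetical, and using the feature type as the BED name column is an assumption made for the example. The key point is the coordinate shift, since GFF/GTF starts are 1-based and BED starts are 0-based.

```shell
# Hypothetical helper: convert 9-column GFF/GTF records to BED6.
gff_to_bed6() {
    TAB=$(printf '\t')
    awk 'BEGIN { FS = OFS = "\t" }
         # Skip comments and malformed lines with fewer than 9 columns.
         /^#/ || NF < 9 { next }
         # chrom, 0-based start, end, name (feature type), score, strand
         { print $1, $4 - 1, $5, $3, $6, $7 }' "$@" |
        sort -t "$TAB" -k1,1 -k2,2n -k3,3n
}

# Example call (file names are placeholders):
#   gff_to_bed6 transcripts.gtf > transcripts.bed
```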

Hi Alex,

I have done a genome-guided assembly using StringTie, which also generated a GFF file. I also have a GFF file of previously reported CDS features. I want to merge both GFF files and extract consensus sequences from the genome.

4.9 years ago
O.rka ▴ 710

I know you probably figured this out since it was asked 6 years ago but I ended up using gffcompare in 2019 (for anyone else looking for this): https://ccb.jhu.edu/software/stringtie/gffcompare.shtml

It worked really well.

For installation:

conda create --name gffcompare_env -c bioconda gffcompare -y

(I usually create a separate environment for each of my tools to keep things clean.)

The tool takes a file listing the paths to your GTF files, and you then specify an output prefix. See the link above for more information on usage.
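
A rough sketch of that workflow, assuming gffcompare is installed and on your PATH. The helper names `make_gtf_list` and `run_gffcompare`, and the file names in the example, are placeholders of mine. `-i` (input list) and `-o` (output prefix) are gffcompare options; with several inputs the merged transcripts should end up in `<prefix>.combined.gtf`.

```shell
# Hypothetical helper: write one GTF path per line into a list file.
make_gtf_list() {
    for f in "$1"/*.gtf; do
        # Guard against the glob matching nothing.
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done > "$2"
}

# Hypothetical wrapper: run gffcompare on the list with an output prefix.
run_gffcompare() {
    gffcompare -i "$1" -o "$2"
}

# Example calls (directory and names are placeholders):
#   make_gtf_list ./assemblies gtf_list.txt
#   run_gffcompare gtf_list.txt merged    # expect merged.combined.gtf
```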

Why does it remove all features except "exon" and "transcript" from gff3 output? Doesn't make any sense.

I have also found that gffcompare removes everything except 'exon' and 'transcript'. Has anyone found out why? Any alternative tools for doing this?
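
If you want to check which feature types a merge step kept or dropped, a quick tally of column 3 before and after helps. This is only a diagnostic sketch with a hypothetical function name, `count_feature_types`, and it assumes tab-delimited, 9-column GFF/GTF input.

```shell
# Hypothetical helper: count records per feature type (column 3).
count_feature_types() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (t in n) print t "\t" n[t] }' "$@" | sort
}

# Example calls (file names are placeholders):
#   count_feature_types input.gff3
#   count_feature_types merged.combined.gtf
```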

You can try agat_sp_merge_annotations.pl from AGAT

How does it deal with overlapping loci? Merging them as a single locus?
