Question: How To Merge Gff Files
0
upendrakumar.devisetty370 wrote:

Hi, I was wondering if anyone knows how to merge two or more GFF files into a consensus GFF file? I have a GTF file from the tophat/cufflinks pipeline and a GFF file from a velvet assembly, and I would like to merge these two with the already annotated GFF file. The idea is to have one GFF file instead of three so that I can load it as a track on my Genome Browser. Thanks in advance.

gff merge • 7.6k views
ADD COMMENTlink modified 14 months ago by O.rka200 • written 7.3 years ago by upendrakumar.devisetty370
1

Any reason to not simply load three separate files? Alternatively, you can simply concatenate them and sort by chromosome location.
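A minimal sketch of the concatenate-and-sort suggestion, using only standard shell tools. The file names and the tiny GFF records are made up so the example runs on its own; substitute your own files. It assumes all inputs use the same nine-column layout. Note that GTF and GFF3 differ in the syntax of column 9, so a browser may still need the attributes in one consistent format.

```shell
# Two tiny, made-up GFF files so the sketch runs on its own.
printf '##gff-version 3\nchr2\tsrcA\tgene\t500\t900\t.\t+\t.\tID=a2\nchr1\tsrcA\tgene\t300\t700\t.\t+\t.\tID=a1\n' > a.gff
printf '##gff-version 3\nchr1\tsrcB\tgene\t100\t400\t.\t-\t.\tID=b1\n' > b.gff

# Concatenate, drop comment/directive lines, then sort by sequence name
# (column 1) and numerically by start coordinate (column 4).
TAB=$(printf '\t')
cat a.gff b.gff | grep -v '^#' | sort -t "$TAB" -k1,1 -k4,4n > combined.sorted.gff

cat combined.sorted.gff
```

The sort on column 1 is lexical, so chr10 sorts before chr2; that is usually fine for loading a track, but not if a tool expects a specific chromosome order. This only interleaves the records and does not reconcile overlapping gene models.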

ADD REPLYlink written 7.3 years ago by Sean Davis26k

No particular reason. I want to load all three files separately, along with a fourth file giving the consensus annotation, which users might find more convenient to work with.

ADD REPLYlink written 7.3 years ago by upendrakumar.devisetty370

Programs such as MAKER combine evidence for gene models

ADD REPLYlink written 14 months ago by cmdcolin1.4k
5
Alex Reynolds30k wrote:

One way to do this is to start by converting them to sorted UCSC BED with BEDOPS gtf2bed (link) and sort-bed (link). Here's a quick way to convert a bunch of files if you use a bash shell:

$ for i in *.gtf; \
do gtf2bed < "$i" > "$i.converted.bed"; \
done

Then do a multiset union set operation with BEDOPS bedops (link) to make a single BED file called answer.bed that can be loaded into your genome browser instance:

$ bedops --everything *.converted.bed | cut -f1-6 - > answer.bed
ADD COMMENTlink modified 14 months ago by RamRS28k • written 7.3 years ago by Alex Reynolds30k

Hi Alex,

I did a genome-guided assembly with StringTie, which also produced a GFF file. I also have a GFF file of already reported CDS. I want to merge the two GFF files and extract the consensus sequences from the genome.

ADD REPLYlink written 3.3 years ago by Bioinfonext220
1
O.rka200 wrote:

I know you probably figured this out since it was asked 6 years ago but I ended up using gffcompare in 2019 (for anyone else looking for this): https://ccb.jhu.edu/software/stringtie/gffcompare.shtml

It worked really well.

For installation:

conda create --name gffcompare_env -c bioconda gffcompare -y

(I usually create a separate environment for all my tools to keep things clean.)

The tool takes a file listing the paths to your GTF files, and then you just specify an output. See the link for more information on usage.
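As a rough sketch of that usage: the block below builds a list file and passes it to gffcompare with `-i`, using `-o` for the output prefix. The file names, the prefix `merged`, and the two tiny GTF files are made up for illustration, and the gffcompare call is guarded so the snippet still runs where the tool is not installed.

```shell
# Two tiny, made-up GTF files so the sketch is self-contained.
printf 'chr1\tStringTie\ttranscript\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\nchr1\tStringTie\texon\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > sample1.gtf
printf 'chr1\tStringTie\ttranscript\t800\t1200\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\nchr1\tStringTie\texon\t800\t1200\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n' > sample2.gtf

# One GTF path per line; this list is what -i reads.
printf '%s\n' sample1.gtf sample2.gtf > gtf_list.txt

# Run only if gffcompare is on PATH (e.g. inside the conda environment above).
if command -v gffcompare >/dev/null 2>&1; then
    # -i: file listing the input GTFs, -o: prefix for the output files.
    gffcompare -i gtf_list.txt -o merged
else
    echo "gffcompare not found; activate the environment first" >&2
fi
```

With several input files, the merged transcript set should end up in `merged.combined.gtf`, next to the other `merged.*` report files. A reference annotation can be supplied with `-r` if you want the inputs compared against it.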

ADD COMMENTlink modified 14 months ago by WouterDeCoster44k • written 14 months ago by O.rka200
1

Why does it remove all features except "exon" and "transcript" from gff3 output? Doesn't make any sense.

ADD REPLYlink written 6 months ago by P1622610

I have also found that gffcompare removes everything except 'exon' and 'transcript'. Has anyone found out why? Any alternative tools for doing this?

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by epaminonda10
1

You can try agat_sp_merge_annotations.pl from AGAT

ADD REPLYlink written 10 weeks ago by Juke344.5k

How does it deal with overlapping loci? Merging them as a single locus?

ADD REPLYlink written 14 months ago by Juke344.5k