Hi all,
I am looking for a tool that sorts gtf annotation files. Can anyone recommend one? I only came across tools that sort gff files like bedtools or gt.
I am grateful for any suggestions!
Thanks JJ
I recommend sorting with the tool "gff3sort", given that with standard Unix sort, lines with the same chromosome and start position end up in an arbitrary order.
gff3sort avoids this pitfall.
For example:
# Sort your gtf/gff & bgzip it
gff3sort.pl --precise --chr_order natural file.gtf/gff | bgzip > file.gtf/gff.gz;
# Create associated index
tabix -p gff file.gtf/gff.gz;
Hello erwan,
"I recommend sorting with the tool "gff3sort", given that with standard Unix sort, lines with the same chromosome and start position end up in an arbitrary order.
gff3sort avoids this pitfall."
Could you please explain why this should be a pitfall? If there are more criteria for sorting, I have to define them in some way.
fin swimmer
I think that what they mean is that a GFF file may need to be sorted by a column where the values are not ordered lexicographically or numerically. For example: mRNA needs to precede exon, and CDS may need to come after exon.
That being said, a gff3sort should be a tool that creates the extra columns, translating the values to sortable ones; the user should then call sort directly. It is unlikely that a gff3sort written in Perl would be able to compete in performance and features with the standard Unix sort.
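The "extra column, then plain sort" idea can be sketched as a decorate-sort-undecorate pipeline. This is only an illustration, not what gff3sort does internally: the function name `sort_gxf_by_feature` and the rank table (gene, then transcript/mRNA, then exon, then CDS, then everything else) are assumptions, and header lines are dropped for brevity.

```shell
# Decorate-sort-undecorate sketch (hypothetical helper, not part of gff3sort).
# Usage: sort_gxf_by_feature in.gff > out.gff
sort_gxf_by_feature() {
  awk -F'\t' -v OFS='\t' '
    # Assumed feature ranking; adjust to the feature types in your file.
    BEGIN { rank["gene"] = 1; rank["transcript"] = 2; rank["mRNA"] = 2
            rank["exon"] = 3; rank["CDS"] = 4 }
    /^#/ { next }                                   # headers are dropped in this sketch
    { r = ($3 in rank) ? rank[$3] : 9; print r, $0 }  # prepend the rank as column 1
  ' "$1" |
    # After decoration: col 1 = rank, col 2 = chrom, col 5 = start, col 6 = end.
    LC_ALL=C sort -t "$(printf '\t')" -k2,2 -k5,5n -k1,1n -k6,6n |
    cut -f2-                                        # strip the rank column again
}
```

With `LC_ALL=C` the chromosome order is plain byte order (chr1, chr10, chr2, ...); a natural chromosome order would need one more decorated column.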
I assumed it would accept GTF when posting (after all, GTF is "GFF2.5", which is really close to GFF3), but since you asked I did a quick check:
Not knowing what your GTF looks like, I took a random example: I ran the gff3sort tool on both the GTF and GFF3 of the M16 comprehensive gene annotation. There were no errors. I then loaded the tracks into IGV and both displayed just fine, which is another good sign. So you should give it a try with your own GTF.
In case you want to re-run the verification:
git clone https://github.com/billzt/gff3sort.git;
cd gff3sort;
axel -q ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf.gz;
axel -q ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gff3.gz;
unpigz *.gz;
perl ./gff3sort.pl --precise --chr_order natural gencode.vM16.chr_patch_hapl_scaff.annotation.gff3 | bgzip > toto.gff3.gz;
tabix -p gff toto.gff3.gz;
perl ./gff3sort.pl --precise --chr_order natural gencode.vM16.chr_patch_hapl_scaff.annotation.gtf | bgzip > toto.gtf.gz;
tabix -p gff toto.gtf.gz;
IGV check: [screenshot not preserved]
awk one-liner:
awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k4,4n -k5,5n"}' in.gtf > out_sorted.gtf
gff3sort.pl seems to make sure that lines having no "Parent=" attribute come before those having it, if chromosome and start position are the same. I think with standard Unix programs it should go like this:
$ (grep -v "Parent=" sortme.gtf;grep "Parent=" sortme.gtf)| sort -k1,1 -k4,4n -s
EDIT:
Shouldn't we also make sure that within these two groups the 5th column is sorted as well? If so, we have to expand the command a little bit:
(grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n;grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n)| sort -k1,1 -k4,4n -s
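If more speed is required we can use GNU parallel. The -k option keeps the outputs in the order the jobs were given, so the lines without "Parent=" still come first:
parallel -k ::: 'grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' 'grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' | sort -k1,1 -k4,4n -s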
fin swimmer
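The commands above depend on the -s (stable) flag of sort. A small sketch of why it matters, using two made-up records that tie on chromosome and start: without -s, sort breaks the tie by comparing the whole line, so in the C locale "exon" lands before "mRNA"; with -s the tied lines keep their input order. The function names are hypothetical.

```shell
# Two made-up records with the same chromosome and start; mRNA comes first in the input.
demo_ties() {
  printf 'chr1\tsrc\tmRNA\t100\t500\nchr1\tsrc\texon\t100\t200\n'
}

# Default sort: ties on the keys fall back to a whole-line comparison.
default_order() {
  demo_ties | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n | cut -f3 | tr '\n' ' '
}

# Stable sort (-s): tied lines keep their input order.
stable_order() {
  demo_ties | LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 -k4,4n | cut -f3 | tr '\n' ' '
}
```

Calling `default_order` prints the exon before its mRNA, while `stable_order` leaves the mRNA first, which is the ordering the grep/sort commands above are built on.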
Hi! I recently developed this tool: gtfsort, a chr/pos/feature GTF2.5-3 sorter using a lexicographically-based index ordering algorithm. I benchmarked this tool against the other tools presented in this post and gtfsort outperforms all of them. It currently accepts only the 2.5 and 3 GTF formats (in the future it will support any given custom format).
This blog talks about it: https://zhiganglu.com/post/sort-gff-topologically/
As I explain here, you can use AGAT.
The script to use is agat_sp_gxf_to_gff3.pl (now agat_convert_sp_gxf2gxf.pl).
You will have to play with the -gvo parameter to get back a GTF (Bioperl formatted) as output.
There is also this script, which came up later in AGAT, that should do the job directly: agat_convert_sp_gff2gtf.pl
It can take GFF or GTF input files.
Is this still the latest tool to use within your AGAT suite? I am using version v0.4.0 and cannot seem to find agat_sp_gxf_to_gff3.pl at all. On your own website, the script used is different, as shown below:
agat_convert_sp_gxf2gxf.pl --gff test.gff
Since I am using the same version of AGAT as used in your example, I suppose I could simply execute
agat_convert_sp_gxf2gxf.pl --gvi 2 --gvo 2 --gff IN.gff -o OUT.gff
Am I right?
When asking agat_convert_sp_gxf2gxf.pl for GTF output, it is the Bioperl converter that is used. Currently this converter is not perfect. I plan to fix the problem in Bioperl one day.
The best option is to use agat_convert_sp_gff2gtf.pl. You can find a comparison to other tools here.
I suggest you install the latest version of AGAT from the master branch; there are some fixes lying around. I should update AGAT to v0.4.1.
One could use the method explained here. Reproducing it below:
wget --no-check-certificate https://raw.github.com/ctokheim/PrimerSeq/master/gtf.py -O gtf.py # get command line script
$ python gtf.py -c your_gtf_file.gtf # check if GTF is sorted
your_gtf_file.gtf is not correctly sorted. please sort before use.
$ python gtf.py -i your_gtf_file.gtf -o your_gtf_file.sorted.gtf # GTF was not sorted, so sort it
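For a quick check without downloading anything, something along these lines could work. It is a rough sketch and not the logic of gtf.py: the function name `is_sorted_gxf` is hypothetical, and "sorted" is assumed here to mean only that each chromosome forms one contiguous block and that start positions never decrease within it.

```shell
# Hypothetical helper: exit status 0 if the file looks sorted, 1 otherwise.
# Assumed definition of "sorted": chromosomes are contiguous blocks and
# start positions (column 4) are non-decreasing within each block.
is_sorted_gxf() {
  awk -F'\t' '
    /^#/ { next }                                     # ignore header/comment lines
    $1 == chrom && $4 + 0 < start { bad = 1; exit }   # start went backwards
    $1 != chrom {
      if ($1 in seen) { bad = 1; exit }               # chromosome seen earlier: not contiguous
      seen[$1] = 1
    }
    { chrom = $1; start = $4 + 0 }
    END { exit bad }
  ' "$1"
}
```

Usage would be `is_sorted_gxf your_gtf_file.gtf && echo sorted || echo "not sorted"`. The check is independent of the chromosome order, so it accepts both natural and lexicographic orderings.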
Hello,
By what criteria do you want to sort, and why? You can sort by columns with the standard Unix command sort.
fin swimmer
actually I want to add some annotations to a standard annotation gtf file and then use the standard sorting to put the newly added annotations at their "proper" place.
I was thinking of StringTie --merge as an alternative, but as the annotation file and the new annotations are non-redundant I figured a simple sort should also do the trick.
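A minimal sketch of that append-then-sort idea, assuming both files are tab-separated GTF, that only the header of the reference annotation should be kept, and that ordering by chromosome, start, and end is enough. The function name `merge_and_sort_gtf` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: merge_and_sort_gtf reference.gtf new_annotations.gtf > merged.sorted.gtf
merge_and_sort_gtf() {
  # Keep the header lines of the reference annotation on top.
  awk '/^#/' "$1"
  # Pool the records of both files and sort by chromosome, start, end.
  awk '!/^#/' "$1" "$2" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}
```

Records that tie on all three keys are ordered by whole-line comparison here, so parent features are not guaranteed to precede their children; a feature-aware sorter would still be needed for that.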
I had to do the same exact thing not long ago, here is the full recipe just in case it might help ;-)