Question: Is there a tool that sorts gtf files?
16 months ago by
JJ440
JJ440 wrote:

Hi all,

I am looking for a tool that sorts gtf annotation files. Can anyone recommend one? I only came across tools that sort gff files like bedtools or gt.

I am grateful for any suggestions!

Thanks JJ

rna-seq genome • 3.3k views
ADD COMMENTlink modified 16 months ago by erwan.scaon720 • written 16 months ago by JJ440

Hello,

By what criteria do you want to sort, and why? You can sort by columns with the standard Unix command sort.

fin swimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer12k

Actually, I want to add some annotations to a standard annotation GTF file and then use the standard sorting to put the newly added annotations in their "proper" place.

I was thinking of StringTie --merge as an alternative, but as the annotation file and the new annotations are non-redundant, I figured a simple sort should also do the trick.

ADD REPLYlink written 16 months ago by JJ440
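
For the use case JJ describes (appending non-redundant records to an existing GTF and letting a sort put them in place), a minimal sketch with standard tools might look like the following. The file names and toy records are made up for illustration. The assumed order is chromosome (as text), then start, then end, with header lines starting with # kept at the top.

```shell
#!/bin/sh
# Toy input files (hypothetical records, for illustration only).
tmp=$(mktemp -d)
ref="$tmp/reference.gtf"
new="$tmp/new_annotations.gtf"
merged="$tmp/merged.sorted.gtf"

printf '##description: toy reference annotation\n'             >  "$ref"
printf 'chr1\tREF\tgene\t1000\t2000\t.\t+\t.\tgene_id "g1";\n' >> "$ref"
printf 'chr2\tREF\tgene\t500\t900\t.\t-\t.\tgene_id "g2";\n'   >> "$ref"
printf 'chr1\tNEW\tgene\t300\t700\t.\t+\t.\tgene_id "n1";\n'   >  "$new"

# A literal tab, so that sort splits on tabs only (column 9 contains spaces).
TAB=$(printf '\t')

{
  # Keep the header/comment lines of the reference file at the top.
  grep '^#' "$ref"
  # Sort all records by chromosome (text), start (numeric), end (numeric).
  cat "$ref" "$new" | grep -v '^#' \
    | LC_ALL=C sort -t "$TAB" -k1,1 -k4,4n -k5,5n
} > "$merged"

cat "$merged"
```

Note that -k1,1 compares chromosome names as text, so chr10 would come before chr2; whether that matters depends on the downstream tool.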

I want to add some annotations to a standard annotation gtf file and then use the standard sorting to put the newly added annotations at their "proper" place

I had to do exactly the same thing not long ago; here is the full recipe, just in case it helps ;-)

ADD REPLYlink written 16 months ago by erwan.scaon720
16 months ago by
erwan.scaon720
Nantes - France
erwan.scaon720 wrote:

I recommend sorting with the tool "gff3sort", given that with the standard Unix sort, lines with the same chromosome and start position will be placed randomly.

gff3sort avoids this pitfall.
For example:

# Sort your GTF/GFF and bgzip it (file.gtf stands for your own GTF or GFF file)
gff3sort.pl --precise --chr_order natural file.gtf | bgzip > file.gtf.gz

# Create the associated index
tabix -p gff file.gtf.gz
ADD COMMENTlink modified 16 months ago • written 16 months ago by erwan.scaon720

Hello erwan,

I recommend sorting with the tool "gff3sort", given that with the standard Unix sort, lines with the same chromosome and start position will be placed randomly.

gff3sort avoids this pitfall.

Could you please explain why this should be a pitfall? If there are more criteria for sorting, I have to define them in some way.

fin swimmer

ADD REPLYlink written 16 months ago by finswimmer12k

I think that what they mean is that a GFF file may need to be sorted by a column where the values are not ordered lexicographically or numerically. For example: mRNA needs to precede exon, and CDS may need to come after exon.

That being said, gff3sort should be a tool that creates the extra columns, translating the values into sortable ones; the user should then run sort directly. It is unlikely that a gff3sort written in Perl would be able to compete in performance and features with the standard Unix sort.

ADD REPLYlink written 16 months ago by Istvan Albert ♦♦ 81k
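
The "extra sortable column" idea Istvan Albert describes can be sketched as a decorate, sort, undecorate pipeline. This is only an illustration under assumptions: the rank table (gene before mRNA/transcript before exon before CDS) and the toy records are made up, and comment lines are simply dropped here.

```shell
#!/bin/sh
# Toy GFF3 (hypothetical records): all features share a start and are out of order.
tmp=$(mktemp -d)
src="$tmp/in.gff"
sorted="$tmp/sorted.gff"

printf 'chr1\tdemo\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=tx1\n'   >  "$src"
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1\n'   >> "$src"
printf 'chr1\tdemo\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=gene1\n' >> "$src"
printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1\n'            >> "$src"

TAB=$(printf '\t')

# 1) decorate:   prepend a numeric rank for the feature type (assumed ranking)
# 2) sort:       chromosome (now column 2), start (now column 5), then rank (column 1)
# 3) undecorate: drop the helper column again
awk -F'\t' -v OFS='\t' '
  BEGIN {
    rank["gene"] = 1; rank["mRNA"] = 2; rank["transcript"] = 2
    rank["exon"] = 3; rank["CDS"] = 4
  }
  /^#/ { next }                                  # comment lines are dropped in this sketch
  { r = ($3 in rank) ? rank[$3] : 9; print r, $0 }
' "$src" \
  | LC_ALL=C sort -t "$TAB" -k2,2 -k5,5n -k1,1n \
  | cut -f2- > "$sorted"

cut -f3 "$sorted"
```

Feature types missing from the rank table get rank 9 and therefore end up after the known ones at the same position.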

With a bit of reasoning (and knowledge of the GFF format), you can easily achieve all of this with Linux sort ;-)

and +1 for the performance comment!

ADD REPLYlink written 16 months ago by lieven.sterck5.5k

Does this tool also accept GTF? I can only see GFF stated as input.

ADD REPLYlink modified 16 months ago • written 16 months ago by JJ440

I assumed it would accept GTF when posting (after all GTF is "GFF2.5", which is really close to GFF3), but since you asked I did a quick check :

Not knowing what your GTF looks like, I took a random example: I ran the gff3sort tool on both the GTF and the GFF3 of the M16 comprehensive gene annotation. There were no errors. I then loaded the tracks into IGV and both displayed just fine, which is another good sign. So you should give it a try with your own GTF.

In case you want to re-run the verification:

git clone https://github.com/billzt/gff3sort.git;
cd gff3sort;
axel -q ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf.gz;
axel -q ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gff3.gz;
unpigz *.gz;
perl ./gff3sort.pl --precise --chr_order natural gencode.vM16.chr_patch_hapl_scaff.annotation.gff3 | bgzip > toto.gff3.gz;
tabix -p gff toto.gff3.gz;
perl ./gff3sort.pl --precise --chr_order natural gencode.vM16.chr_patch_hapl_scaff.annotation.gtf | bgzip > toto.gtf.gz;
tabix -p gff toto.gtf.gz;

IGV check: [screenshot of both tracks loaded in IGV]

ADD REPLYlink modified 16 months ago • written 16 months ago by erwan.scaon720
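
Besides the visual check in IGV, the order of a sorted file can also be tested on the command line. The sketch below is an assumption about what "sorted" should mean here (each chromosome forms one contiguous block, and start positions never decrease within a block); the function name check_sorted and the toy files are hypothetical.

```shell
#!/bin/sh
# Exit status 0 if every chromosome forms one contiguous block and start
# positions never decrease within a block; exit status 1 otherwise.
# Reads the files given as arguments, or standard input if there are none.
check_sorted() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }                 # skip comment and empty lines
    $1 != chr {                              # a new chromosome block starts
      if ($1 in seen) { bad = 1; exit }      # chromosome already seen earlier
      seen[$1] = 1; chr = $1; prev = 0
    }
    ($4 + 0) < prev { bad = 1; exit }        # start position went backwards
    { prev = $4 + 0 }
    END { exit bad + 0 }
  ' "$@"
}

# Toy files (hypothetical records) to exercise the check.
tmp=$(mktemp -d)
ok_file="$tmp/ok.gtf"
bad_file="$tmp/bad.gtf"

printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "g1";\n' >  "$ok_file"
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n' >> "$ok_file"
printf 'chr1\tdemo\texon\t250\t500\t.\t+\t.\tgene_id "g1";\n' >> "$ok_file"
printf 'chr2\tdemo\tgene\t50\t80\t.\t-\t.\tgene_id "g2";\n'   >> "$ok_file"

printf 'chr1\tdemo\tgene\t300\t500\t.\t+\t.\tgene_id "g1";\n' >  "$bad_file"
printf 'chr1\tdemo\tgene\t100\t200\t.\t+\t.\tgene_id "g0";\n' >> "$bad_file"

check_sorted "$ok_file"  && echo "ok.gtf: sorted"
check_sorted "$bad_file" || echo "bad.gtf: NOT sorted"
```

Since bgzip output can be read by gzip, something like `gzip -dc toto.gtf.gz | check_sorted` should work on the compressed files from the recipe above.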

I think the main difference between GFF and GTF is the Parent tag: I have it in the GFF but not in the GTF. I downloaded the main annotation files in both formats from GENCODE. Hence, gff3sort.pl would not work properly on the GTF either...

ADD REPLYlink written 16 months ago by JJ440

I have yet to encounter the first useful example of using gff3sort over the normal Linux sort.

I also find it surprising that this gets published while others are struggling to get really interesting biological work published ....

ADD REPLYlink written 16 months ago by lieven.sterck5.5k
16 months ago by
finswimmer12k
Germany
finswimmer12k wrote:

gff3sort.pl seems to make sure that lines having no "Parent=" attribute come before those having it if chrom and start position are the same. I think with the standard Unix programs it should go like this:

$ (grep -v "Parent=" sortme.gtf;grep "Parent=" sortme.gtf)| sort -k1,1 -k4,4n -s

EDIT:

Shouldn't we also make sure that, within these two groups, the 5th column is sorted as well? If so, we have to expand the command a little bit:

(grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n;grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n)| sort -k1,1 -k4,4n -s

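If more speed is required, we can use GNU parallel (with -k to keep the output in the order of the commands given, so that the lines without "Parent=" still come first):

parallel -k ::: 'grep -v "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' 'grep "Parent=" sortme.gff|sort -k1,1 -k4,4n -k5,5n' | sort -k1,1 -k4,4n -s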

fin swimmer

ADD COMMENTlink modified 16 months ago • written 16 months ago by finswimmer12k
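
To see what the stable sort (-s) in finswimmer's first command does, it can be tried on a small toy file. The records below are made up; the sketch also passes an explicit tab separator to sort, which is an addition of mine and not part of the original command.

```shell
#!/bin/sh
# Toy GFF3 (hypothetical records): the mRNA line precedes its gene in the input.
tmp=$(mktemp -d)
src="$tmp/sortme.gff"
out="$tmp/sorted.gff"

printf 'chr1\tdemo\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=gene1\n' >  "$src"
printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1\n'            >> "$src"
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=ex1;Parent=tx1\n'   >> "$src"
printf 'chr1\tdemo\tgene\t50\t80\t.\t+\t.\tID=gene0\n'              >> "$src"

TAB=$(printf '\t')

# Lines without Parent= are emitted first, then lines with Parent=.
# The stable sort (-s) orders by chromosome and start only and keeps that
# arrival order for ties, so parentless records stay ahead at equal positions.
( grep -v 'Parent=' "$src"; grep 'Parent=' "$src" ) \
  | sort -t "$TAB" -k1,1 -k4,4n -s > "$out"

cut -f3,4,9 "$out"
```

In this toy case the gene at position 100 ends up ahead of the mRNA and exon that start at the same position, while the relative order of the two "Parent=" lines is simply inherited from the input.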

Thanks for this. However, I don't have the "Parent=" tag in the gtf file - I downloaded the main annotation file from gencode. Hence, your solution does not work for me ... any other suggestions? thanks!

ADD REPLYlink written 16 months ago by JJ440