GFF3 to GTF problems
3.7 years ago
rank ▴ 10

Dear All,

my ultimate goal is to use the genewise option in popoolation to evaluate allele frequency variation at mitochondrial SNPs between two divergent populations. For that I first need to convert the GFF file to a GTF file. I have tried many ways of annotating the GFF3 file so that popoolation will read it, as well as several GFF converters, and I am getting nowhere. The original GFF3 file comes from MITOS; sometimes I import it into Geneious and export it again. I have worked for days on different annotation types, with the parent lines included or omitted, and nothing seems to work. I have been using the GenomeTools website to validate the file, and it does report some errors. The command-line version of GenomeTools looks like it would work a lot better, but it's hard for me to figure out how to install it properly on macOS. I used the Biostars installation to get conda installed a few weeks ago and have been able to install other programs that way, but I just get errors when I try to install GenomeTools. The output of that command is below.

My ultimate goal is to get a gff that will convert to a gtf that will be readable by popoolation. So if there is another route to that goal without needing GenomeTools I would be happy to take it.

$ conda install genometools
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \ 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                                                                          

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - genometools -> python=2.7

Your python: python=3.7

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.



(base)
popoolation gffread agat poolseq mitos • 2.1k views

Use the AGAT toolkit to do the conversion, as shown in this answer: "converting .gff file to .gtf".


When installing bioinformatics tools with conda, do not install them in the base environment. Create a separate environment for the tool instead; environments prevent this sort of conflict between incompatible program versions.

conda create -n genometools genometools
conda activate genometools

The first command will create an environment called genometools, and install genometools in this environment. The second command will activate this environment for use.


This is probably the easiest solution, because it will also install the Python 2.7 that genometools requires.


Thank you, I got it installed that way.


Try gffread from conda


In the end, I worked with AGAT, gffread, and Kent's tools to get the gtf. I had to simplify the gff to get one of them to work properly, and once I had a version that would run, I was able to add complexity back to the gff and then convert to gtf.


Feel free to post the steps and commands you used as an answer; it may help people in a similar situation.

3.7 years ago
rank ▴ 10
  1. I downloaded a GFF file for a close relative of my study organism, uploaded to GenBank by a friend.
  2. I manually edited the MITOS output GFF in BBEdit to match the format of the GenBank GFF. Then I deleted the duplicate records for the RNA molecules, while keeping the 'gene' and 'CDS' records for the protein-coding sequences.
  3. I repeatedly checked my progress with the website GFF validator (http://genometools.org/cgi-bin/gff3validator.cgi) and kept editing and simplifying until my GFF passed the test with the green text 'Validation successful.'
  4. At that point, I could run Kent's tools (UCSC) to get a GTF I could import into popoolation.

gff3ToGenePred ESRC_83_Aug2020.gff ESRC_83_Aug2020.genePred

genePredToGtf file ESRC_83_Aug2020.genePred ESRC_83_Aug2020.gtf
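
The hand checks from step 3 can be partly approximated locally before going back to the validator. The following is only a rough sketch, not a replacement for the GenomeTools validator: `check_gff3_basic` is a hypothetical helper name, and it looks at just three things (nine tab-separated columns, start not greater than end, and every `Parent=` pointing at an `ID=` that exists in the file). It assumes one parent per feature, which holds for the GFF shown in this answer.

```shell
# Hypothetical helper: a few basic structural checks on a GFF3 file.
# It only catches common slips; a real validator checks far more.
check_gff3_basic() {
  awk -F'\t' '
    /^#/ || /^$/ { next }            # skip directives, comments and blank lines
    NF != 9 {
      printf "line %d: expected 9 tab-separated columns, found %d\n", NR, NF
      bad = 1
      next
    }
    ($4 + 0 > $5 + 0) {
      printf "line %d: start %s is greater than end %s\n", NR, $4, $5
      bad = 1
    }
    {
      # Collect every ID and every Parent so they can be cross-checked at the end.
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/)     ids[substr(attrs[i], 4)] = 1
        if (attrs[i] ~ /^Parent=/) parents[substr(attrs[i], 8)] = NR
      }
    }
    END {
      for (p in parents)
        if (!(p in ids)) {
          printf "line %d: Parent=%s has no matching ID\n", parents[p], p
          bad = 1
        }
      exit bad + 0                   # non-zero exit status if anything was reported
    }
  ' "$1"
}
```

Usage would be `check_gff3_basic ESRC_83_Aug2020.gff`; no output and a zero exit status mean none of these three problems were found.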

Here's part of the GFF that worked:

##gff-version 3
##sequence-region ESRC_83 1 18008
ESRC_83 Geneious    region  1   18008   .   +   0   ID=ESRC_83;Is_circular=true
ESRC_83 mitfi   rRNA    13096   14374   .   -   .   ID=rna-16SrRNA;product=16SrRNA;gene_biotype=rRNA
...
ESRC_83 mitfi   tRNA    4284    4352    .   +   .   ID=rna-tRNA_Lys;product=tRNA_Lys;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    4353    4415    .   +   .   ID=rna-tRNA_Asp;product=tRNA_Asp;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    592 663 .   +   .   ID=rna-tRNA_Ile;product=tRNA_Ile;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6023    6086    .   +   .   ID=rna-tRNA_Gly;product=tRNA_Gly;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6444    6509    .   +   .   ID=rna-tRNA_Ala;product=tRNA_Ala;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6509    6573    .   +   .   ID=rna-tRNA_Arg;product=tRNA_Arg;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6573    6637    .   +   .   ID=rna-tRNA_Asn;product=tRNA_Asn;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6638    6704    .   +   .   ID=rna-tRNA_Ser1;product=tRNA_Ser1;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    665 733 .   -   .   ID=rna-tRNA_Gln;product=tRNA_Gln;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6705    6767    .   +   .   ID=rna-tRNA_GLu;product=tRNA_Glu;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6766    6828    .   -   .   ID=rna-tRNA_Phe;product=tRNA_Phe;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    733 801 .   +   .   ID=rna-tRNA_Met;product=tRNA_Met;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    8546    8607    .   -   .   ID=rna-tRNA_H;product=tRNA_His;gene_biotype=tRNA
ESRC_83 mitos   gene    10361   10861   .   +   .   ID=gene-nad6;Name=nad6;gene=nad6;gene_biotype=protein_coding
ESRC_83 mitos   gene    10861   11998   .   +   .   ID=gene-cytb;Name=cytb;gene=cytb;gene_biotype=protein_coding
ESRC_83 mitos   gene    12079   13026   .   -   .   ID=gene-nad1;Name=nad1;gene=nad1;gene_biotype=protein_coding
ESRC_83 mitos   gene    1988    3535    .   +   .   ID=gene-cox1;Name=cox1;gene=cox1;gene_biotype=protein_coding
ESRC_83 mitos   gene    3617    4303    .   +   .   ID=gene-cox2;Name=cox2;gene=cox2;gene_biotype=protein_coding
ESRC_83 mitos   gene    4416    4571    .   +   .   ID=gene-atp8;Name=atp8;gene=atp8;gene_biotype=protein_coding
ESRC_83 mitos   gene    4565    5237    .   +   .   ID=gene-atp6;Name=atp6;gene=atp6;gene_biotype=protein_coding
ESRC_83 mitos   gene    5236    6075    .   +   .   ID=gene-cox3;Name=cox3;gene=cox3;gene_biotype=protein_coding
ESRC_83 mitos   gene    6093    6440    .   +   .   ID=gene-nad3;Name=nad3;gene=nad3;gene_biotype=protein_coding
ESRC_83 mitos   gene    6803    8527    .   -   .   ID=gene-nad5;Name=nad5;gene=nad5;gene_biotype=protein_coding
ESRC_83 mitos   gene    802 1812    .   +   .   ID=gene-nad2;Name=nad2;gene=nad2;gene_biotype=protein_coding
ESRC_83 mitos   gene    8561    9937    .   -   .   ID=gene-nad4;Name=nad4;gene=nad4;gene_biotype=protein_coding
ESRC_83 mitos   gene    9931    10218   .   -   .   ID=gene-nad4l;Name=nad4l;gene=nad4l;gene_biotype=protein_coding
ESRC_83 mitos   CDS 10361   10861   .   +   0   ID=cds-nad6;Parent=gene-nad6;transl_table=5
ESRC_83 mitos   CDS 10861   11998   .   +   0   ID=cds-cytb;Parent=gene-cytb;transl_table=5
ESRC_83 mitos   CDS 12079   13026   .   -   0   ID=cds-nad1;Parent=gene-nad1;transl_table=5
ESRC_83 mitos   CDS 1988    3535    .   +   0   ID=cds-cox1;Parent=gene-cox1;transl_table=5
ESRC_83 mitos   CDS 3617    4303    .   +   0   ID=cds-cox2;Parent=gene-cox2;transl_table=5
ESRC_83 mitos   CDS 4416    4571    .   +   0   ID=cds-atp8;Parent=gene-atp8;transl_table=5
ESRC_83 mitos   CDS 4565    5237    .   +   0   ID=cds-atp6;Parent=gene-atp6;transl_table=5
ESRC_83 mitos   CDS 5236    6075    .   +   0   ID=cds-cox3;Parent=gene-cox3;transl_table=5
ESRC_83 mitos   CDS 6093    6440    .   +   0   ID=cds-nad3;Parent=gene-nad3;transl_table=5
ESRC_83 mitos   CDS 6803    8527    .   -   0   ID=cds-nad5;Parent=gene-nad5;transl_table=5
ESRC_83 mitos   CDS 802 1812    .   +   0   ID=cds-nad2;Parent=gene-nad2;transl_table=5
ESRC_83 mitos   CDS 8561    9937    .   -   0   ID=cds-nad4;Parent=gene-nad4;transl_table=5
ESRC_83 mitos   CDS 9931    10218   .   -   0   ID=cds-nad4l;Parent=gene-nad4l;transl_table=5
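
For a GFF this simple, where every CDS has a single `Parent=gene-<name>` attribute, the GTF rows can also be sketched directly with awk. This is not what was done above (Kent's tools were used); it is a minimal illustration under stated assumptions. `gff3_cds_to_gtf` is a hypothetical helper name, the input must be genuinely tab-delimited (the listing above may show spaces where the file has tabs), and reusing the gene name as `transcript_id` is a simplification. Whether popoolation wants CDS rows, exon rows, or something else is not settled in this thread, so treat the feature type as an open question.

```shell
# Hypothetical helper: turn GFF3 CDS rows into GTF-style rows.
# Assumes each CDS carries a Parent=gene-<name> attribute, as in the listing above.
gff3_cds_to_gtf() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || NF != 9 || $3 != "CDS" { next }   # keep only well-formed CDS rows
    {
      gene = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^Parent=/) gene = substr(attrs[i], 8)
      if (gene == "") next                    # no parent, so nothing to name the gene
      sub(/^gene-/, "", gene)                 # gene-nad6 becomes nad6
      # Replace the GFF3 attribute column with GTF-style key "value"; pairs.
      $9 = "gene_id \"" gene "\"; transcript_id \"" gene "\";"
      print
    }
  ' "$1"
}
```

Usage would be `gff3_cds_to_gtf ESRC_83_Aug2020.gff > ESRC_83_Aug2020.cds.gtf`, where the output file name is only an example.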

And then I needed to edit the GFF slightly, and now I am stuck again. I can get the genePred file and even a GTF file, but popoolation won't read it. I tried to change things in a way that would avoid the problem, but failed.
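
When a GTF is produced but a downstream tool rejects it, one cheap thing to rule out is a malformed attribute column. The sketch below is a guess at a useful first check, not a diagnosis of this particular failure: `check_gtf_basic` is a hypothetical helper name, and it assumes the tool expects the usual GTF2 layout of nine tab-separated columns with non-empty `gene_id` and `transcript_id` attributes. What popoolation itself requires beyond that is not established here.

```shell
# Hypothetical helper: report GTF rows that lack the usual GTF2 attributes.
check_gtf_basic() {
  awk -F'\t' '
    /^#/ || /^$/ { next }            # skip comments and blank lines
    NF != 9 {
      printf "line %d: expected 9 tab-separated columns, found %d\n", NR, NF
      bad = 1
      next
    }
    $9 !~ /gene_id "[^"]+"/ {
      printf "line %d: no gene_id attribute\n", NR
      bad = 1
    }
    $9 !~ /transcript_id "[^"]+"/ {
      printf "line %d: no transcript_id attribute\n", NR
      bad = 1
    }
    END { exit bad + 0 }             # non-zero exit status if anything was reported
  ' "$1"
}
```

Usage would be `check_gtf_basic ESRC_83_Aug2020.gtf`; silence means the rows at least have the expected shape.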


Try AGAT to standardize your GFF before feeding it to popoolation.
