GFF3 to GTF problems
13 months ago
rank ▴ 10

Dear All,

My ultimate goal is to use the genewise option in popoolation to evaluate allele frequency variation at mitochondrial SNPs between two divergent populations. To do that, I first need to convert the gff file to a gtf file. I have tried many ways of annotating the GFF3 file so that popoolation will read it, and I have tried several gff converters, but I get nowhere. The original GFF3 file comes from MITOS; sometimes I import it into Geneious and export it again. I have worked for days on different annotation types, with lines for parents included or omitted, and nothing seems to work. I have been using the GenomeTools website to validate the file, and it does find some errors. It looks like the command-line version of GenomeTools would work a lot better, but it's hard for me to figure out how to install it properly on macOS. I used the biostars installation to get conda installed a few weeks ago and have been able to install other programs that way, but I just get errors when I try to install GenomeTools. The output of the install command is shown below.

My ultimate goal is to get a gff that will convert to a gtf that will be readable by popoolation. So if there is another route to that goal without needing GenomeTools I would be happy to take it.

$ conda install genometools
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:

Specifications:

  - genometools -> python=2.7

Your python: python=3.7

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow not available for the python version you are constrained to. Note that conda will not change your python version to a different minor version unless you explicitly specify that.

(base)

popoolation gffread agat poolseq mitos

Use the AGAT toolkit to do the conversion, as shown here: A: converting .gff file to .gtf

When installing bioinformatics tools with conda, do not install them in the base environment. Create a separate environment for the tool instead; environments prevent this sort of conflict between incompatible program versions.

    conda create -n genometools genometools
    conda activate genometools

The first command creates an environment called genometools and installs genometools into it. The second command activates that environment for use.

This is probably the easiest solution because it will also install python-2.7+ for you.

Thank you, I got it installed that way.

Try gffread from conda.

In the end, I worked with AGAT, gffread, and Kent's tools to get the gtf. I had to simplify the gff to get one of them to work properly, and once I had a version that would run, I was able to add complexity back to the gff and then convert to gtf.

Feel free to post the steps and commands you used as an answer; it may help people in a similar situation.

13 months ago
rank ▴ 10

1. I downloaded a gff file for a close relative of my study organism, which a friend had uploaded to GenBank.
2. I manually edited the MITOS output gff in BBEdit to match the format of the GenBank gff. Then I deleted the duplicate records for the RNA molecules, while keeping the 'gene' and 'CDS' records for the protein-coding sequences.
3. I repeatedly checked my progress with the online GFF3 validator (http://genometools.org/cgi-bin/gff3validator.cgi) and kept editing and simplifying until my GFF passed the test with the green text 'Validation successful.'
4. At that point, I could run Kent's tools (UCSC) to get a GTF I could import into popoolation.

    gff3ToGenePred ESRC_83_Aug2020.gff ESRC_83_Aug2020.genePred
    genePredToGtf file ESRC_83_Aug2020.genePred ESRC_83_Aug2020.gtf
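
The edit-and-validate loop in steps 2 and 3 can be partly approximated offline. The following is a minimal sketch, not a replacement for the GenomeTools validator: it only checks that each record has nine tab-separated columns, that no `ID` is repeated, and that every `Parent` points at an `ID` present in the file. The function name `check_gff3` is hypothetical, and a repeated `ID` is reported even though the GFF3 specification allows it for a single feature split over several lines.

```python
import io


def check_gff3(handle):
    """Return a list of structural problems found in a GFF3 stream (hypothetical helper)."""
    problems = []
    seen_ids = set()
    parent_refs = []  # (line number, parent ID), resolved after reading everything
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                f"line {lineno}: expected 9 tab-separated columns, found {len(cols)}"
            )
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        feature_id = attrs.get("ID")
        if feature_id is not None:
            if feature_id in seen_ids:
                # Legitimate only for one feature split across several lines.
                problems.append(f"line {lineno}: ID={feature_id} is repeated")
            seen_ids.add(feature_id)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((lineno, parent))
    # Parents may be declared after their children, so resolve them last.
    for lineno, parent in parent_refs:
        if parent not in seen_ids:
            problems.append(f"line {lineno}: Parent={parent} matches no ID in the file")
    return problems


if __name__ == "__main__":
    demo = (
        "##gff-version 3\n"
        "ESRC_83\tmitos\tgene\t1988\t3535\t.\t+\t.\tID=gene-cox1;Name=cox1\n"
        "ESRC_83\tmitos\tCDS\t1988\t3535\t.\t+\t0\tID=cds-cox1;Parent=gene-cox1\n"
    )
    print(check_gff3(io.StringIO(demo)))
```

An empty list means none of these three checks failed; it does not guarantee that the online validator or popoolation will accept the file.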


Here's part of the GFF that worked:

##gff-version 3
##sequence-region ESRC_83 1 18008
ESRC_83 Geneious    region  1   18008   .   +   0   ID=ESRC_83;Is_circular=true
ESRC_83 mitfi   rRNA    13096   14374   .   -   .   ID=rna-16SrRNA;product=16SrRNA;gene_biotype=rRNA
...
ESRC_83 mitfi   tRNA    4284    4352    .   +   .   ID=rna-tRNA_Lys;product=tRNA_Lys;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    4353    4415    .   +   .   ID=rna-tRNA_Asp;product=tRNA_Asp;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    592 663 .   +   .   ID=rna-tRNA_Ile;product=tRNA_Ile;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6023    6086    .   +   .   ID=rna-tRNA_Gly;product=tRNA_Gly;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6444    6509    .   +   .   ID=rna-tRNA_Ala;product=tRNA_Ala;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6509    6573    .   +   .   ID=rna-tRNA_Arg;product=tRNA_Arg;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6573    6637    .   +   .   ID=rna-tRNA_Asn;product=tRNA_Asn;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6638    6704    .   +   .   ID=rna-tRNA_Ser1;product=tRNA_Ser1;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    665 733 .   -   .   ID=rna-tRNA_Gln;product=tRNA_Gln;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6705    6767    .   +   .   ID=rna-tRNA_GLu;product=tRNA_Glu;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    6766    6828    .   -   .   ID=rna-tRNA_Phe;product=tRNA_Phe;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    733 801 .   +   .   ID=rna-tRNA_Met;product=tRNA_Met;gene_biotype=tRNA
ESRC_83 mitfi   tRNA    8546    8607    .   -   .   ID=rna-tRNA_H;product=tRNA_His;gene_biotype=tRNA
ESRC_83 mitos   gene    10861   11998   .   +   .   ID=gene-cytb;Name=cytb;gene=cytb;gene_biotype=protein_coding
ESRC_83 mitos   gene    1988    3535    .   +   .   ID=gene-cox1;Name=cox1;gene=cox1;gene_biotype=protein_coding
ESRC_83 mitos   gene    3617    4303    .   +   .   ID=gene-cox2;Name=cox2;gene=cox2;gene_biotype=protein_coding
ESRC_83 mitos   gene    4416    4571    .   +   .   ID=gene-atp8;Name=atp8;gene=atp8;gene_biotype=protein_coding
ESRC_83 mitos   gene    4565    5237    .   +   .   ID=gene-atp6;Name=atp6;gene=atp6;gene_biotype=protein_coding
ESRC_83 mitos   gene    5236    6075    .   +   .   ID=gene-cox3;Name=cox3;gene=cox3;gene_biotype=protein_coding
ESRC_83 mitos   CDS 10861   11998   .   +   0   ID=cds-cytb;Parent=gene-cytb;transl_table=5
ESRC_83 mitos   CDS 1988    3535    .   +   0   ID=cds-cox1;Parent=gene-cox1;transl_table=5
ESRC_83 mitos   CDS 3617    4303    .   +   0   ID=cds-cox2;Parent=gene-cox2;transl_table=5
ESRC_83 mitos   CDS 4416    4571    .   +   0   ID=cds-atp8;Parent=gene-atp8;transl_table=5
ESRC_83 mitos   CDS 4565    5237    .   +   0   ID=cds-atp6;Parent=gene-atp6;transl_table=5
ESRC_83 mitos   CDS 5236    6075    .   +   0   ID=cds-cox3;Parent=gene-cox3;transl_table=5
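
For a layout this simple, where each CDS points straight at its gene, the conversion to GTF can also be sketched by hand. The code below is an illustration under stated assumptions, not what gff3ToGenePred and genePredToGtf do internally: it assumes tab-separated input, keeps only `CDS` records, and reuses the `Parent` value as both `gene_id` and `transcript_id`. The function name `gff3_cds_to_gtf` is hypothetical, and whether popoolation accepts the result has not been verified here.

```python
def gff3_cds_to_gtf(lines):
    """Turn GFF3 CDS records into GTF lines (hypothetical helper, simplified)."""
    gtf_lines = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only well-formed CDS records are converted
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        # Assumption: the CDS hangs directly off its gene, so Parent names the gene.
        gene_id = attrs.get("Parent") or attrs.get("ID")
        if gene_id is None:
            continue
        # GTF attributes are 'key "value";' pairs; columns 1-8 carry over unchanged.
        cols[8] = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
        gtf_lines.append("\t".join(cols))
    return gtf_lines


if __name__ == "__main__":
    record = (
        "ESRC_83\tmitos\tCDS\t1988\t3535\t.\t+\t0\t"
        "ID=cds-cox1;Parent=gene-cox1;transl_table=5"
    )
    for gtf_line in gff3_cds_to_gtf([record]):
        print(gtf_line)
```

Comparing this output with the GTF produced by the converters can help narrow down which column or attribute a downstream tool is rejecting.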


And then I needed to edit the GFF slightly, and now I am stuck again. I can get the genePred file and even a gtf file, but popoolation won't read it. I tried to make the changes in a way that would keep the problem from coming back, but failed.


Try AGAT to standardize your gff before feeding it to popoolation.