Question: How to convert a GFF3 file into the format required by PASApipeline
Ruixuan wrote (11 days ago):

Hi all,

I'm doing UTR annotation with PASA, following https://github.com/PASApipeline/PASApipeline/wiki/PASA_genome_annotation.

This process requires a GFF3 file. The sample GFF3 provided there looks like this:

gi|68711        TIGR    gene    5662    6138    .       +       .       ID=68711.t00001;Name=my protein name
gi|68711        TIGR    mRNA    5662    6138    .       +       .       ID=model.68711.m00001;Parent=68711.t00001
gi|68711        TIGR    exon    5662    6138    .       +       .       ID=68711.e00001;Parent=model.68711.m00001
gi|68711        TIGR    CDS     5662    6138    .       +       0       ID=5662_6138cds_of_68711.m00001;Parent=model.68711.m00001

But the GFF3 file I downloaded from NCBI looks like this:

AP018495.1      DDBJ    region  1       381277  .       +       .       ID=AP018495.1:1..381277;Dbxref=taxon:2080449;gbkey=Src;isolation-source=A water/soil sample collected from the Jozankei Onsen;mol_type=genomic DNA
AP018495.1      DDBJ    CDS     261     647     .       -       0       ID=cds-BBI30141.1;Dbxref=NCBI_GP:BBI30141.1;Name=BBI30141.1;Note=ORF1;gbkey=CDS;product=hypothetical protein;protein_id=BBI30141.1
AP018495.1      DDBJ    CDS     706     1308    .       +       0       ID=cds-BBI30142.1;Dbxref=NCBI_GP:BBI30142.1;Name=BBI30142.1;Note=ORF2;gbkey=CDS;product=putative HD hydrolase;protein_id=BBI30142.1

As you can see, my file only has "CDS" features, whereas the sample GFF3 has "gene", "mRNA", "exon", and "CDS". How can I convert my file into the required format?

Thanks in advance

Juke34 wrote (11 days ago):

You can do this with AGAT:

agat_convert_sp_gxf2gxf.pl --gff input.gff --ct protein_id -o standardized_file.gff

In this example, the value of the protein_id attribute is used to group the CDS features that belong to the same mRNA. If a gene/locus has several isoforms, each isoform will get its own gene parent (apparently there is no way to tell from your file whether there are isoforms). Adding --merge_loci will merge mRNAs whose CDS parts overlap under the same parent gene.
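To make the grouping idea concrete, here is a rough Python sketch of what "collect CDS by protein_id and build the missing parents" means. It is not AGAT's code and does much less than AGAT: the function names, the `gene-`/`mrna-`/`exon-` ID prefixes, and the choice to give each exon the same coordinates as its CDS are all assumptions made for illustration. It also assumes tab-delimited input with nine columns.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def make_row(seqid, source, ftype, start, end, strand, phase, attrs):
    """Build one tab-delimited GFF3 line (score is always '.')."""
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      ".", strand, phase, attrs])


def add_parents(gff_lines, tag="protein_id"):
    """Group CDS rows by `tag` and emit gene/mRNA/exon/CDS rows per group."""
    groups = {}  # tag value -> list of CDS rows (dicts keep insertion order)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # this sketch only looks at CDS features
        key = parse_attributes(cols[8]).get(tag)
        if key is not None:
            groups.setdefault(key, []).append(cols)

    out = []
    for key, cds_rows in groups.items():
        seqid, source, strand = cds_rows[0][0], cds_rows[0][1], cds_rows[0][6]
        start = min(int(c[3]) for c in cds_rows)
        end = max(int(c[4]) for c in cds_rows)
        gene_id, mrna_id = f"gene-{key}", f"mrna-{key}"  # hypothetical ID scheme
        out.append(make_row(seqid, source, "gene", start, end, strand, ".",
                            f"ID={gene_id}"))
        out.append(make_row(seqid, source, "mRNA", start, end, strand, ".",
                            f"ID={mrna_id};Parent={gene_id}"))
        for i, c in enumerate(cds_rows, 1):
            # Assumption: with no UTR information, exon spans equal CDS spans.
            out.append(make_row(seqid, source, "exon", c[3], c[4], strand, ".",
                                f"ID=exon-{key}-{i};Parent={mrna_id}"))
            out.append(make_row(seqid, source, "CDS", c[3], c[4], strand, c[7],
                                f"ID=cds-{key};Parent={mrna_id}"))
    return out


# Two CDS lines shortened from the question (attributes trimmed for brevity).
example = [
    "AP018495.1\tDDBJ\tCDS\t261\t647\t.\t-\t0\tID=cds-BBI30141.1;protein_id=BBI30141.1",
    "AP018495.1\tDDBJ\tCDS\t706\t1308\t.\t+\t0\tID=cds-BBI30142.1;protein_id=BBI30142.1",
]
for row in add_parents(example):
    print(row)
```

Each protein_id ends up with its own gene and mRNA, which matches the "every isoform gets its own gene parent" behaviour described above. For real data, use AGAT rather than a sketch like this.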


Thank you so much!!!!

Reply written 10 days ago by Ruixuan