Question: Gff Converter For Use With Mummer/Promer (Microbial Genome)
Mat wrote (8.7 years ago):

Hi,

Does anyone know of a script to convert a gff3 file into a format that PROmer's mapview can accept for a bacterial genome? mapview expects eukaryotic annotation, so it is looking for a gff file like this:

D_melanogaster_2Rslice    Dpseudo    initial-exon    16918    16938    .    +    .    2R_CG14752.1
D_melanogaster_2Rslice    Dpseudo    internal-exon    17371    17432    .    +    .    2R_CG14752.1
D_melanogaster_2Rslice    Dpseudo    last-exon    17913    18168    .    +    .    2R_CG14752.1

My gff3 looks like this, which mapview is understandably not accepting.

BX571856.1    EMBL    gene    36132    36371    .    +    .    locus_tag=SAR0026
BX571856.1    EMBL    CDS    36132    36368    .    +    0    locus_tag=SAR0026;note=No significant database matches. Doubtful CDS;transl_table=11;product=hypothetical protein;protein_id=CAG39054.1;db_xref=GI:49240408;db_xref=UniProtKB%2FTrEMBL:Q6GKR9;exon_number=1
BX571856.1    EMBL    start_codon    36132    36134    .    +    0    locus_tag=SAR0026;note=No significant database matches. Doubtful CDS;transl_table=11;product=hypothetical protein;protein_id=CAG39054.1;db_xref=GI:49240408;db_xref=UniProtKB%2FTrEMBL:Q6GKR9;exon_number=1
BX571856.1    EMBL    stop_codon    36369    36371    .    +    0    locus_tag=SAR0026;note=No significant database matches. Doubtful CDS;transl_table=11;product=hypothetical protein;protein_id=CAG39054.1;db_xref=GI:49240408;db_xref=UniProtKB%2FTrEMBL:Q6GKR9;exon_number=1

I could extract all the lines like the first (the "gene" lines), but what I don't know is how to persuade mapview to accept something that doesn't have the initial-exon, internal-exon, last-exon structure. (Incidentally, how are single-exon genes represented in this scheme?)
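The extraction step mentioned above could be sketched roughly as follows. This is only a minimal illustration, assuming the file is tab-delimited with nine columns as GFF3 normally is; the function name `gene_rows` and the shortened sample lines are made up for the example, and it does nothing about the exon-type question.

```python
def gene_rows(lines):
    """Yield the fields of GFF3 rows whose type column (column 3) is 'gene'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        # Assumption: a valid feature row has exactly nine tab-separated columns.
        if len(fields) == 9 and fields[2] == "gene":
            yield fields


# Sample rows adapted from the question; the CDS attributes are abbreviated.
sample = [
    "##gff-version 3\n",
    "BX571856.1\tEMBL\tgene\t36132\t36371\t.\t+\t.\tlocus_tag=SAR0026\n",
    "BX571856.1\tEMBL\tCDS\t36132\t36368\t.\t+\t0\tlocus_tag=SAR0026;exon_number=1\n",
]
rows = list(gene_rows(sample))
for fields in rows:
    print("\t".join(fields))
```

Matching on the third column only, rather than on the string "gene" anywhere in the line, avoids picking up rows that merely mention "gene" in their attributes.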

I guess I'm not the first to try to use mummer/promer/mapview for bacterial work, so there must be a solution out there somewhere? I'm fairly new to this, so I would be happy to hear suggestions for how to go about it. Thanks, M

farpostv wrote (8.5 years ago):

I am having a similar problem. I do not have a script to convert gff3 for use in mummer, but I did replace "gene" with "single-exon", and that seems to do the trick. The drawback is that you cannot see the gene direction.

Interestingly, if I leave the "CDS" lines in, the genes and the CDS features are drawn on separate rows in the resulting mapview, but I cannot map the CDS lines on their own.

Also, I just pass the same gff file twice, once as the "utr" file and once as the "cds" file, in the mapview command (mapview -f pdf -p testmapview promer.coords test.gff test.gff).

Does anyone else have a better solution than this?

So some lines like this:

NC_009615.1 annotation remark 1 4811379 . . . gi=150006674

NC_009615.1 feature source 1 4811379 . + . strain=ATCC%208503;mol_type=genomic%20DNA

NC_009615.1 feature gene 1 1398 . + . db_xref=GeneID%3A5305156;locus_tag=BDI_0001;gene=dnaA

NC_009615.1 feature CDS 1 1398 . + 0 gene=dnaA;protein_id=YP_001301418.1

NC_009615.1 feature gene 1656 4196 . + . db_xref=GeneID%3A5305157;locus_tag=BDI_0002

NC_009615.1 feature CDS 1656 4196 . + 0 codon_start=1;product=ribonucleoside%20reductase;transl_table=11

NC_009615.1 feature gene 4220 6907 . + . db_xref=GeneID%3A5305158;locus_tag=BDI_0003

NC_009615.1 feature CDS 4220 6907 . + 0 codon_start=1;product=4-alpha-glucanotransferase;transl_table=11

NC_009615.1 feature gene 6947 7318 . - . db_xref=GeneID%3A5305352;locus_tag=BDI_0004

NC_009615.1 feature CDS 6947 7318 . - 0 codon_start=1;product=hypothetical%20protein;transl_table=11

NC_009615.1 feature gene 7410 7778 . + . db_xref=GeneID%3A5305353;locus_tag=BDI_0005

NC_009615.1 feature CDS 7410 7778 . + 0 codon_start=1;product=dihydroneopterin%20aldolase;transl_table=11

become the following (just find and replace "gene" with "single-exon"; note that a plain find and replace also rewrites the gene= attribute keys):

NC_009615.1 annotation remark 1 4811379 . . . gi=150006674

NC_009615.1 feature source 1 4811379 . + . strain=ATCC%208503;mol_type=genomic%20DNA;

NC_009615.1 feature single-exon 1 1398 . + . db_xref=GeneID%3A5305156;locus_tag=BDI_0001;single-exon=dnaA

NC_009615.1 feature CDS 1 1398 . + 0 single-exon=dnaA;protein_id=YP_001301418.1

NC_009615.1 feature single-exon 1656 4196 . + . db_xref=GeneID%3A5305157;locus_tag=BDI_0002

NC_009615.1 feature CDS 1656 4196 . + 0 codon_start=1;product=ribonucleoside%20reductase;transl_table=11

NC_009615.1 feature single-exon 4220 6907 . + . db_xref=GeneID%3A5305158;locus_tag=BDI_0003

NC_009615.1 feature CDS 4220 6907 . + 0 codon_start=1;product=4-alpha-glucanotransferase;transl_table=11

NC_009615.1 feature single-exon 6947 7318 . - . db_xref=GeneID%3A5305352;locus_tag=BDI_0004

NC_009615.1 feature CDS 6947 7318 . - 0 codon_start=1;product=hypothetical%20protein;transl_table=11

NC_009615.1 feature single-exon 7410 7778 . + . db_xref=GeneID%3A5305353;locus_tag=BDI_0005

NC_009615.1 feature CDS 7410 7778 . + 0 codon_start=1;product=dihydroneopterin%20aldolase;transl_table=11
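A slightly safer version of the find and replace above could look like the sketch below. It is only an illustration, assuming tab-delimited nine-column input; the function name `to_single_exon` is made up, and whether mapview handles "single-exon" rows correctly is based solely on the observation in this post, not on mapview's documentation. It changes the feature type in column 3 and leaves the attributes alone, so `gene=dnaA` is not turned into `single-exon=dnaA`.

```python
def to_single_exon(line):
    """Rename the type column from 'gene' to 'single-exon', leaving other columns as-is."""
    fields = line.rstrip("\n").split("\t")
    # Only touch well-formed feature rows whose third column is exactly 'gene'.
    if len(fields) == 9 and fields[2] == "gene":
        fields[2] = "single-exon"
    return "\t".join(fields)


# Row taken from the example above, with tabs restored between columns.
row = (
    "NC_009615.1\tfeature\tgene\t1\t1398\t.\t+\t.\t"
    "db_xref=GeneID%3A5305156;locus_tag=BDI_0001;gene=dnaA"
)
converted = to_single_exon(row)
print(converted)
```

Applied to every line of the file, this leaves CDS rows, comments and the attribute column unchanged, and the result can be passed to mapview the same way as the hand-edited file.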
