Question: gff3 to bed12
2.2 years ago by
Yewon30
Norway
Yewon30 wrote:

I have been having trouble converting my gff3 file, generated for the strawberry genome (Fragaria vesca), to the bed12 format, which is required for annotating differentially methylated bases. I have read through several proposed solutions but have not found one that works for my data. However, I came across a GitHub script (https://github.com/pzross/iver/blob/master/R/bioinfo.R) which requires that I download the gfftogenepred and genepredtobed12 tools from UCSC and run the script in R in order to generate the bed12 format. When generating the gfftogenepred file, I get the following error message:

/tmp/tmp.gff:0: empty GFF file, must have header
/tmp/tmp.gff:0: invalid GFF3 header
GFF3: 2 parser errors

My GFF3 file looks fine (with 9 columns)

Please I need help.

Thank you in advance

bed12 R gff3 • 2.8k views
ADD COMMENTlink modified 8 months ago by Juke344.9k • written 2.2 years ago by Yewon30

If you already have the 2 tools from UCSC, did you try them without R?

gff3ToGenePred infile.gff3 temp.genePred
genePredToBed temp.genePred out.bed
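
If it helps to script the two steps outside R, a minimal Python sketch is below. It assumes both UCSC binaries are on the PATH and uses only the two positional arguments shown in the commands above; the function names and the default intermediate file name are hypothetical.

```python
import subprocess


def build_commands(gff3_path, bed_path, genepred_path="temp.genePred"):
    """Return the two argv lists for the GFF3 -> genePred -> BED conversion."""
    return [
        ["gff3ToGenePred", gff3_path, genepred_path],
        ["genePredToBed", genepred_path, bed_path],
    ]


def convert(gff3_path, bed_path):
    """Run both tools in order and stop at the first failure."""
    for argv in build_commands(gff3_path, bed_path):
        # check=True raises CalledProcessError if the tool exits non-zero
        subprocess.run(argv, check=True)
```

Passing the arguments as a list means a path containing spaces needs no shell escaping.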
ADD REPLYlink written 2.2 years ago by michael.ante3.6k

michael.ante, I did download the gff3ToGenePred and genePredToBed tools from UCSC through the Anaconda software package. However, when I run the following command in the Anaconda Navigator terminal, I get errors. Below is the command I ran and excerpts from the start and end of the output:

Start:

(wgbs-cpg) brukers-MacBook-Pro-3:~ bruker$ gff3ToGenePred /Users/bruker/Desktop/CpG\ Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3 out.GP -geneNameAttr=attr -bad=file -maxParseErrors=-50 maxConvertErrors=-50
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI

End:

/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
GFF3: 85764 parser errors

ADD REPLYlink written 2.2 years ago by Yewon30

It says that an attribute tag (like ID, Parent, or Name) must start with an alphabetic character. In the first mRNA line of your gff3 (line 3, per the error message), the attributes are:

ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80

Thus, _AED is not allowed since it doesn't start with an alphabetic character. You can run a sed command to change it accordingly:

sed 's/;_/;x_/g' Fragaria_vesca_v4.0.a1.transcripts.gff3 > altered.transcripts.gff3

Every attribute tag that follows a semicolon and starts with an underscore will then have an x in front of the underscore.
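
For anyone who prefers to do the same rename in a script, here is a minimal Python sketch of the idea. Unlike the sed one-liner it only touches column 9 and also catches an underscore tag that happens to be the first attribute; the function name and the `prefix` parameter are hypothetical.

```python
import re


def prefix_underscore_tags(line, prefix="x"):
    """Insert `prefix` before attribute tags that start with an underscore."""
    if line.startswith("#"):
        return line  # leave header and directive lines untouched
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a 9-column feature line
    # A tag starts either at the beginning of column 9 or right after ';'
    cols[8] = re.sub(r"(^|;)_", r"\g<1>" + prefix + "_", cols[8])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

Applied line by line while copying the input to a new file, this should give the same result as the sed command on a file like the one discussed here.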

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by michael.ante3.6k

Interesting,

an attribute tag (like ID, Parent, or Name) must start with an alphabetic character.

gff3ToGenePred imposes a restriction on the expected gff3 format that does not exist in the official definition of the format.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Juke344.9k

Maybe it's a requirement for genePred (although not mentioned here)?

ADD REPLYlink written 2.2 years ago by michael.ante3.6k

michael.ante, I do appreciate your help so far. I was able to introduce an "x" before the underscore. However, I have encountered another challenge in which the converted gff3 file still generates errors. Below is an excerpt of the message:

Command used: gff3ToGenePred -maxParseErrors=50 /Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3 Fragariavesca.GP

Parsing error messages:

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_AED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_eAED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_QI

I looked into the converted file and realized that the "x" introduced before the underscore was in upper-case despite the fact that I used the lower-case "x". How can I fix this?

I am out of options. Please help

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Yewon30

Why not use the command I suggested, which inserts an x rather than an X?
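
To see which tags would still be rejected before re-running the converter, a small checker can help. The sketch below only mirrors the two error messages quoted in this thread (tag shape, and upper-case tags that are not standard ones); it is not taken from the gff3ToGenePred source, so the exact set the tool accepts is an assumption. `RESERVED` lists the attribute tags reserved by the GFF3 specification, and the function name is hypothetical.

```python
import re

# Attribute tags reserved by the GFF3 specification
RESERVED = {"ID", "Name", "Alias", "Parent", "Target", "Gap", "Derives_from",
            "Note", "Dbxref", "Ontology_term", "Is_circular"}
# A letter followed by letters, digits, dashes or underscores
TAG_SHAPE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def rejected_tags(attributes):
    """Return tags in a column-9 string that break either reported rule."""
    bad = []
    for field in attributes.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        tag = field.split("=", 1)[0]
        if not TAG_SHAPE.fullmatch(tag):
            bad.append(tag)  # e.g. _AED: does not start with a letter
        elif tag[0].isupper() and tag not in RESERVED:
            bad.append(tag)  # e.g. X_AED: upper-case but not a standard tag
    return bad
```

Running it on column 9 of a few mRNA lines from the edited file shows whether the prefix ended up as x or X.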

ADD REPLYlink written 2.2 years ago by michael.ante3.6k

Please show us the first few lines of your gff3 file.

ADD REPLYlink written 2.2 years ago by Juke344.9k

Below is the beginning of the gff3 file:

#gff-version 3
contig_10   maker   gene    34303   34545   .   -   .   ID=FvH4_c10g00030;Name=FvH4_c10g00030
contig_10   maker   mRNA    34303   34545   .   -   .   ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    34303   34545   .   -   .   ID=FvH4_c10g00030.1:1;Parent=FvH4_c10g00030.1
contig_10   maker   CDS 34303   34545   .   -   0   ID=FvH4_c10g00030.1:cds;Parent=FvH4_c10g00030.1
contig_10   maker   gene    16709   16951   .   -   .   ID=FvH4_c10g00020;Name=FvH4_c10g00020
contig_10   maker   mRNA    16709   16951   .   -   .   ID=FvH4_c10g00020.1;Parent=FvH4_c10g00020;Name=FvH4_c10g00020.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    16709   16951   .   -   .   ID=FvH4_c10g00020.1:1;Parent=FvH4_c10g00020.1
contig_10   maker   CDS 16709   16951   .   -   0   ID=FvH4_c10g00020.1:cds;Parent=FvH4_c10g00020.1
contig_10   maker   gene    4883    5125    .   -   .   ID=FvH4_c10g00010;Name=FvH4_c10g00010
contig_10   maker   mRNA    4883    5125    .   -   .   ID=FvH4_c10g00010.1;Parent=FvH4_c10g00010;Name=FvH4_c10g00010.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    4883    5125    .   -   .   ID=FvH4_c10g00010.1:1;Parent=FvH4_c10g00010.1
contig_10   maker   CDS 4883    5125    .   -   0   ID=FvH4_c10g00010.1:cds;Parent=FvH4_c10g00010.1
###
contig_1    maker   gene    2432    2674    .   +   .   ID=FvH4_c1g00020;Name=FvH4_c1g00020
contig_1    maker   mRNA    2432    2674    .   +   .   ID=FvH4_c1g00020.1;Parent=FvH4_c1g00020;Name=FvH4_c1g00020.1;_AED=0.29;_eAED=0.29;_QI=0|-1|0|1|-1|1|1|0|80
contig_1    maker   exon    2432    2674    .   +   .   ID=FvH4_c1g00020.1:1;Parent=FvH4_c1g00020.1
contig_1    maker   CDS 2432    2674    .   +   0   ID=FvH4_c1g00020.1:cds;Parent=FvH4_c1g00020.1
contig_1    maker   gene    61177   63300   .   +   .   ID=FvH4_c1g00310;Name=FvH4_c1g00310
contig_1    maker   mRNA    61177   63300   .   +   .
ADD REPLYlink modified 2.2 years ago by zx87549.7k • written 2.2 years ago by Yewon30

According to the specs, the header should start with 2 '#':

The ##gff-version 3 line is required and must be the first line of the file. It introduces the annotation section of the file.
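
A quick way to verify this on the real file rather than on a pasted excerpt is to test the first line directly. This is a minimal sketch with a hypothetical function name; it accepts any iterable of lines, including an open file handle.

```python
def has_gff3_header(lines):
    """True if the first line is the '##gff-version 3' directive."""
    first = next(iter(lines), "")  # empty input has no header
    fields = first.split()
    # Accept "##gff-version 3" and point releases such as "3.1.26"
    return (len(fields) >= 2 and fields[0] == "##gff-version"
            and fields[1].split(".")[0] == "3")
```

For example, `with open(path) as fh: print(has_gff3_header(fh))`, where `path` is the file handed to the converter.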

ADD REPLYlink written 2.2 years ago by michael.ante3.6k

michael.ante, you are right. I mistakenly omitted one of the # characters when copying the file. The original file header reads ##gff-version 3. Thank you for pointing out the error.

ADD REPLYlink written 2.2 years ago by Yewon30

zx8754, thanks for editing my gff3 file. It really looks more like the original version now.

ADD REPLYlink written 2.2 years ago by Yewon30

So, your file looks perfectly fine. It's the most comprehensive kind of gff3 file you can have. Either you are not passing the right file to your tool (check the path), or the tool expects a particular GFF-like flavour. Maybe the tool doesn't handle the ### lines and treats them as an empty header? You could also try providing only the first record together with the ##gff-version 3 header.
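
One way to build such a reduced test file is sketched below. It assumes, as in the excerpt posted above, that each gene line comes before its mRNA, exon and CDS lines; the function name is hypothetical.

```python
def first_gene_block(lines):
    """Keep the header and the first gene with its child features."""
    kept, genes_seen = [], 0
    for line in lines:
        if line.startswith("###"):
            break  # '###' closes the current group of features
        cols = line.split("\t")
        if not line.startswith("#") and len(cols) >= 3 and cols[2] == "gene":
            genes_seen += 1
            if genes_seen == 2:
                break  # the second gene starts the next record
        kept.append(line)
    return kept
```

Writing the returned lines to a new file and running the converter on it shows whether the failure depends on the file contents or on how the file is passed to the tool.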

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Juke344.9k

If you are using R already, package rtracklayer should be able to do the same.

ADD REPLYlink written 2.2 years ago by Michael Dondrup48k

I did use rtracklayer as one of the packages for this conversion process but the problem arose when I was running a script to create an intermediate genepred file.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Yewon30

Otherwise, I have a Perl script that should do the job. It's called gff2bed.pl, in the GAAS repository.

ADD REPLYlink written 2.2 years ago by Juke344.9k
2.2 years ago by
Jeffin Rockey1.1k
Karimannoor
Jeffin Rockey1.1k wrote:

Alternate method:

Download EA-Utils.

First run gff2gtf like below

gff2gtf file.gff3 >file.gtf

Then run

gtf2bed file.gtf >file.bed

This should produce a bed12 file corresponding to the initial gff3 file

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Jeffin Rockey1.1k

The second line should be gtf2bed --input=file.gtf >file.bed Thanks!

ADD REPLYlink written 9 months ago by adi.rotem0
8 months ago by
Juke344.9k
Sweden
Juke344.9k wrote:

There are answers here too: "A: conversion of GFF3 formate to BED format"

ADD COMMENTlink written 8 months ago by Juke344.9k