Question: gff3 to bed12
Yewon (Norway) wrote 6 months ago:

I have been having trouble converting my GFF3 file for the strawberry genome (Fragaria vesca) to BED12 format, which is required for annotating differentially methylated bases. I have read through several suggested solutions but have not found one that works for my data. I did come across a GitHub script (https://github.com/pzross/iver/blob/master/R/bioinfo.R) that requires downloading the gff3ToGenePred and genePredToBed tools from UCSC and running the script in R in order to generate the BED12 file. At the step that generates the genePred file, I get the following error message:

/tmp/tmp.gff:0: empty GFF file, must have header
/tmp/tmp.gff:0: invalid GFF3 header
GFF3: 2 parser errors

My GFF3 file looks fine (it has 9 columns).

I would appreciate any help.

Thank you in advance.

Tags: bed12, R, gff3

Please show us the first few lines of your gff3 file.

written 6 months ago by Juke-34

Below is the beginning of the gff3 file:

#gff-version 3
contig_10   maker   gene    34303   34545   .   -   .   ID=FvH4_c10g00030;Name=FvH4_c10g00030
contig_10   maker   mRNA    34303   34545   .   -   .   ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    34303   34545   .   -   .   ID=FvH4_c10g00030.1:1;Parent=FvH4_c10g00030.1
contig_10   maker   CDS 34303   34545   .   -   0   ID=FvH4_c10g00030.1:cds;Parent=FvH4_c10g00030.1
contig_10   maker   gene    16709   16951   .   -   .   ID=FvH4_c10g00020;Name=FvH4_c10g00020
contig_10   maker   mRNA    16709   16951   .   -   .   ID=FvH4_c10g00020.1;Parent=FvH4_c10g00020;Name=FvH4_c10g00020.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    16709   16951   .   -   .   ID=FvH4_c10g00020.1:1;Parent=FvH4_c10g00020.1
contig_10   maker   CDS 16709   16951   .   -   0   ID=FvH4_c10g00020.1:cds;Parent=FvH4_c10g00020.1
contig_10   maker   gene    4883    5125    .   -   .   ID=FvH4_c10g00010;Name=FvH4_c10g00010
contig_10   maker   mRNA    4883    5125    .   -   .   ID=FvH4_c10g00010.1;Parent=FvH4_c10g00010;Name=FvH4_c10g00010.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    4883    5125    .   -   .   ID=FvH4_c10g00010.1:1;Parent=FvH4_c10g00010.1
contig_10   maker   CDS 4883    5125    .   -   0   ID=FvH4_c10g00010.1:cds;Parent=FvH4_c10g00010.1
###
contig_1    maker   gene    2432    2674    .   +   .   ID=FvH4_c1g00020;Name=FvH4_c1g00020
contig_1    maker   mRNA    2432    2674    .   +   .   ID=FvH4_c1g00020.1;Parent=FvH4_c1g00020;Name=FvH4_c1g00020.1;_AED=0.29;_eAED=0.29;_QI=0|-1|0|1|-1|1|1|0|80
contig_1    maker   exon    2432    2674    .   +   .   ID=FvH4_c1g00020.1:1;Parent=FvH4_c1g00020.1
contig_1    maker   CDS 2432    2674    .   +   0   ID=FvH4_c1g00020.1:cds;Parent=FvH4_c1g00020.1
contig_1    maker   gene    61177   63300   .   +   .   ID=FvH4_c1g00310;Name=FvH4_c1g00310
contig_1    maker   mRNA    61177   63300   .   +   .
written 6 months ago by Yewon (edited by zx8754)

According to the specs, the header should start with 2 '#':

The ##gff-version 3 line is required and must be the first line of the file. It introduces the annotation section of the file.

written 6 months ago by michael.ante

Michael.ante, you are right. I mistakenly omitted one of the # characters when copying the file. The header in the original file is ##gff-version 3. Thank you for pointing out the error.

written 6 months ago by Yewon

zx8754, thanks for editing my gff3 file. It really looks more like the original version now.

written 6 months ago by Yewon

So, your file looks perfectly fine. It's the most comprehensive gff3 file you can have. Either you are not passing the right file to your tool (check the path), or the tool expects a particular GFF-like flavor. Maybe the tool doesn't handle the ### lines and treats them as an empty header? You could try providing only the first record together with the ##gff-version 3 header.

written 6 months ago by Juke-34
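
The checks suggested above (is the file really there and non-empty, is the first line the ##gff-version 3 header, do the records have 9 columns) could be scripted roughly as follows. This is only a minimal sketch: the function name check_gff3 and the max_records parameter are made up for illustration, and it assumes the columns in the real file are tab-separated, which the pasted excerpt cannot confirm.

```python
import io


def check_gff3(handle, max_records=5):
    """Return a list of problems found in the header and the first few records.

    `handle` is any open text file object. Hypothetical helper, not part of
    any UCSC tool.
    """
    first = handle.readline()
    if not first:
        # An empty input would explain "empty GFF file, must have header".
        return ["file is empty"]

    problems = []
    if not first.startswith("##gff-version 3"):
        problems.append("first line is not '##gff-version 3': %r" % first.rstrip("\n"))

    seen = 0
    for number, line in enumerate(handle, start=2):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and '###' separators
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                "line %d has %d tab-separated columns, expected 9" % (number, len(columns))
            )
        seen += 1
        if seen >= max_records:
            break
    return problems
```

Calling it as check_gff3(open("infile.gff3")) on the exact path handed to the converter returns an empty list when nothing obvious is wrong. If it reports "file is empty", the problem is more likely the path or the temporary file written by the R script than the annotation itself.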

If you are using R already, package rtracklayer should be able to do the same.

written 6 months ago by Michael Dondrup

I did use rtracklayer as one of the packages in this conversion, but the problem arose when I ran the script that creates the intermediate genePred file.

written 6 months ago by Yewon

Otherwise, I have a Perl script that should do the job. It's called gff2bed.pl and is in the GAAS repository.

written 6 months ago by Juke-34

If you already have the 2 tools from UCSC, did you try them without R?

gff3ToGenePred infile.gff3 temp.genePred
genePredToBed temp.genePred out.bed
written 6 months ago by michael.ante

michael.ante, I downloaded the gff3ToGenePred and genePredToBed tools from UCSC through Anaconda. However, when I run the following command in the Anaconda Navigator terminal, I get errors. Below are the command I ran and excerpts from the start and end of the output:

Start:

(wgbs-cpg) brukers-MacBook-Pro-3:~ bruker$ gff3ToGenePred /Users/bruker/Desktop/CpG\ Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3 out.GP -geneNameAttr=attr -bad=file -maxParseErrors=-50 maxConvertErrors=-50
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI

End:

/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
GFF3: 85764 parser errors

written 6 months ago by Yewon

It says that an attribute tag (like ID, Parent, or Name) must start with an alphabetic character. In the second line of your gff3, the attributes are:

ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80

Thus, _AED is not allowed, since it doesn't start with an alphabetic character. The same applies to _eAED and _QI. You can run a sed command to change them accordingly:

sed 's/;_/;x_/g' Fragaria_vesca_v4.0.a1.transcripts.gff3 > altered.transcripts.gff3

Every attribute tag that follows a semicolon and starts with an underscore will then have a lower-case x in front of the underscore.

written 6 months ago by michael.ante
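
For anyone who would rather not run sed over the whole line, the same renaming can be restricted to the attribute column. The following is a rough Python sketch, not a drop-in replacement: the function name prefix_underscore_tags is invented here, and it assumes tab-separated columns with semicolon-separated attributes. Unlike the sed one-liner, it also catches an underscore tag that happens to be the first attribute.

```python
def prefix_underscore_tags(line, prefix="x"):
    """Prepend `prefix` to attribute tags starting with '_' (column 9 only).

    Hypothetical helper; header, comment and malformed lines are returned
    unchanged.
    """
    if line.startswith("#"):
        return line
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line

    attributes = []
    for attribute in columns[8].split(";"):
        if attribute.startswith("_"):
            # Keep the prefix lower-case: the converter rejects user-defined
            # tags that start with an upper-case letter.
            attribute = prefix + attribute
        attributes.append(attribute)
    columns[8] = ";".join(attributes)

    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(columns) + newline
```

Applied line by line while copying the input file to a new one, this leaves ID, Parent and Name untouched and turns _AED, _eAED and _QI into x_AED, x_eAED and x_QI.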

Interesting,

an attribute tag (like ID, Parent, or Name) must start with an alphabetic character.

gff3ToGenePred imposes a restriction on the gff3 format that does not exist in the official definition of the format.

written 6 months ago by Juke-34

Maybe it's a requirement for genePred (although not mentioned here)?

written 6 months ago by michael.ante

michael.ante, I do appreciate your help so far. I was able to introduce an "x" before the underscore. However, I have encountered another challenge in which the converted gff3 file still generates errors. Below is an excerpt of the message:

Command used: gff3ToGenePred -maxParseErrors=50 /Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3 Fragariavesca.GP

Parsing error messages:

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_AED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_eAED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_QI

I looked into the converted file and realized that the "x" introduced before the underscore was in upper-case despite the fact that I used the lower-case "x". How can I fix this?

I am out of options. Please help

written 6 months ago by Yewon

Why not use the command I suggested, which inserts an x instead of an X?

written 6 months ago by michael.ante
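
To see which tags are likely to trip the parser before running the converter again, the two rules quoted in the error messages above can be approximated in a few lines of Python. This is a sketch under assumptions: the function name suspicious_tags is invented, the reserved tag list is taken from the GFF3 specification, and the real checks in gff3ToGenePred may differ in detail.

```python
import re

# Attribute tags reserved by the GFF3 specification; these may start upper-case.
STANDARD_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap", "Derives_from",
    "Note", "Dbxref", "Ontology_term", "Is_circular",
}

# Rule from the first error message: start with a letter, then
# alphanumeric, dash or underscore characters only.
VALID_TAG = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")


def suspicious_tags(attribute_column):
    """Return the tags in a GFF3 attribute column that would likely be rejected."""
    flagged = []
    for attribute in attribute_column.split(";"):
        if not attribute:
            continue  # tolerate a trailing ';'
        tag = attribute.split("=", 1)[0]
        if not VALID_TAG.match(tag):
            flagged.append(tag)  # e.g. _AED
        elif tag[0].isupper() and tag not in STANDARD_TAGS:
            # Rule from the second message: user-defined tags must start
            # with a lower-case letter.
            flagged.append(tag)  # e.g. X_AED
    return flagged
```

Running this over column 9 of the edited file should come back empty once every former underscore tag starts with a lower-case x.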
Jeffin Rockey (Karimannoor) wrote 6 months ago:

Alternate method:

Download EA-Utils.

First run gff2gtf like below

gff2gtf file.gff3 >file.gtf

Then run

gtf2bed file.gtf >file.bed

This should produce a bed12 file corresponding to the initial gff3 file

written 6 months ago by Jeffin Rockey
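
Whichever converter is used, it helps to know what the resulting BED12 line should look like so the output can be spot-checked by hand. The sketch below builds one BED12 line from GFF3-style coordinates. It is an illustration only: the function name transcript_to_bed12 is invented, score and itemRgb are simply set to 0, block lists are written with a trailing comma, and a transcript without CDS gets thickStart = thickEnd = chromStart. None of this is guaranteed to match the exact output of the tools mentioned in this thread.

```python
def transcript_to_bed12(chrom, name, strand, exons, cds=None):
    """Build one BED12 line from 1-based, inclusive GFF3 coordinates.

    exons: list of (start, end) tuples for the transcript's exons.
    cds:   optional (start, end) span of the coding region.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0] - 1   # BED starts are 0-based
    chrom_end = exons[-1][1]        # BED ends are exclusive, so unchanged

    if cds is not None:
        thick_start, thick_end = cds[0] - 1, cds[1]
    else:
        thick_start = thick_end = chrom_start  # no thick (coding) part

    block_sizes = [end - start + 1 for start, end in exons]
    # Block starts are relative to chromStart.
    block_starts = [start - 1 - chrom_start for start, _ in exons]

    fields = [
        chrom, chrom_start, chrom_end, name, 0, strand,
        thick_start, thick_end, 0, len(exons),
        ",".join(str(size) for size in block_sizes) + ",",
        ",".join(str(offset) for offset in block_starts) + ",",
    ]
    return "\t".join(str(field) for field in fields)
```

For the first transcript in the excerpt above (FvH4_c10g00030.1, one exon and one CDS at 34303 to 34545 on the minus strand of contig_10), this gives chromStart 34302, chromEnd 34545 and a single block of size 243 at offset 0.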