Question: GFF3 file incompatibilities (source: BRAD)
dejong.grant wrote (2.7 years ago):

Hi there,

I've been analyzing Brassica napus transcriptomic data to study isoform expression and the incidence of splicing events, so I used the Brassica Database (BRAD) GFF3 and FASTA files to generate my STAR index.

After a few errors I got my STAR run working, but downstream software (e.g. rMATS) requires GTF files, and the BRAD GFF3 doesn't seem to be compatible with any GFF3-to-GTF converter.

(I've used gffread and genometools so far).

Has anyone had similar problems with the formatting of these BRAD annotation files?

Example formatting:

chrC03 GazeA2 mRNA 28541218 28543845 572.4227 + . ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 UTR 28543523 28543845 6.0158 + . Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28543454 28543522 29.9339 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28543158 28543369 27.5481 + 1 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28542958 28543060 27.3743 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

Columns 1-8 are mostly consistent with sample GFF3 files, but I've noticed a large space in the mRNA row between the score and strand columns. The attribute column also differs, but I don't know whether that is an acceptable departure from the norm.
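One way to check whether the "large space" is a real formatting problem is to split each line on tabs and confirm that there are exactly nine fields. The sketch below does that and parses column 9 into key/value pairs. The function name parse_gff3_line is made up for illustration, and the sample line assumes the BRAD file is tab-separated (the tabs were lost when the example was pasted above).

```python
def parse_gff3_line(line):
    """Split one GFF3 record into its nine columns and parse the attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    keys = ("seqid", "source", "type", "start", "end",
            "score", "strand", "phase", "attributes")
    record = dict(zip(keys, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 is a ';'-separated list of key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1)
        for pair in record["attributes"].split(";")
        if "=" in pair
    )
    return record


# The mRNA row from the example, rebuilt with explicit tabs (an assumption).
sample = "\t".join([
    "chrC03", "GazeA2", "mRNA", "28541218", "28543845", "572.4227", "+", ".",
    "ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001",
])
rec = parse_gff3_line(sample)
print(rec["type"], rec["strand"], rec["attributes"])
```

If every line passes this check, the wide gap is most likely just a tab being rendered, and the columns themselves are fine.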

I managed to get around this problem in STAR with:

STAR --runMode genomeGenerate --genomeDir $1 --genomeFastaFiles $genfas --sjdbOverhang 99 --sjdbGTFfile $gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS

This seems to be correct, and the subsequent mapping job was successful.

Does anyone have any ideas as to what could be causing this problem, and/or any potential solutions?

Thanks in advance, I've been really racking my brain.


This is probably because the file has no gene features, which I guess the converters expect.
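A quick way to see which feature types a file actually contains is to tally column 3. The sketch below does this for the five rows from the question. The helper names and the "expected" list (gene, mRNA, exon, CDS) are assumptions for illustration, not a list taken from any particular converter.

```python
from collections import Counter


def feature_types(lines):
    """Count the values in column 3 (feature type), skipping comments."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


def missing_features(counts, expected=("gene", "mRNA", "exon", "CDS")):
    """Return the expected feature types that never occur in the file."""
    return [ftype for ftype in expected if counts[ftype] == 0]


# The rows from the question, rebuilt with explicit tabs (an assumption).
child = "Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001"
rows = [
    "chrC03\tGazeA2\tmRNA\t28541218\t28543845\t572.4227\t+\t.\tID=BnaC03g43490D",
    "chrC03\tGazeA2\tUTR\t28543523\t28543845\t6.0158\t+\t.\t" + child,
    "chrC03\tGazeA2\tCDS\t28543454\t28543522\t29.9339\t+\t0\t" + child,
    "chrC03\tGazeA2\tCDS\t28543158\t28543369\t27.5481\t+\t1\t" + child,
    "chrC03\tGazeA2\tCDS\t28542958\t28543060\t27.3743\t+\t0\t" + child,
]
counts = feature_types(rows)
missing = missing_features(counts)
print(dict(counts))
print("missing:", missing)
```

For this excerpt both gene and exon come back as missing, which would also explain why STAR only worked once CDS was used as the exon feature.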

I have had so many issues over the past years with the different GFF3 files found in the wild: there is often something missing for the tools that take GFF3 as input. So I decided to write a parser that works with any kind of GFF (GFF, GFF2, and all GFF3 flavours) as well as GTF, and that checks, completes, and corrects the input file in order to produce complete, standardized GFF3 files. Most of my tools that use GFF3 files pass the input through this parser first.

If you want to give it a try, the toolkit is called AGAT and you can find it here:

https://github.com/NBISweden/AGAT.git

To install it, run:

conda install -c bioconda agat

Then, the simplest way to use the parser is through this script:

agat_sp_gxf_to_gff3.pl

Plenty of other scripts are available: type agat_ and use tab completion to see all of them.

written 2.7 years ago by Juke34 (modified 12 weeks ago)

Hi Jake, I have this GFF output file from AUGUSTUS, produced through BRAKER, but it doesn't seem to conform to the standard file format. I want to rewrite the attribute column. Can any of your scripts do this? Thanks, Kay

written 2.2 years ago by Kay

Using agat_sp_gxf_to_gff3.pl, you will end up with a complete, standardized GFF3 file.
It deals well with the unusual AUGUSTUS output.
If you wish to manipulate the attributes in a specific way, you can try the script called agat_sp_manage_attributes.pl

written 2.2 years ago by Juke34 (modified 12 weeks ago)

The "parent" is missing, probably a gene, as @Juke-34 says.

written 2.7 years ago by Macspider
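To make the missing-parent point from this thread concrete, here is a minimal sketch of what a converter has to invent for mRNA/CDS/UTR-only input before it can write GTF. It rests on several assumptions that do not come from BRAD or from any tool: each mRNA is treated as its own gene (reusing the mRNA ID as gene_id), child rows point to the mRNA through Parent, and exons are reconstructed by merging touching or overlapping CDS and UTR intervals. All function names are hypothetical, and the output has not been checked against rMATS; a tested tool such as AGAT is the safer route.

```python
from collections import defaultdict


def parse_attributes(text):
    """Turn 'key=value;key=value' into a dict."""
    return dict(p.split("=", 1) for p in text.split(";") if "=" in p)


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) pairs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gtf_row(seqid, source, ftype, start, end, strand, phase, attrs):
    """Format one GTF line; the score column is left as '.'."""
    return "\t".join(
        [seqid, source, ftype, str(start), str(end), ".", strand, phase, attrs]
    )


def gff3_to_gtf(lines):
    """Build gene, transcript, exon and CDS rows from mRNA/CDS/UTR input."""
    transcripts = {}            # mRNA ID -> (seqid, source, start, end, strand)
    parts = defaultdict(list)   # mRNA ID -> (start, end) of CDS and UTR rows
    cds = defaultdict(list)     # mRNA ID -> (start, end, phase) of CDS rows
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue
        attrs = parse_attributes(f[8])
        start, end = int(f[3]), int(f[4])
        if f[2] == "mRNA":
            transcripts[attrs["ID"]] = (f[0], f[1], start, end, f[6])
        elif f[2] in ("CDS", "UTR"):
            parts[attrs["Parent"]].append((start, end))
            if f[2] == "CDS":
                cds[attrs["Parent"]].append((start, end, f[7]))
    out = []
    for tid, (seqid, source, start, end, strand) in transcripts.items():
        gene_tag = f'gene_id "{tid}";'
        tx_tag = f'gene_id "{tid}"; transcript_id "{tid}";'
        out.append(gtf_row(seqid, source, "gene", start, end, strand, ".", gene_tag))
        out.append(gtf_row(seqid, source, "transcript", start, end, strand, ".", tx_tag))
        for s, e in merge_intervals(parts[tid]):
            out.append(gtf_row(seqid, source, "exon", s, e, strand, ".", tx_tag))
        for s, e, phase in sorted(cds[tid]):
            out.append(gtf_row(seqid, source, "CDS", s, e, strand, phase, tx_tag))
    return out


# The rows from the question, rebuilt with explicit tabs (an assumption).
child = "Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001"
rows = [
    "chrC03\tGazeA2\tmRNA\t28541218\t28543845\t572.4227\t+\t.\tID=BnaC03g43490D",
    "chrC03\tGazeA2\tUTR\t28543523\t28543845\t6.0158\t+\t.\t" + child,
    "chrC03\tGazeA2\tCDS\t28543454\t28543522\t29.9339\t+\t0\t" + child,
    "chrC03\tGazeA2\tCDS\t28543158\t28543369\t27.5481\t+\t1\t" + child,
    "chrC03\tGazeA2\tCDS\t28542958\t28543060\t27.3743\t+\t0\t" + child,
]
gtf = gff3_to_gtf(rows)
print("\n".join(gtf))
```

Merging matters here because the UTR (28543523-28543845) directly follows a CDS (28543454-28543522); left unmerged they would be written as two exons with a spurious junction between them.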