Custom-made GFF from Geneious software - format conversion
10 weeks ago
L_bioinfo • 0

I have a custom-made GFF3 annotation file downloaded from the Geneious software. Geneious wrote it with a non-standard structure that is not accepted by any of the command-line tools.

The file that I have:

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '
Genome_Concatenated Geneious extracted region 1 108686 . + . Name=;Extracted interval=1 -> 108%2C686'
Genome_Concatenated GeneMark.hmm start_codon 654 656 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm CDS 654 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm stop_codon 2847 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"


The file that I want (it could be either GTF or GFF3):

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '
Genome_Concatenated Geneious extracted region 1 108686 . + . Name=;Extracted interval=1 -> 108%2C686'
Genome_Concatenated GeneMark.hmm start_codon 654 656 . + 0 gene_id "Gene0001";transcript_id "Transcript0001";protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm CDS 654 2849 . + 0 gene_id "Gene0001";transcript_id "Transcript0001";protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm stop_codon 2847 2849 . + 0 gene_id "Gene0001";transcript_id "Transcript0001";protein_id "Protein0001"


I installed AGAT, but I get a Perl bootversion error. Could you kindly suggest other ways to convert the file?

format Geneious gff • 518 views

If you have worked with Geneious up to this point, why not continue in that program? It must have counting functionality built in. Have you asked Geneious support about this?

BTW: Is this a re-post of the issue you previously posted in "Retrieve count table from bam file using genomic coordinates"? You got some suggestions there.


I have done all the work using command-line tools. The annotation file is the primary data I got from my group. Apart from that, I don't have access to the software, so I want to continue using command-line tools.

It is not a repost, because here I want to reformat the file from the Geneious format to standard GFF3 or GTF format :)


I don't think there is a Geneious format. As noted in the other thread, your file seems to be a mix of two formats, with extraneous lines that are not defined in the GTF/GFF format (LINK). For example:

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '
Genome_Concatenated Geneious extracted region 1 108686 . + . Name=;Extracted interval=1 -> 108%2C686'


You will have to reformat the file into standard GFF or GTF format, but that will likely need some custom code if this is all Geneious puts out.

If the annotation is all that you are using from Geneious, you could always try to annotate the data on your own to get a normal GTF file (it would involve non-trivial work).


I am not well versed in coding; I have only begun learning. The document is also quite long, so manual correction of the file is not feasible. Thanks for the suggestions, though.
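
For what it's worth, here is a minimal Python sketch of the kind of custom code suggested above. It rests on several assumptions that are not confirmed in this thread: that the real file is tab-separated with nine columns (the tabs are not visible in the pasted excerpt), that every feature to keep carries `GFF Attribute=key "value"` entries in the last column, and that the output should use the usual GTF attribute layout (`key "value";` separated by spaces). The names `ATTR_RE`, `convert_line`, `convert` and the script name below are made up for illustration.

```python
import re
import sys

# Matches Geneious-style entries such as: GFF Attribute=gene_id "Gene0001"
ATTR_RE = re.compile(r'GFF Attribute=(\w+) "([^"]*)"')


def convert_line(line):
    """Return a GTF-style line, or None if the line should be dropped."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # blank lines and comment/header lines are skipped
    cols = line.split("\t")
    if len(cols) != 9:
        return None  # assumption: valid feature lines have exactly 9 columns
    pairs = ATTR_RE.findall(cols[8])
    if not pairs:
        # No gene_id/transcript_id entries: treated here as Geneious
        # bookkeeping features (e.g. "extracted region") and dropped.
        return None
    # Rebuild column 9 as GTF attributes; codon_start=1 is discarded.
    cols[8] = " ".join(f'{key} "{value}";' for key, value in pairs)
    return "\t".join(cols)


def convert(in_path, out_path):
    """Convert a whole file, writing only the lines that could be converted."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            out = convert_line(line)
            if out is not None:
                dst.write(out + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

Saved as, say, `geneious_to_gtf.py`, it would be run as `python geneious_to_gtf.py input.gff output.gtf`. Note that this drops the two `referenceToDocument` / `extracted region` lines instead of keeping them as in the desired output above, since they are the lines flagged as not defined in GTF/GFF. If your downstream tool needs them, return the line unchanged in the `if not pairs` branch. Check the first few output lines against your tool before running it on the whole file.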