How to retrieve AA and CDS files from a GFF3 file
4.2 years ago
Ric ▴ 430

I have an annotation file in GFF3 format, but I no longer have the amino acid and CDS sequences. Is there a tool that can regenerate those files from a genome in FASTA format and a GFF3 file?

Thank you in advance

gene annotation • 2.6k views

This question has been extensively discussed previously at: changing ID in an existing GFF3 file

4.2 years ago
Juke34 8.5k

Yes, using agat_sp_extract_sequences.pl from AGAT:

CDS:
agat_sp_extract_sequences.pl --gff myfFile.gff -f fastaFile.fa -o cds.fa

Protein:
agat_sp_extract_sequences.pl --gff myfFile.gff -f fastaFile.fa -p -o protein.fa

Otherwise, bedtools or gffread can also do this.
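
For the gffread route, a minimal sketch could look like the following. It assumes gffread is installed and on your PATH. The function name extract_cds_and_proteins is a hypothetical helper of mine, not part of gffread. As far as I know, -g gives the genome FASTA, -x writes the spliced CDS of each transcript, and -y writes the protein translation of that CDS.

```shell
# Sketch: wrap gffread to write CDS and protein FASTA from a genome + GFF3.
# extract_cds_and_proteins is a hypothetical helper name, not part of gffread.
extract_cds_and_proteins() {
    # $1 = genome FASTA, $2 = GFF3 annotation
    # $3 = output CDS FASTA, $4 = output protein FASTA
    # -g: genome sequence; -x: spliced CDS per transcript; -y: translated CDS
    gffread -g "$1" -x "$3" -y "$4" "$2"
}
```

With the file names used above, the call would be: extract_cds_and_proteins fastaFile.fa myfFile.gff cds.fa protein.fa. Sequence IDs in the GFF3 must match the FASTA headers, otherwise nothing can be extracted.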


While using agat to convert to protein I got:

13 warning messages: Peculiar rare case, we found 8 three_prime_utr while 12 expected.
Either some are supernumerary or some have been merged they overlap or are adjacent while they are not suppose to.
In case you were using gtf file as input (no parent/id attributes), check you provide the attribute (i.e comon_tag) used to group features together (e.g. locus_tag, gene_id, etc.).
(In case your file contains only CDS features, and your organism is prokaryote (e.g rast file), using ID as comon_tag might be the solution.)

Is there a way to find out more information about it or to fix it?

Thank you in advance,


I would say you don't need to worry about it, because you are interested in proteins and the warning concerns the UTRs. If you want to see which UTR/gene it concerns, run agat_sp_gxf_to_gff3.pl with high verbosity, e.g. -v 2.
