Why do JGI gene models give nonsense proteins when I extract them with AGAT?
Asked 4 weeks ago by O.rka

https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=Phypa1_1

I have 3 files:

  • Assembly: Physcomitrella_patens.1_1.allmasked.gz
  • GFF: Phypa1_1.FilteredModels.gff.gz
  • Proteins: proteins.Phypa1_1.FilteredModels.fasta.gz

There is no CDS file, so I'm trying to regenerate both the CDS and the protein sequences from the assembly and the GFF. However, when I extract them with AGAT as below, I get nonsense:

seqkit seq Physcomitrella_patens.1_1.allmasked.gz > Physcomitrella_patens.1_1.allmasked.fasta

agat_sp_extract_sequences.pl -f Physcomitrella_patens.1_1.allmasked.fasta -g Phypa1_1.FilteredModels.gff.gz -c agat_config.yaml -p -o test_proteins.fasta
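For context, protein extraction from a GFF has to get three things right: join the CDS segments in coordinate order, reverse-complement them on the minus strand, and trim the leading bases given by the phase of the first segment in transcription order. A slip in any of these typically yields a translation riddled with stop codons. Below is a minimal Python sketch of those steps, assuming 1-based inclusive GFF coordinates and a single transcript; the function names are hypothetical and this is not how AGAT itself is implemented.

```python
from __future__ import annotations

# Complement table; lowercase (soft-masked) bases keep their case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(genome: str, segments: list[tuple[int, int]], strand: str, phase: int = 0) -> str:
    """Join the CDS segments of one transcript into a single coding sequence.

    genome:   sequence of the scaffold the segments lie on
    segments: 1-based, inclusive (start, end) pairs, as in GFF columns 4 and 5
    strand:   "+" or "-" (GFF column 7)
    phase:    GFF column 8 of the first CDS segment in transcription order;
              that many leading bases are dropped so codons start in frame
    """
    # Slice each segment in ascending genomic order (1-based -> 0-based).
    pieces = [genome[start - 1:end] for start, end in sorted(segments)]
    cds = "".join(pieces)
    if strand == "-":
        # Reverse-complementing the joined sequence also reverses segment order,
        # so the highest-coordinate segment (transcription start) comes first.
        cds = reverse_complement(cds)
    return cds[phase:]


# Toy example: two exons, ATGAAA + TTTTAA -> ATG AAA TTT TAA.
genome = "GGATGAAACCCTTTTAAGG"
print(extract_cds(genome, [(12, 17), (3, 8)], "+"))                      # ATGAAATTTTAA
print(extract_cds(reverse_complement(genome), [(3, 8), (12, 17)], "-"))  # ATGAAATTTTAA
```

The minus-strand call uses the same toy gene placed on the reverse strand, so both calls should recover the same CDS.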

My proteins are full of stop codons:

>agat-rna-1 gene=agat-gene-15249 seq_id=scaffold_1 type=cds
*SSKLQKHRAQVEHSVAHVHA*M*RWDFFGRGLGYGEEARRRNSTSFISHCRRDALQNFY
HSFTFEMRTKSVPTGDHTPEGT*DSYPLRLMGAKKAW*SVSKFTVVSHLR*QMQTLNAGT
QSRVIVGRRNNSVTELLHGMEIN*WFSTQSVRSLVAAKLQISSGGRKIRLR*ASGLILFP
LIG*Q*T*EIRLLFLPVINPLQALRPSPKSTLGKDMIHCEGQ*LLQADVMI*KRNYISMM
QVQ*LEYHCGSFRAIELLLATHLIP*WQPVQLIGA

According to the JGI protein file, the sequence for this record should be:

>jgi|Phypa1_1|63627|fgenesh1_pg.scaffold_1000001
MKFKAAKAQSPSGTFCGSCACMNVKMGFFWTGVGLWGRSKEEKQHKLHKSLSKRCIAEFL
PQFHIRDADEVRSNRRPYTGGDVRLLPTEVNGGEEGLVICLQVHRSLPSSVADADSECWD
AIPRYRWKKEQLSHRVVARDGNQLMVFDAICEVTGCCKAANIFRRSEDSVKVSFRLDFIS
AYRVTMNLRNSIVVPPCDQSTASITTLSEIHSRQRHDPLRGSVIAASRCHDLKKELHFHD
ASPITRVPLWQLSGHRVASCNPSHSLMAASSVNWS*
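One quick way to quantify how broken the output is, and to check any later fix, is to count stop codons that are not at the very end of each protein. Below is a minimal Python sketch, assuming `*` marks stops as in the records above. The function names are hypothetical, and the demo input uses only the first line of each record from this post.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record ID: sequence}; the ID is the first word of the header."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def internal_stops(protein: str) -> int:
    """Count '*' characters, ignoring a single terminal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    return body.count("*")


demo = """>agat-rna-1 gene=agat-gene-15249 seq_id=scaffold_1 type=cds
*SSKLQKHRAQVEHSVAHVHA*M*RWDFFGRGLGYGEEARRRNSTSFISHCRRDALQNFY
>jgi|Phypa1_1|63627|fgenesh1_pg.scaffold_1000001
MKFKAAKAQSPSGTFCGSCACMNVKMGFFWTGVGLWGRSKEEKQHKLHKSLSKRCIAEFL
"""
for rec_id, protein in read_fasta(demo).items():
    print(rec_id, internal_stops(protein))
# agat-rna-1 3
# jgi|Phypa1_1|63627|fgenesh1_pg.scaffold_1000001 0
```

On the real output, `read_fasta(pathlib.Path("test_proteins.fasta").read_text())` would give per-record counts. A correct extraction should report 0 for nearly every record.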

The genetic code is the standard code (transl_table 1).
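Because the AGAT protein opens with a stop, one hypothesis worth ruling out is a reading-frame offset, for example a mis-handled phase. The sketch below translates a CDS with the standard code (translation table 1) in each of the three forward frames and counts internal stops. If a frame other than 0 comes out clean, that would point at the frame or phase handling rather than the coordinates themselves. This is a minimal Python sketch, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, frame: int = 0) -> str:
    """Translate from offset `frame`; an incomplete trailing codon is dropped, unknown codons become X."""
    seq = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def stops_per_frame(cds: str) -> dict[int, int]:
    """Internal stop codons (a single terminal stop is ignored) in each forward frame."""
    counts = {}
    for frame in range(3):
        protein = translate(cds, frame)
        body = protein[:-1] if protein.endswith("*") else protein
        counts[frame] = body.count("*")
    return counts


print(translate("ATGAAATTTTAA"))        # MKF*
print(stops_per_frame("ATGAAATTTTAA"))  # {0: 0, 1: 1, 2: 0}
```

Running `stops_per_frame` on a few CDS sequences extracted from the assembly would show whether the problem is a consistent shift or something else, such as wrong coordinates or strand.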

[Screenshot]


Comment: Are you sure the FASTA file is in sync with the annotation file, i.e. that the GFF/GTF annotation was produced against this exact FASTA?
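A cheap first pass at that question is to check two things: every sequence ID in GFF column 1 should exist in the FASTA, and no feature should end past the end of its scaffold. Passing both checks does not prove the files match, since a different assembly version could share names and lengths. Failing either one, however, shows that they do not match. Below is a minimal Python sketch; the function names are hypothetical and the toy records are made up for illustration.

```python
from __future__ import annotations


def fasta_lengths(fasta_text: str) -> dict[str, int]:
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths: dict[str, int] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif line and name is not None:
            lengths[name] += len(line)
    return lengths


def check_sync(gff_text: str, fasta_text: str) -> tuple[set[str], list[tuple[str, int, int]]]:
    """Return (seqids missing from the FASTA, [(seqid, feature end, scaffold length)] overruns)."""
    lengths = fasta_lengths(fasta_text)
    missing: set[str] = set()
    overruns: list[tuple[str, int, int]] = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or directive
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a feature line
        seqid, end = fields[0], int(fields[4])
        if seqid not in lengths:
            missing.add(seqid)
        elif end > lengths[seqid]:
            overruns.append((seqid, end, lengths[seqid]))
    return missing, overruns


fasta = ">scaffold_1 toy\nACGTACGTAC\n>scaffold_2\nACG\n"
gff = "\n".join([
    "##gff-version 3",
    "\t".join(["scaffold_1", "toy", "CDS", "1", "9", ".", "+", "0", "ID=cds1"]),
    "\t".join(["scaffold_2", "toy", "CDS", "2", "8", ".", "-", "0", "ID=cds2"]),
    "\t".join(["scaffold_9", "toy", "CDS", "1", "3", ".", "+", "0", "ID=cds3"]),
])
print(check_sync(gff, fasta))  # ({'scaffold_9'}, [('scaffold_2', 8, 3)])
```

For the real files, the gzipped GFF can be read with `gzip.open("Phypa1_1.FilteredModels.gff.gz", "rt").read()`, and the decompressed assembly with a plain `open(...).read()`.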


Reply: It's hard to be 100% sure, but there is only one genome assembly. I've updated the question with the screenshot.
