Hello all,
Project background: I am studying the evolution of insect chemosensory genes, namely gustatory receptors (GRs) and odorant receptors (ORs), across insect orders. Specifically, I plan to compare GR and OR diversity between herbivorous and non-herbivorous insect orders, and to estimate selection and diversification rates along the branches separating herbivorous and non-herbivorous orders.
I have selected and downloaded about 180 genomes from NCBI Assembly (GenBank), and most of them have no annotation. What is the best way to annotate these genomes in bulk, so that I get a list of genes and proteins for each one and can then extract the GRs and ORs from all of them?
I have been looking at BRAKER and eggNOG, but they seem to be designed for annotating novel genomes and might be slow for bulk annotation.
Thank you in advance!
I find it hard to believe that you downloaded many genomes from NCBI that do not have annotations. I think it is more likely that the annotations are there, but maybe you didn't look in the correct place. If you tell us a couple of genomes you downloaded and from where, we may be able to offer advice.
Hi Mensur,
Here are the accession numbers of some of the genomes I downloaded from GenBank:
Out of the 180 genomes I downloaded, only 52 had the associated .gff and protein.faa files.
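One way to triage the downloads is to sort each assembly directory by which NCBI files it contains. The sketch below is a hypothetical helper (`annotation_status` is not part of any NCBI tool) and assumes the standard NCBI FTP file suffixes `_genomic.gff.gz`, `_protein.faa.gz` and `_genomic.gbff.gz`. It only looks at file names, so a `gbff_only` result still needs the flat file inspected, since a GBFF can hold sequence without any gene features.

```python
# Hypothetical helper, not part of any NCBI tool.
# Assumes standard NCBI FTP file names such as
# <accession>_<asm_name>_genomic.gff.gz, ..._protein.faa.gz, ..._genomic.gbff.gz


def annotation_status(filenames):
    """Classify one assembly directory listing by the annotation files it contains."""
    # Drop a trailing .gz so compressed and uncompressed copies are treated alike
    names = [n[:-3] if n.endswith(".gz") else n for n in filenames]
    has_gff = any(n.endswith("_genomic.gff") for n in names)
    has_protein = any(n.endswith("_protein.faa") for n in names)
    has_gbff = any(n.endswith("_genomic.gbff") for n in names)
    if has_gff and has_protein:
        return "gff+protein"    # annotation files present: proteins can be searched directly
    if has_gbff:
        return "gbff_only"      # flat file present: check whether it carries CDS features
    return "sequence_only"      # e.g. only _genomic.fna: would need de novo annotation
```

Running this over the directory listing of every assembly gives a quick count of how many genomes really need annotating from scratch.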
Mensur Dlakic posted links for RefSeq versions of the genomes, but the corresponding GenBank versions should be available by following similar links. Replace GCF with GCA:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/932/325/GCA_012932325.1_TpBJ-2018v1/
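The FTP paths in this thread split the nine-digit accession number into three-digit groups, so the parent directory for an assembly can be built from its accession alone, with GCF swapped for GCA. This is a minimal sketch assuming that layout; `assembly_dir_url` is a hypothetical helper. The final path component also contains the assembly name (e.g. `GCA_012932325.1_TpBJ-2018v1`), which cannot be derived from the accession, so the function stops one level above it. It also assumes paired GCF/GCA accessions share the same number, which is usual but worth checking for each genome.

```python
import re

NCBI_ALL = "https://ftp.ncbi.nlm.nih.gov/genomes/all"
# GCA_/GCF_ prefix, nine digits in three groups, optional version suffix
ACCESSION = re.compile(r"(GCA|GCF)_(\d{3})(\d{3})(\d{3})(?:\.\d+)?")


def assembly_dir_url(accession, to_genbank=True):
    """Build the NCBI FTP parent directory URL for an assembly accession.

    With to_genbank=True a RefSeq GCF_ prefix is replaced by GCA_.
    The returned URL stops before the <accession>_<asm_name> subdirectory.
    """
    m = ACCESSION.fullmatch(accession.strip())
    if m is None:
        raise ValueError(f"not an NCBI assembly accession: {accession!r}")
    prefix, a, b, c = m.groups()
    if to_genbank:
        prefix = "GCA"
    return f"{NCBI_ALL}/{prefix}/{a}/{b}/{c}/"
```

For example, `assembly_dir_url("GCA_002926335.1")` returns `https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/926/335/`; listing that directory shows the full-name subfolder holding the files.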
Some genomes may have GenBank versions but no RefSeq versions, e.g. https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/926/335/GCA_002926335.1_tcristinae_2.1/
This genome seems to have only a GenBank flat file (GBFF) version available (no GFF). https://github.com/jorvis/biocode/blob/master/gff/convert_genbank_to_gff3.py purportedly converts GBFF to GFF, but you will need to verify that claim.
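Before converting a GBFF file, it may be worth checking whether it contains gene models at all. The sketch below uses hypothetical helpers (standard library only) that count `CDS` entries in the feature table, relying on the GenBank flat-file convention that feature keys start in column 6 after five spaces. A count of zero suggests the file holds sequence only, in which case converting it to GFF would not yield any proteins.

```python
import gzip
import re

# Feature keys in a GenBank flat file start in column 6 (five leading spaces);
# qualifier lines such as /translation= are indented further and do not match.
CDS_LINE = re.compile(r"^ {5}CDS\s")


def count_cds_features(lines):
    """Count CDS feature entries in an iterable of GenBank flat-file lines."""
    return sum(1 for line in lines if CDS_LINE.match(line))


def gbff_cds_count(path):
    """Open a plain or gzipped .gbff file and count its CDS features."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_cds_features(handle)
```

Calling `gbff_cds_count` on each downloaded `_genomic.gbff.gz` separates flat files worth converting from those that need de novo annotation.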