Background: I need to be able to recognize all possible sequence identifiers present in the preformatted NCBI nucleotide databases. I've implemented regular expressions following https://www.ncbi.nlm.nih.gov/Sequin/acc.html, but that is not enough: other accessions (e.g. PDB) are also present. So I would like to have examples of all the formats I can encounter, but I was not able to find any list describing what can actually be inside those databases.
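For comparison, a regex classifier along those lines might look like the Python sketch below. The pattern set is an assumption, an incomplete paraphrase of the GenBank, WGS and RefSeq accession shapes. The `1ABC_A` PDB form is also an assumption about how newer BLAST databases label PDB chains. The pattern names are hypothetical labels. Identifiers for which `classify` returns `None` are where the still-missing formats would show up.

```python
import re
from typing import Optional

# Assumed, incomplete set of nucleotide identifier shapes; names are hypothetical labels.
# Every pattern also accepts an optional ".N" version suffix.
_VERSION = r"(?:\.\d+)?"
ACCESSION_PATTERNS = {
    "genbank_1+5": r"[A-Z]\d{5}",        # e.g. U12345
    "genbank_2+6": r"[A-Z]{2}\d{6}",     # e.g. AF123456
    "genbank_2+8": r"[A-Z]{2}\d{8}",     # longer two-letter form
    "wgs_4": r"[A-Z]{4}\d{8,}",          # 4-letter WGS project + 2-digit version + contig number
    "wgs_6": r"[A-Z]{6}\d{9,}",          # 6-letter WGS project + 2-digit version + contig number
    "refseq": r"(?:AC|NC|NG|NM|NR|NT|NW|NZ|XM|XR)_[A-Z]{0,6}\d{6,}",  # incl. NZ_ + WGS-style
    "pdb_chain": r"\d[A-Z0-9]{3}_[A-Za-z0-9]+",  # assumed "1ABC_A" (PDB id + chain) form
}
_COMPILED = {name: re.compile(p + _VERSION) for name, p in ACCESSION_PATTERNS.items()}


def classify(identifier: str) -> Optional[str]:
    """Return the name of the first pattern that fully matches, or None if none does."""
    for name, regex in _COMPILED.items():
        if regex.fullmatch(identifier):
            return name
    return None
```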
One possible solution, I thought, would be to use ENTREZ to retrieve the accessions for me. There is a blastdbinfo database which lists the available databases, but I am not able to get elink to link anywhere.
Let's, for example, focus on the database that is available with the following command:
esearch -query refseq_genomes[DB] -db blastdbinfo
So, given that I want the nucleotide sequence accessions present in that database, what should the elink statement be?
esearch -query refseq_genomes[DB] -db blastdbinfo | ... SOME ELINK .... | efetch --format acc
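For reference, the first step of that pipeline can also be issued against the E-utilities URL interface directly, which makes the raw XML easier to inspect. This is a minimal Python sketch. It assumes the public eutils endpoint and that EDirect's `-query` corresponds to the `term` parameter. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build an E-utilities esearch URL, roughly `esearch -db DB -query TERM`."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return EUTILS + "esearch.fcgi?" + query


def parse_esearch_ids(xml_text: str) -> List[str]:
    """Return the UIDs listed under <IdList> in an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


def esearch(db: str, term: str) -> List[str]:
    # Network call; without an API key NCBI allows only a few requests per second.
    with urllib.request.urlopen(esearch_url(db, term)) as resp:
        return parse_esearch_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(esearch("blastdbinfo", "refseq_genomes[DB]"))
```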
For the ENTREZ experts here: how do I tell which database links to which?
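One way to check where a database can link, sketched here under the assumption that the E-utilities einfo endpoint is queried directly: einfo for a single database returns its LinkList, and each Link's DbTo names a target database that elink can reach. If the list comes back empty for blastdbinfo, that would suggest there is no elink route out of it at all. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def einfo_url(db: str) -> str:
    """einfo for a single database; the response includes its field and link lists."""
    return EUTILS + "einfo.fcgi?" + urllib.parse.urlencode({"db": db})


def parse_links(xml_text: str) -> List[Tuple[str, str]]:
    """Return (link name, target database) pairs from the <LinkList> of an einfo response."""
    root = ET.fromstring(xml_text)
    return [
        (link.findtext("Name", default=""), link.findtext("DbTo", default=""))
        for link in root.iter("Link")
    ]


def links_from(db: str) -> List[Tuple[str, str]]:
    # Network call against the public eutils endpoint.
    with urllib.request.urlopen(einfo_url(db)) as resp:
        return parse_links(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for name, target in links_from("blastdbinfo"):
        print(f"{name} -> {target}")
```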
I know I can download the databases and use blastdbcmd to obtain the accessions, but it should be possible to obtain them in a better way.
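If the download route ends up being necessary, the blastdbcmd output can at least be reduced to its distinct accession "shapes" (letters become A, digits become 9, version dropped). That turns millions of IDs into a short list of formats to write regexes for. This is a minimal Python sketch. It assumes blastdbcmd is on the PATH and that the database was downloaded locally under the name refseq_genomes. The helper names are hypothetical.

```python
import re
import subprocess
from collections import Counter
from typing import Iterable, List


def accession_shape(accession: str) -> str:
    """Collapse an accession to its shape: letters -> 'A', digits -> '9', version dropped."""
    base = accession.split(".", 1)[0]
    base = re.sub(r"[A-Za-z]", "A", base)
    return re.sub(r"[0-9]", "9", base)


def shape_counts(accessions: Iterable[str]) -> Counter:
    """Count how many accessions share each shape."""
    return Counter(accession_shape(a) for a in accessions)


def list_accessions(db: str) -> List[str]:
    """Dump every accession of a local BLAST database with blastdbcmd (%a = accession)."""
    result = subprocess.run(
        ["blastdbcmd", "-db", db, "-entry", "all", "-outfmt", "%a"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    for shape, n in shape_counts(list_accessions("refseq_genomes")).most_common():
        print(shape, n)
```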