Hi Biostars community,
I want to use epost and esummary (NCBI's E-utilities) to obtain lineage information for a list of protein accession numbers.
However, I have problems with accession numbers that do not start with WP_. The following command
    cat "$ListWithAccessionNumbers" | epost -db protein | \
      esummary -db taxonomy -format xml | \
      xtract -pattern Seq-entry -element Org-ref_taxname, OrgName_lineage, NCBIeaa, Textseq-id_accession \
      > SummaryTable.tsv
does give me a TSV file, but some cells are not filled with the requested information.
For accession numbers that do not start with WP_, the accession number and sequence are not printed; they are printed only for accession numbers starting with WP_.
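To compare the two groups directly, the input list could be split by prefix and the pipeline run on each part separately. This is only a minimal sketch, assuming the list holds one accession number per line; the function name `split_by_prefix` and the output file names are hypothetical and not part of EDirect.

```shell
# Split a list of accession numbers (one per line) by the WP_ prefix.
# The function name and output file names are illustrative assumptions.
split_by_prefix() {
  infile="$1"
  # Accessions starting with WP_ (RefSeq non-redundant protein records).
  # "|| true" keeps the function from failing when grep finds no match.
  grep '^WP_' "$infile" > wp_accessions.txt || true
  # All remaining lines, i.e. the accessions with the missing cells.
  grep -v '^WP_' "$infile" > other_accessions.txt || true
}

# Example call with the list used above:
# split_by_prefix "$ListWithAccessionNumbers"
```

Running the same epost pipeline on other_accessions.txt alone should show whether the empty cells come from those records themselves or from mixing both kinds of accession in one query.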
So my question is: how can I use epost and esummary to obtain lineage information for accession numbers that do not start with WP_, while still getting the accession number and sequence printed? Does anyone have experience with this?
If you have suggestions on how to use esummary differently, or know of an alternative way of using NCBI's E-utilities to solve this problem, I would be grateful for your ideas and help. Thank you!