I'm evaluating NCBI's EDirect command-line tool for the first time (as an alternative to using E-utilities and parsing the XML). It looks pretty good so far, but I'm having trouble with cases where I try to extract individual fields from a docsum and one of the fields is empty. For example, when I run a very simple query of the bioproject database for all "Pseudomonas aeruginosa PAO1" records:
esearch -db bioproject -query "Pseudomonas aeruginosa PAO1" | efetch -format docsum | xtract -pattern DocumentSummary -element Organism_Strain Project_Acc
some of the returned records have no Organism_Strain value, so the Project_Acc value gets shifted into the first column.
I would like xtract to return an empty string, or even "-", whenever Organism_Strain is missing, so that every line has two columns instead of a mix of one-column and two-column lines.
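To make the desired output concrete, here is a minimal sketch that pads one-column lines after the fact with awk. The function name `pad_missing_strain` and the sample rows are made up for illustration (they are not real query results), and the sketch assumes that a single-field line always means the strain, not the accession, is missing. Ideally xtract would do this itself.

```shell
# Hypothetical helper: make every tab-separated row two columns wide
# by inserting "-" in front of rows that contain only one field.
pad_missing_strain() {
  awk -F'\t' -v OFS='\t' 'NF == 1 { print "-", $1; next } { print }'
}

# Made-up rows standing in for xtract output: the second row has no strain.
printf 'PAO1\tPRJ_EXAMPLE_1\nPRJ_EXAMPLE_2\n' | pad_missing_strain
```

In real use, the xtract pipeline would be piped into `pad_missing_strain` in place of the `printf` line.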
I've looked over the official NCBI documentation and tried to assign each field to an initialized variable, using different combinations of the following:
esearch -db bioproject -query "Pseudomonas aeruginosa PAO1" | efetch -format docsum | xtract -pattern DocumentSummary -element "&STRAIN" Organism_Strain -STRAIN "(-)" -element Bioproject_acc
but I can't seem to find the correct syntax to make things work the way I want. Does anyone have any experience with this kind of problem?