Question: Extracting strain name from several assemblies
9 months ago, genomes_and_MGEs wrote:

Hey guys,

Another question: some of the outputs don't include the strain name. I guess the reason is that the organism name doesn't carry that information. For example, see https://www.ncbi.nlm.nih.gov/assembly/GCF_003290365.1/. If I run

for f in GCF* ; do term=$(echo "$f" | cut -f1,2 -d'_') ; esearch -db assembly -q "$term" | esummary | xtract -pattern DocumentSummary -sep ' ' -element Organism,Strain,AssemblyAccession | sed 's/ /_/g' ; done > filenames.txt

the strain name doesn't appear in filenames.txt. Could you please let me know what I'm doing wrong?

Cheers

Tags: assembly, genome

If I run just the example you posted, the command works but does not print any strain information:

$ esearch -db assembly -q GCF_003290365.1 | esummary | xtract -pattern DocumentSummary -sep ' ' -element Organism,Strain,AssemblyAccession
Pseudomonas putida (g-proteobacteria) GCF_003290365.1

It looks like the strain name is stored in a different field (Sub_value), which you may need to request instead of Strain:

$ esearch -db assembly -q GCF_003290365.1 | esummary | xtract -pattern DocumentSummary -sep ' ' -element Organism,Sub_value,AssemblyAccession
Pseudomonas putida (g-proteobacteria) NX-1 GCF_003290365.1

Try this and let us know whether it works for the other assemblies on your list.
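To check the other assemblies in one pass, one option is to flag output lines that have fewer columns than expected. This is only a sketch: it assumes xtract separates the three elements with tabs and simply omits a column when the field is absent. The function name flag_missing is made up for illustration, and the two sample lines are modelled on the outputs shown above rather than on a real filenames.txt.

```shell
# Print the last column (the accession) of every line with fewer than 3 tab-separated columns.
# Assumption: columns are tab-separated and a missing strain shortens the line.
flag_missing() {
  awk -F'\t' 'NF < 3 { print $NF }'
}

# Sample input: first line has a strain, second line does not.
missing=$(
  {
    printf 'Pseudomonas_putida_(g-proteobacteria)\tNX-1\tGCF_003290365.1\n'
    printf 'Pseudomonas_putida_(g-proteobacteria)\tGCF_003290365.1\n'
  } | flag_missing
)
echo "$missing"
```

On a real run you would feed the file in directly, for example flag_missing < filenames.txt, and then look up the listed accessions by hand.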

Edit: Re-reading your post, it seems that you may not be getting any output at all, rather than only a missing strain name. In that case, check what term=$(echo $f | cut -f1,2 -d'_') actually produces. Add an echo $term inside the loop to inspect the variable (temporarily removing the esearch command, if needed).
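A minimal way to do that check without contacting NCBI is sketched below. The file names are hypothetical stand-ins for whatever GCF* matches in your directory, and extract_term is just an illustrative wrapper around the same cut command.

```shell
# Same extraction as in the loop: keep the first two '_'-separated fields.
extract_term() {
  printf '%s\n' "$1" | cut -f1,2 -d'_'
}

# Hypothetical file names; replace with: for f in GCF* ; do ... ; done
for f in GCF_003290365.1_example_genomic.fna GCF_000000000.1_other_genomic.fna ; do
  term=$(extract_term "$f")
  echo "$f -> $term"
done
```

Each term should be a bare accession such as GCF_003290365.1. If it is empty or still contains part of the file name, the file names do not follow the pattern that cut expects.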

written 9 months ago by genomax