I'm using the command-line E-utilities tools (Entrez Direct) to get nucleotide sequences from BioSample IDs. Right now I'm working it out for just one ID, and then I will apply it to all 50+ IDs I need to fetch. I would like to limit the efetch so that it retrieves only the complete genome.
Initial code showing the results that I want to filter:
esearch -db biosample -query "SAMN04014953" | elink -target nuccore | efilter -source refseq | efetch -format fasta | grep '>'
>NZ_CP021549.1 Klebsiella pneumoniae strain AR_0112, complete genome
>NZ_CP021548.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000005, complete sequence
>NZ_CP021547.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000003, complete sequence
>NZ_CP021546.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000002, complete sequence
>NZ_CP021545.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000001, complete sequence
>NZ_CP021544.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000000, complete sequence
I want only the first result, the complete genome. (To save time, I'd prefer to restrict what efetch retrieves rather than fetch all the results and filter afterwards.)
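For reference, the after-the-fact filter I'm trying to avoid would look roughly like the sketch below. It assumes the chromosome is the only record whose header line ends in "complete genome", which holds for this sample but may not in general. The function name is made up for illustration, and the short sequences are placeholders standing in for real efetch output.

```shell
# Keep only FASTA records whose header line ends in "complete genome".
# keep_complete_genome is a hypothetical helper name, not part of Entrez Direct.
keep_complete_genome() {
  # On each header line, decide whether to keep the record;
  # then print every line (header and sequence) while keep is true.
  awk '/^>/ { keep = ($0 ~ /complete genome$/) } keep'
}

# Placeholder two-record FASTA (sequences are dummies, not real data).
printf '%s\n' \
  '>NZ_CP021549.1 Klebsiella pneumoniae strain AR_0112, complete genome' \
  'ACGTACGT' \
  '>NZ_CP021548.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000005, complete sequence' \
  'TTGACCAA' |
  keep_complete_genome
```

In the pipeline above, this would sit after efetch -format fasta in place of grep '>', but it still downloads every plasmid sequence first, which is the cost I'd like to avoid.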
How I've tried to alter the efilter step in the above pipeline:
efilter -source refseq -field "complete genome"
This produces no output.
efilter -source refseq -query "genome"
This produces the same output as the original command, as if -query "genome" had not been added to efilter at all.
Any suggestions? I've looked through the Entrez Direct guide and did a Google search, but I still can't come up with a solution.