esearch discrepancies using bash commands vs direct esearch command line
1
0
Entering edit mode
4.2 years ago
frcamacho ▴ 210

Hi there, I am using esearch to query symbols within a file against NCBI. I am using the following command

cat test-file.txt | while read p; do echo $p; esearch -db gene -query '"$p"[Gene name] AND Homo sapiens[ORGN] AND alive[Prop]' | efetch -format tabular > results.txt; done;

In the test-file.txt I just have the id LAMB2P1.

The results I receive in the results.txt is:

> tax_id    Org_name    GeneID  CurrentID   Status  **Symbol**  Aliases description other_designations  map_location    chromosome  genomic_nucleotide_accession.version    start_position_on_the_genomic_accession end_position_on_the_genomic_accession   orientation exon_count  OMIM    
> 9606  Homo sapiens    4948    0   live    **OCA2**    BEY, BEY1, BEY2, BOCA, D15S12,
> EYCL, EYCL2, EYCL3, HCL3, P, PED, SHEP1   OCA2 melanosomal transmembrane
> protein   P protein|P-protein|eye color 2 (central brown)|eye color 3
> (brown)|hair color 3 (brown)|melanocyte-specific transporter
> protein|oculocutaneous albinism II (pink-eye dilution homolog,
> mouse)|pink-eyed dilution protein homolog|total brown iris
> pigmentation  15q12-q13.1 15  NC_000015.10    27719008    28099342    minus   30  611409  
> 9606  Homo sapiens    8706    0   live    **B3GALNT1**    B3GALT3, GLCT3, GLOB, Gb4Cer,
> P, P1, beta3Gal-T3, galT3 beta-1,3-N-acetylgalactosaminyltransferase 1
> (globoside blood
> group)    UDP-GalNAc:beta-1,3-N-acetylgalactosaminyltransferase 1|P
> antigen synthase|P blood group globoside|UDP-Gal:betaGlcNAc beta
> 1,3-galactosyltransferase, polypeptide 3 (Globoside blood
> group)|UDP-GalNAc:betaGlcNAc beta-1,3-galactosaminyltransferase,
> polypeptide 1 (Globoside blood
> group)|UDP-N-acetylgalactosamine:globotriaosylceramide
> beta-1,3-N-acetylgalactosaminyltransferase|b3Gal-T3|beta-1,3-GalNAc-T1|beta-1,3-GalTase
> 3|beta-1,3-galactosyltransferase
> 3|beta-3-Gx-T3|brainiac1|galactosylgalactosylglucosylceramide
> beta-D-acetyl-galactosaminyltransferase|globoside
> synthase|globotriaosylceramide
> 3-beta-N-acetylgalactosaminyltransferase  3q26.1  3   NC_000003.12    161083883   161105469   minus   10  603094

This is not correct because when I search the NCBI gene database I get a completely different record (https://www.ncbi.nlm.nih.gov/gene/?term=LAMB2P1) which is correct when I just run the esearch on the command line as follows:

esearch -db gene -query '"LAMB2P1"[Gene name] AND Homo sapiens[ORGN] AND alive[Prop]' | efetch -format tabular

Which gives me:

> tax_id    Org_name    GeneID  CurrentID   Status  **Symbol** Aliases  description other_designations  map_location    chromosome  genomic_nucleotide_accession.version    start_position_on_the_genomic_accession end_position_on_the_genomic_accession   orientation exon_count  OMIM
> 9606  Homo sapiens    22973   0   live    **LAMB2P1** LAMB2L  laminin subunit beta 2
> pseudogene 1  laminin, beta 2 pseudogene
> 1 3p21.31 3   NC_000003.12    49152859    49154401    minus   1

Any thoughts or ways that I can correct this would be much appreciated. I have to query 1k symbols and I would like to use bash commands to do so instead of querying one by one. Thanks!

esearch entrezdirect NCBI • 1.2k views
ADD COMMENT
1
Entering edit mode

This is also a heads up for those using esearch. The esearch command consumed the entire file (stdin), hence I got output only for the first line. I used IFS=$'\n'; for next in $(cat file); do ..; done to get it to work with a file as input.

ADD REPLY
1
Entering edit mode

Good catch - this post has a bit more detail on what's going on: Bash loop using Edirect (esearch|efetch)

TL;DR: The problem is with the while loop, not with esearch.

ADD REPLY
3
Entering edit mode
4.2 years ago

your using single quotes in your loop.

'"$p"[Gene name] AND Homo sapiens[ORGN] AND alive[Prop]'

variables like '$p' WON'T be replaced by their values.

see : https://stackoverflow.com/questions/6697753/difference-between-single-and-double-quotes-in-bash

ADD COMMENT
2
Entering edit mode

Beat me to it. I was just running this test on bash:

bash-5.0$ p=LAMB2P1
bash-5.0$ echo $p
LAMB2P1
bash-5.0$ echo "$p"
LAMB2P1
bash-5.0$ echo '"$p"'
"$p"
ADD REPLY
1
Entering edit mode

Thank you Pierre! Updated code and it works!

cat test-file.txt | while read p; do echo $p; esearch -db gene -query "$p[Gene name] AND Homo sapiens[ORGN] AND alive[Prop]" | efetch -format tabular > results.txt; done;
ADD REPLY

Login before adding your answer.

Traffic: 1933 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6