Problem retrieving multiple records from Entrez using a while read loop
2.5 years ago
hans ▴ 20

Hello, I have a list of 60 NCBI BioSample entries, one per line, in a file named biosmp. I need to retrieve the BioSample attributes. I ran the following command, but I get only the first record.

cat biosmp | while read lines; do esearch -db biosample -query "${lines}" | efetch -format xml | xtract -pattern BioSampleSet -element Attribute; done;
Entrez esearch • 984 views
2.5 years ago
GenoMax 141k

Example solution below.

$ more id
SAMN22787239
SAMN22787238
SAMN22787237
SAMN22787236
SAMN22787235
SAMN22787234

$ for i in `cat id`; do efetch -db biosample -id ${i} -format xml | xtract -pattern BioSampleSet -element Attribute ; done
not collected   not collected   submitter lab   not collected   Spleen  PvSRA treated Splenic Fibroblasts   biological replicate 3
not collected   not collected   submitter lab   not collected   Spleen  PvSRA treated Splenic Fibroblasts   biological replicate 2
not collected   not collected   submitter lab   not collected   Spleen  PvSRA treated Splenic Fibroblasts   biological replicate 1
not collected   not collected   ScienCell Research Laboratories not collected   Spleen  untreated Splenic Fibroblasts Catalog No.: 5530 biological replicate 3
not collected   not collected   ScienCell Research Laboratories not collected   Spleen  untreated Splenic Fibroblasts Catalog No.: 5530 biological replicate 2
not collected   not collected   ScienCell Research Laboratories not collected   Spleen  untreated Splenic Fibroblasts Catalog No.: 5530 biological replicate 1

epost equivalent:

$ epost -db biosample -format acc -input id | efetch -format xml | xtract -pattern BioSampleSet -element Attribute

Cleaner view (only one sample shown):

$ for i in `cat id`; do efetch -db biosample -id ${i}  ; done
1: Human sample from Homo sapiens
Identifiers: BioSample: SAMN22787239; Sample name: PvSRA-3; SRA: SRS10786264
Organism: Homo sapiens
Attributes:
    /isolate="not collected"
    /age="not collected"
    /biomaterial provider="submitter lab"
    /sex="not collected"
    /tissue="Spleen"
    /cell line="PvSRA treated Splenic Fibroblasts"
    /replicate="biological replicate 3"
Accession: SAMN22787239 ID: 22787239

Thank you. The efetch loop worked well. epost was 10 times faster, but it concatenated all records onto one line. The xargs solution retrieved only the first record.

2.5 years ago
Mensur Dlakic ★ 27k

Using xargs:

cat biosmp | xargs -i esearch -db biosample -query "{}" | efetch -format xml | xtract -pattern BioSampleSet -element Attribute
2.5 years ago
vkkodali_ncbi ★ 3.7k

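For your command to work, you need to add </dev/null to the esearch portion of it as described here. Without that redirect, esearch reads from the loop's standard input and consumes the remaining lines of biosmp, so the loop ends after the first record. The following should fetch data for all 60 entries. Note that I changed the xtract command to use -pattern BioSample, which prints one line per sample and addresses the issue of all output being on the same line.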

cat biosmp | while read lines; do 
    esearch -db biosample -query "${lines}"  </dev/null \
    | efetch -format xml \
    | xtract -pattern BioSample -element Attribute; 
done
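The stdin behavior can be reproduced without network access. The following is a minimal sketch, not EDirect itself: `greedy` is a hypothetical stand-in function that drains its standard input the way esearch appears to inside the loop, and the IDs `A`, `B`, `C` are placeholders.

```shell
# Hypothetical stand-in for a command that reads stdin (assumed to mimic esearch here).
greedy() {
    cat >/dev/null          # swallow whatever is left on stdin
    echo "processed $1"
}

# Without the redirect, greedy eats lines B and C, so read hits EOF after one pass.
without=$(printf 'A\nB\nC\n' | while read -r id; do greedy "$id"; done)

# With </dev/null, greedy gets an empty stdin and the loop sees every line.
with=$(printf 'A\nB\nC\n' | while read -r id; do greedy "$id" </dev/null; done)

echo "without redirect:"
echo "$without"
echo "with </dev/null:"
echo "$with"
```

The first loop reports only `A`, while the second reports all three IDs, which matches the symptom in the question and the fix above.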