Question: NCBI: download all genomes obtained from soil/marine/host-associated bacteria/organisms
dago (Germany) wrote, 20 months ago:

Let's say I would like to download from NCBI all genomes obtained from marine bacteria (or soil- or gut-associated ones). I figured that E-utilities could work for this.

To get the information about the environmental source, I need to check the BioSample record. So I would do something like:

esearch -db biosample -query "marine" | efetch -format tabular

1: Photobacterium sanguinicancer CAIM 1827T
Identifiers: BioSample: SAMN04252530; Sample name: CAIM1827T.1; SRA: SRS1159004
Organism: Photobacterium sanguinicancri
Attributes:
    /strain="CAIM 1827"
    /host="Maja brachydactyla"
    /isolation source="Hemolymph"
    /collection date="06-Dec-2005"
    /geographic location="Spain: Ria a Coruna"
    /sample type="Bacterium"
    /altitude="0 m"
    /biomaterial provider="Collection of Aquatic Important Microorganisms"
    /culture collection="not applicable"
    /environment biome="marine"
    /host tissue sampled="hemolymph"
    /identified by="Bruno Gomez-Gil"
    /latitude and longitude="43.21 N 8.2200 W"
    /specimen voucher="not applicable"
Description:
    Draft genome of Photobacterium sanguinicancer type strain CAIM 1827T
    Accession: SAMN04252530 ID: 4252530
.....

Now, I would like to either download the corresponding assemblies/SRA data or access them, and this is where I get confused. As far as I can tell, I could use efetch to retrieve sequences. However, there seems to be no direct link between querying BioSample and accessing the data via E-utilities.

Is there someone out there who could enlighten me?

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 20 months ago:

"there seems to be no direct link between querying BioSample and accessing the data via E-utilities. Is there someone out there who could enlighten me?"

You have to call ELink: https://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ELink_

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id=6350818"

<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">
<eLinkResult>
  <LinkSet>
    <DbFrom>biosample</DbFrom>
    <IdList>
      <Id>6350818</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>taxonomy</DbTo>
      <LinkName>biosample_taxonomy</LinkName>
      <Link>
        <Id>408172</Id>
      </Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>

Here the BioSample UID is 6350818 (https://www.ncbi.nlm.nih.gov/biosample/?term=6350818) and the linked taxon is 408172 (https://www.ncbi.nlm.nih.gov/taxonomy/?term=408172), "marine metagenome".
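
The same lookup can be wrapped in two small shell helpers. This is only a minimal sketch, not part of the original answer: the function names elink_url and linked_ids are made up for illustration, and the parsing assumes that `grep -o` is available (GNU and BSD grep both support it) and that linked UIDs appear as <Id> elements directly inside <Link>, as in the response above.

```shell
#!/bin/sh
# Hypothetical helpers around the ELink call shown above.

# Build an ELink URL for a single UID.
#   $1 = source database, $2 = target database, $3 = UID in the source database
elink_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=%s&db=%s&id=%s\n' \
        "$1" "$2" "$3"
}

# Read ELink XML on stdin and print the linked UIDs, one per line.
# Only <Id> elements that directly follow <Link> are printed, so the
# query UID listed under <IdList> is skipped.
linked_ids() {
    tr -d '\n' |
        grep -o '<Link>[^<]*<Id>[0-9]*</Id>' |
        grep -o '[0-9][0-9]*'
}
```

With network access, `wget -q -O - "$(elink_url biosample taxonomy 6350818)" | linked_ids` should print the taxonomy UID from the response above. Changing the second argument (for example to assembly) asks ELink for links into a different database.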


Hi Pierre, thanks very much for the answer. I looked into it, and this is how far I got. Starting from a BioSample, I can get the link to the assembly, for example:

esearch -db biosample -query "SAMN06971996" | elink -target assembly 
<ENTREZ_DIRECT>
  <Db>assembly</Db>
  <WebEnv>NCID_1_14264864_130.14.22.215_9001_1501173337_1595509801_0MetA0_S_MegaStore_F_1</WebEnv>
  <QueryKey>3</QueryKey>
  <Count>1</Count>
  <Step>2</Step>
</ENTREZ_DIRECT>

However, I still cannot figure out how to access the actual sequence, as efetch won't work.

(reply written 20 months ago by dago)

( ( wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=biosample&term=SAMN06971996" |
      xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - && echo ) |
    xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id={}" |
    xmllint --xpath '/eLinkResult/LinkSet/LinkSetDb/Link/Id[1]/text()' - && echo ) |
  xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id={}" |
  xmllint --xpath '/TaxaSet/Taxon/ScientificName/text()' -


Candidatus Pelagibacter sp. TMED142
(reply written 20 months ago by Pierre Lindenbaum)

I guess your answers are too inscrutable for me to understand! :)

(reply written 19 months ago by dago)

Charles Plessy (Japan) wrote, 19 months ago:

Once you have the name or identifiers of the species you want to download, have a look at the following discussion: C: Download All The Bacterial Genomes From Ncbi.
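
To get from BioSample records to actual sequence files, one possible route is to read the FTP directory of each linked assembly from its document summary and download the genomic FASTA from there. A hedged sketch with EDirect follows. The function names are hypothetical. It assumes that the assembly summary exposes an FtpPath_GenBank field and that the FASTA file is named after the last component of the FTP directory plus a _genomic.fna.gz suffix, which is how the NCBI genomes FTP area is commonly laid out.

```shell
#!/bin/sh
# Hypothetical helpers; assumptions are described in the text above.

# Print the GenBank FTP directory of every assembly linked to a BioSample
# accession ($1). Requires NCBI EDirect (esearch, elink, esummary, xtract)
# on PATH and network access.
assembly_ftp_paths() {
    esearch -db biosample -query "$1" |
        elink -target assembly |
        esummary |
        xtract -pattern DocumentSummary -element FtpPath_GenBank
}

# Turn an assembly FTP directory ($1) into the URL of its genomic FASTA,
# assuming the file is named <last path component>_genomic.fna.gz.
genomic_fasta_url() {
    _dir="${1%/}"    # drop a trailing slash, if any
    printf '%s/%s_genomic.fna.gz\n' "$_dir" "${_dir##*/}"
}

# Example (needs network access; skips assemblies without an FTP path):
#   assembly_ftp_paths SAMN06971996 | while read -r dir; do
#       [ -n "$dir" ] && wget "$(genomic_fasta_url "$dir")"
#   done
```

Replacing the single accession with a loop over the accessions returned by a BioSample search (for example for "marine") would extend this to a whole environment. For large result sets it is worth checking NCBI's request rate limits first.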
