Question: NCBI: download all genomes obtained from soil/marine/host-associated bacteria/organisms
dago (Germany), 2.2 years ago, wrote:

Let's say I would like to download from NCBI all genomes obtained from marine bacteria (or from soil- or gut-associated ones). I figured that the E-utilities could work for me.

Information about the environmental source is stored in the BioSample record, so I would do something like:

esearch -db biosample -query "marine" | efetch -format tabular

1: Photobacterium sanguinicancer CAIM 1827T
Identifiers: BioSample: SAMN04252530; Sample name: CAIM1827T.1; SRA: SRS1159004
Organism: Photobacterium sanguinicancri
Attributes:
    /strain="CAIM 1827"
    /host="Maja brachydactyla"
    /isolation source="Hemolymph"
    /collection date="06-Dec-2005"
    /geographic location="Spain: Ria a Coruna"
    /sample type="Bacterium"
    /altitude="0 m"
    /biomaterial provider="Collection of Aquatic Important Microorganisms"
    /culture collection="not applicable"
    /environment biome="marine"
    /host tissue sampled="hemolymph"
    /identified by="Bruno Gomez-Gil"
    /latitude and longitude="43.21 N 8.2200 W"
    /specimen voucher="not applicable"
Description:
    Draft genome of Photobacterium sanguinicancer type strain CAIM 1827T
    Accession: SAMN04252530 ID: 4252530
.....
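A free-text search for "marine" can hit any field of a BioSample record, so it may help to keep only the records whose attributes actually say so. The function below is a minimal sketch, assuming the text layout shown above (each record starts with a numbered title line, attributes look like /key="value", and the "Accession:" line follows them); the name biosamples_with_attr is hypothetical. Attribute names are free-form, so the same habitat may be recorded under "environment biome", "isolation source" or another key.

```shell
# Hypothetical helper: print the accessions of BioSample text records that
# contain the attribute line /KEY="VALUE" (exact value, quotes included).
# Assumes the layout shown above: "N: title" starts a record and the
# "Accession:" line comes after the attribute lines.
biosamples_with_attr() {
  awk -v key="$1" -v val="$2" '
    /^[0-9]+: / { hit = 0 }                         # a new record starts
    index($0, "/" key "=\"" val "\"") { hit = 1 }   # attribute matched
    $1 == "Accession:" && hit { print $2 }          # emit its accession
  '
}

# Usage (needs EDirect and network access), reusing the query above:
#   esearch -db biosample -query "marine" | efetch -format tabular |
#     biosamples_with_attr "environment biome" "marine"
```

Because the match includes the closing quote, "marine" does not match "marine sediment"; loosen the test if substring matches are wanted.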

Now I would like to download these assemblies/SRA runs, or at least access them, and this is where I get quite confused. As far as I can tell, I could use efetch to retrieve sequences. However, there seems to be no direct link between querying BioSamples and accessing the data via the E-utilities.

Is there someone out there who could enlighten me?

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.2 years ago, wrote:

there seems to be no direct link between querying BioSamples and accessing the data via the E-utilities.

Is there someone out there who could enlighten me?

You have to call ELink (https://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ELink_):

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id=6350818"

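<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">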
<eLinkResult>

  <LinkSet>
    <DbFrom>biosample</DbFrom>
    <IdList>
      <Id>6350818</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>taxonomy</DbTo>
      <LinkName>biosample_taxonomy</LinkName>

      <Link>
        <Id>408172</Id>
      </Link>

    </LinkSetDb>
  </LinkSet>
</eLinkResult>

Here BioSample 6350818 (https://www.ncbi.nlm.nih.gov/biosample/?term=6350818) is linked to taxon 408172 (https://www.ncbi.nlm.nih.gov/taxonomy/?term=408172), "marine metagenome".
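The same ELink request can be scripted with two small helpers: one builds the URL, the other prints the UIDs found inside <Link> elements of the reply (the <Id> under <IdList> is the query UID itself, not a link). This is a sketch that assumes the pretty-printed reply shown above, with at most one <Id> per line; both function names are hypothetical. Using assembly instead of taxonomy as the target database should return linked assembly UIDs instead.

```shell
# Hypothetical helper: build an ELink URL (source db, target db, one UID).
elink_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=%s&db=%s&id=%s\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: print the UIDs inside <Link> elements of an ELink
# reply read from stdin; UIDs under <IdList> (the query) are skipped.
linked_ids() {
  awk '
    /<Link>/ { inlink = 1 }
    inlink && match($0, /<Id>[0-9]+<\/Id>/) {
      print substr($0, RSTART + 4, RLENGTH - 9)   # strip <Id> and </Id>
    }
    /<\/Link>/ { inlink = 0 }
  '
}

# Usage (network access required):
#   wget -q -O - "$(elink_url biosample taxonomy 6350818)" | linked_ids
```

Parsing with awk avoids depending on how a given xmllint version separates multiple text() results, at the cost of assuming the line layout.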


Hi Pierre, thanks very much for the answer. I looked into it, and this is how far I got: starting from a BioSample, I can get the link to the assembly, for example:

esearch -db biosample -query "SAMN06971996" | elink -target assembly 
<ENTREZ_DIRECT>
  <Db>assembly</Db>
  <WebEnv>NCID_1_14264864_130.14.22.215_9001_1501173337_1595509801_0MetA0_S_MegaStore_F_1</WebEnv>
  <QueryKey>3</QueryKey>
  <Count>1</Count>
  <Step>2</Step>
</ENTREZ_DIRECT>

However, I still cannot figure out how to access the actual sequences, since efetch won't work.
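As far as I know, EFetch does not return sequence data for the assembly database, but the assembly document summary (ESummary) includes the FTP directory of each assembly in its FtpPath_GenBank field (FtpPath_RefSeq for the RefSeq copy). The sketch below assumes EDirect (esearch, elink, esummary, xtract) is installed and that the genomic FASTA inside that directory follows NCBI's usual "<directory name>_genomic.fna.gz" naming; the function names are hypothetical.

```shell
# Print the GenBank FTP directory of every assembly linked to a BioSample.
# Assumes EDirect is installed and on PATH (network access required).
assembly_ftp_paths() {
  esearch -db biosample -query "$1" |
    elink -target assembly |
    esummary |
    xtract -pattern DocumentSummary -element FtpPath_GenBank
}

# Map an assembly directory to its genomic FASTA URL. The file is assumed
# to be named "<last path component>_genomic.fna.gz"; any scheme such as
# ftp:// is rewritten to https://.
fasta_url() {
  local dir="${1%/}"                 # drop a trailing slash, if any
  dir="https://${dir#*://}"          # ftp://host/path -> https://host/path
  printf '%s/%s_genomic.fna.gz\n' "$dir" "${dir##*/}"
}

# Usage (network access required):
#   assembly_ftp_paths SAMN06971996 | while read -r dir; do
#     [ -n "$dir" ] && wget -q "$(fasta_url "$dir")"
#   done
```

Assemblies without a GenBank copy may yield an empty field, hence the empty-line check in the loop.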

(reply by dago, 2.2 years ago)
# BioSample accession -> BioSample UID (esearch)
( (wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=biosample&term=SAMN06971996" |
     xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - && echo) |
  # BioSample UID -> linked Taxonomy UID (elink)
  xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id={}" |
  xmllint --xpath '/eLinkResult/LinkSet/LinkSetDb/Link/Id[1]/text()' - && echo ) |
# Taxonomy UID -> scientific name (efetch)
xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id={}" |
  xmllint --xpath '/TaxaSet/Taxon/ScientificName/text()' -


Candidatus Pelagibacter sp. TMED142
(reply by Pierre Lindenbaum, 2.2 years ago)

I guess your answers are too inscrutable for me! :)

(reply by dago, 2.2 years ago)
Charles Plessy (Japan), 2.2 years ago, wrote:

Once you have the names or identifiers of the species you want to download, have a look at the following Biostars discussion: "Download All The Bacterial Genomes From Ncbi".
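For many samples at once, one ELink call per BioSample gets slow. An alternative is to join a list of BioSample accessions against NCBI's assembly summary table, for example genomes/genbank/bacteria/assembly_summary.txt on the NCBI FTP site. This sketch assumes the tab-separated layout of that file as I know it (column 3 biosample, column 20 ftp_path, header lines starting with "#"); check the header of your copy before trusting the column numbers. The helper name and the file marine_biosamples.txt are hypothetical.

```shell
# Hypothetical helper: print the ftp_path of every assembly whose BioSample
# accession is listed in file $1 (one accession per line).
# Assumes assembly_summary.txt on stdin: tab-separated, column 3 = biosample,
# column 20 = ftp_path, header/comment lines start with "#".
ftp_paths_for_biosamples() {
  awk -F '\t' -v list="$1" '
    BEGIN { while ((getline acc < list) > 0) wanted[acc] = 1 }
    /^#/ { next }                                  # skip header lines
    ($3 in wanted) && $20 != "na" { print $20 }    # matching assemblies
  '
}

# Usage (network access required; the summary file is large):
#   wget -q -O - https://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt |
#     ftp_paths_for_biosamples marine_biosamples.txt
```

The accession list could come from the BioSample search in the question, saved one accession per line; reading it in a BEGIN block keeps the join correct even when the list is empty.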
