Let's say I would like to download from NCBI all genomes obtained from marine bacteria (or soil- or gut-associated ones). I figured that E-utilities could work for me.
Now, to get the information about the environmental source, I should check the BioSample. So I would do something like:
esearch -db biosample -query "marine" | efetch -format tabular
1: Photobacterium sanguinicancer CAIM 1827T
Identifiers: BioSample: SAMN04252530; Sample name: CAIM1827T.1; SRA: SRS1159004
Organism: Photobacterium sanguinicancri
Attributes:
/strain="CAIM 1827"
/host="Maja brachydactyla"
/isolation source="Hemolymph"
/collection date="06-Dec-2005"
/geographic location="Spain: Ria a Coruna"
/sample type="Bacterium"
/altitude="0 m"
/biomaterial provider="Collection of Aquatic Important Microorganisms"
/culture collection="not applicable"
/environment biome="marine"
/host tissue sampled="hemolymph"
/identified by="Bruno Gomez-Gil"
/latitude and longitude="43.21 N 8.2200 W"
/specimen voucher="not applicable"
Description:
Draft genome of Photobacterium sanguinicancer type strain CAIM 1827T
Accession: SAMN04252530 ID: 4252530
.....
Now, I would like to either download these assemblies/SRA records or access them, and this is making me quite confused.
As far as I can tell, I could use efetch to retrieve sequences. However, there seems to be no direct link between querying BioSamples and accessing the data via E-utilities.
Is there someone out there who could enlighten me?
Hi Pierre, thanks very much for the answer. I looked into it, and this is how far I got. Starting from a BioSample, I can get the link to the assembly, for example:
However, I still cannot figure out how to access the actual sequence, as efetch won't work. I guess your answers are too inscrutable for me to understand! :)
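
For what it's worth, one possible route from a BioSample to the sequence file is to skip efetch and read the FTP path from the assembly summary instead. The following is only a sketch, assuming EDirect (esearch, elink, esummary, xtract) is installed and that the linked assembly record carries a FtpPath_GenBank field. The function names biosample_to_ftp and fasta_url are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes NCBI EDirect (esearch, elink, esummary, xtract) is on PATH.

# Print the GenBank FTP directory of every assembly linked to a BioSample accession.
# (hypothetical helper name, not part of EDirect)
biosample_to_ftp() {
  esearch -db biosample -query "$1" |
    elink -target assembly |
    esummary |
    xtract -pattern DocumentSummary -element FtpPath_GenBank
}

# Build the genomic FASTA URL from an assembly FTP directory.
# NCBI names the file <directory name>_genomic.fna.gz.
# (hypothetical helper name, pure string handling, no network needed)
fasta_url() {
  dir="${1%/}"
  printf '%s/%s_genomic.fna.gz\n' "$dir" "${dir##*/}"
}

# Usage (needs network access), e.g. for the BioSample shown above:
#   biosample_to_ftp SAMN04252530 | while read -r dir; do
#     curl -O "$(fasta_url "$dir")"
#   done
```

If a BioSample has no linked assembly, the first function should simply print nothing, so the loop downloads nothing.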