How to download Accession List from NCBI on CLI?
5 weeks ago
Cam • 0

Hi! I'm an undergraduate working on a Python script to streamline some processes for my advisor, and my program takes an Accession List as input (like the one available for download from the Helianthus page we are primarily interested in: https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=5&WebEnv=MCID_65e232280b2ad618bb791372&o=acc_s%3Aa). Are there any command-line tools I can use to download this information more quickly? Ideally, I would like to pass the clade of interest as an argument when I run the script. I've been looking into various tools, but I don't have enough experience to know how to apply them in this case. If anyone has any ideas, please let me know!

CLI NCBI python

I edited the URL above to remove proxy information that was specific to your institution.

5 weeks ago
GenoMax 141k

Take a look at two NCBI tools: EntrezDirect (LINK) and Datasets (LINK). If you are primarily interested in SRA data, then Datasets would not be useful.

With EntrezDirect you can do something like this (output truncated to save space):

$ esearch -db sra -query "helianthus [orgn]" | efetch -format runinfo | head -3
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR27373299,2023-12-28 14:38:25,2023-12-28 13:09:47,22238783,6671634900,22238783,300,2167,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-zq-22/SRR027/27373/SRR27373299/SRR27373299.lite.1,SRX23049802,NO200001,RNA-Seq,Oligo-dT,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq X,SRP480481,PRJNA1057422,,1057422,SRS20010336,SAMN39130651,simple,4233,Helianthus tuberosus,D0_1,,,,,,,no,,,,,QINGHAI UNIVERSITY,SRA1776123,,public,04588A7AE334D2D7C81EF99D0E3A10F9,3997F0D116464DEB4D2A8D5BC7C4491B
SRR27373298,2023-12-28 14:38:25,2023-12-28 13:12:01,23639891,7091967300,23639891,300,2287,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-zq-22/SRR027/27373/SRR27373298/SRR27373298.lite.1,SRX23049803,NO200002,RNA-Seq,Oligo-dT,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq X,SRP480481,PRJNA1057422,,1057422,SRS20010337,SAMN39130652,simple,4233,Helianthus tuberosus,D0_2,,,,,,,no,,,,,QINGHAI UNIVERSITY,SRA1776123,,public,0D53DAD007245801D2E3241ACCA2B05E,A35468AE913D728D517669D85FA6854B
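Since the goal is a Python script that takes the clade as an argument, here is a minimal sketch that wraps the same `esearch | efetch -format runinfo` pipeline with `subprocess` and keeps only the `Run` column, which should correspond to the run accessions in the Run Selector's Accession List. It assumes EntrezDirect's `esearch` and `efetch` are installed and on `PATH`; the function names, the `[orgn]` query format, and the default output file `accessions.txt` are hypothetical choices, not part of EntrezDirect.

```python
"""Hypothetical helper: write SRA run accessions for a clade using EntrezDirect."""
from __future__ import annotations

import argparse
import csv
import io
import subprocess


def build_query(clade: str) -> str:
    """Restrict the Entrez query to an organism/taxon name (assumed format)."""
    return f"{clade}[orgn]"


def fetch_runinfo(clade: str) -> str:
    """Run `esearch -db sra ... | efetch -format runinfo` and return the CSV text.

    Assumes esearch and efetch from EntrezDirect are on PATH.
    """
    search = subprocess.run(
        ["esearch", "-db", "sra", "-query", build_query(clade)],
        check=True, capture_output=True, text=True,
    )
    # Feed esearch's output to efetch on stdin, exactly like the shell pipe.
    fetch = subprocess.run(
        ["efetch", "-format", "runinfo"],
        input=search.stdout, check=True, capture_output=True, text=True,
    )
    return fetch.stdout


def parse_run_accessions(runinfo_csv: str) -> list[str]:
    """Return the Run column, skipping blank lines and repeated header rows."""
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    runs = []
    for row in reader:
        run = (row.get("Run") or "").strip()
        if run and run != "Run":  # a repeated header row has Run == "Run"
            runs.append(run)
    return runs


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Write SRA run accessions for a clade.")
    parser.add_argument("clade", help='organism or taxon name, e.g. "Helianthus"')
    parser.add_argument("-o", "--output", default="accessions.txt")
    args = parser.parse_args(argv)

    runs = parse_run_accessions(fetch_runinfo(args.clade))
    with open(args.output, "w") as fh:
        fh.writelines(f"{run}\n" for run in runs)
    print(f"Wrote {len(runs)} accessions to {args.output}")
```

To use it from the command line, add `if __name__ == "__main__": main()` at the bottom of the script and run, for example, `python get_acc.py Helianthus -o helianthus_acc.txt` (the script name is hypothetical). The header-row check is there in case large result sets come back as several runinfo batches, each starting with its own header line.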

Recently @LauferVA wrote a tool that may be useful for downloading the metadata in this case: "Selecting query format for repeated calls to NCBI's API"
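If installing EntrezDirect is not an option, a similar search can be sent directly to NCBI's E-utilities HTTP endpoints using only the standard library. This sketch runs ESearch with `usehistory=y` and then pages through EFetch results. The use of `rettype=runinfo` with `db=sra` on EFetch, the batch size of 500, and the function names are assumptions to check against the E-utilities documentation, not a tested recipe.

```python
"""Hypothetical sketch: pull SRA runinfo CSV through the E-utilities HTTP API."""
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(clade: str) -> str:
    """ESearch URL that stores the SRA hits for `clade` on the History server."""
    params = {"db": "sra", "term": f"{clade}[orgn]", "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(webenv: str, query_key: str, retstart: int, retmax: int) -> str:
    """EFetch URL for one batch of records (rettype=runinfo is an assumption)."""
    params = {
        "db": "sra",
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "runinfo",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_runinfo(clade: str, batch: int = 500) -> str:
    """Return the runinfo CSV batches for every SRA record matching `clade`."""
    with urllib.request.urlopen(esearch_url(clade)) as resp:
        result = json.load(resp)["esearchresult"]
    count = int(result["count"])
    chunks = []
    for start in range(0, count, batch):
        url = efetch_url(result["webenv"], result["querykey"], start, batch)
        with urllib.request.urlopen(url) as resp:
            chunks.append(resp.read().decode("utf-8"))
        time.sleep(0.34)  # stay under ~3 requests/second without an API key
    return "\n".join(chunks)
```

Each batch is expected to begin with its own header row, so repeated `Run,...` header lines should be dropped when the CSV is parsed. For heavier or repeated use, NCBI asks clients to register an API key and send it as the `api_key` parameter, ideally along with `tool` and `email`.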

Thank you so much! This is perfect. I really, genuinely appreciate your help. Thank you!!

Please accept my answer (green checkmark) to provide closure to this thread.
