SRA and Bioproject IDs
I have a set of BioProject IDs and need to retrieve their corresponding SRA IDs.
I tried retrieving all human records from SRA with rentrez:
library(rentrez)
# Search SRA for all human records; at most retmax IDs are returned per call
kywrds <- entrez_search(db = "sra", retmax = 20000,
                        term = "Homo sapiens[ORGN] AND Homo sapiens[orgn:__txid9606]")
However, the search for all of Homo sapiens matches more than 4 million records, so I would need to page through the results using the "retstart" and "web_history" arguments together with "retmax". Unfortunately, I couldn't get that to work.
The result I want is a data frame of SRA IDs with their corresponding BioProject IDs.
Could you help me do that?
You can search SRA directly using a BioProject ID, so there is no need to download every human record first. Shown below are EntrezDirect commands; you should be able to adapt the syntax to rentrez, the R package used in your question.
esearch -db sra -query 'PRJEB4337[bioproject]'
You can then pipe those results to esummary and extract the relevant fields from the output XML. For example,
esearch -db sra -query 'PRJEB4337[bioproject]' | esummary | xtract -pattern DocumentSummary -element Bioproject Biosample Run@acc
will give you a 3-column, tab-delimited table with BioProject, BioSample and SRA Run accessions.
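Since you have a group of BioProject IDs rather than one, here is a minimal sketch of how the same pipeline could be extended to several accessions at once. It assumes EDirect (esearch, esummary, xtract) is installed and on your PATH. The function names build_query and fetch_runs are made up for illustration and are not part of EDirect. It joins the accessions with OR into a single query, which should be fine for a modest list; for a very long list you may prefer to loop over the IDs one at a time.

```shell
#!/usr/bin/env bash

# build_query (hypothetical helper): join BioProject accessions into one
# Entrez query, e.g. "A[bioproject] OR B[bioproject]".
build_query() {
  local query="" id
  for id in "$@"; do
    if [ -n "$query" ]; then
      query="$query OR "
    fi
    query="${query}${id}[bioproject]"
  done
  printf '%s\n' "$query"
}

# fetch_runs (hypothetical helper): print a tab-delimited table of
# BioProject, BioSample and SRA Run accessions for the given BioProjects.
# Needs network access and EDirect on PATH.
fetch_runs() {
  esearch -db sra -query "$(build_query "$@")" \
    | esummary \
    | xtract -pattern DocumentSummary -element Bioproject Biosample Run@acc
}
```

After sourcing the script, something like `fetch_runs PRJEB4337 > sra_runs.tsv` (add your other accessions as further arguments) writes the table to a file that you can load into R with read.delim("sra_runs.tsv", header = FALSE) to get the data frame you asked for.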