Question: Get fastq/sra from ArrayExpress and/or GEO programmatically for specific organism/experiment type
rioualen (France) wrote, 2.8 years ago:

Hello,

I would like to get all the sequencing data for a specific organism and/or experiment type from ArrayExpress. I looked into REST queries here, and built the following request:

https://www.ebi.ac.uk/arrayexpress/xml/v2/experiments?query="Escherichia+coli+K-12"AND"ChIP-seq"
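A minimal sketch of how such a request could be assembled in Python with proper URL encoding. The endpoint is copied from the request above; the helper name `build_experiment_query` is hypothetical, and whether the service accepts the percent-encoded form of the quoted phrases is an assumption.

```python
from urllib.parse import urlencode

# Endpoint copied from the request in the post.
ARRAYEXPRESS_EXPERIMENTS = "https://www.ebi.ac.uk/arrayexpress/xml/v2/experiments"


def build_experiment_query(*phrases):
    """Return a query URL that ANDs together the given exact phrases.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    # Quote each phrase so that multi-word names are treated as a unit.
    query = " AND ".join(f'"{phrase}"' for phrase in phrases)
    # urlencode percent-encodes the quotes and turns spaces into "+".
    params = urlencode({"query": query})
    return f"{ARRAYEXPRESS_EXPERIMENTS}?{params}"


print(build_experiment_query("Escherichia coli K-12", "ChIP-seq"))
```

Building the query string with `urlencode` avoids hand-escaping quotes and spaces when the organism or experiment type changes.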

If I get the accession number from each experiment, I can get a table summarizing the samples:

http://www.ebi.ac.uk/arrayexpress/files/<accession>/<accession>.sdrf.txt
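As a rough sketch, the template above can be filled in per accession and the header row of the downloaded table inspected to see which columns a given experiment actually provides. The function names are hypothetical, and the only assumption about the file is that it is tab-delimited with one header row.

```python
import csv
import io

# URL template taken from the post; the accession appears twice.
SDRF_TEMPLATE = "http://www.ebi.ac.uk/arrayexpress/files/{acc}/{acc}.sdrf.txt"


def sdrf_url(accession):
    """Fill the SDRF URL template for one experiment accession."""
    return SDRF_TEMPLATE.format(acc=accession)


def sdrf_columns(sdrf_text):
    """Return the header names of a tab-delimited SDRF table.

    Returns an empty list if the text contains no rows.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    return next(reader, [])
```

The text passed to `sdrf_columns` would be the body of the downloaded file, fetched for instance with `urllib.request.urlopen(sdrf_url(accession))`.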

However, the column names in this table are not fixed from one experiment to the next. I need either the SRR and SRX identifiers or the ERR identifier in order to build the paths to the SRA or fastq files:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX189/SRX189776/SRR576936/SRR576936.sra
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR576/SRR576936/SRR576936.fastq.gz
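Since the column names cannot be relied on, one possible workaround is to ignore the headers and scan each row for strings that look like run (SRR/ERR/DRR) and experiment (SRX/ERX/DRX) accessions, then rebuild the two paths shown above. This is only a sketch: the function names are hypothetical, the `ftp://` scheme on the EBI path is assumed, and the path rules are inferred from the single example in this post (single-end reads, six-digit run number), so other layouts are not covered.

```python
import re

# Accession patterns: S/E/D prefix for NCBI, EBI and DDBJ submissions.
RUN_RE = re.compile(r"\b[SED]RR\d+\b")
EXPERIMENT_RE = re.compile(r"\b[SED]RX\d+\b")


def find_run_pairs(sdrf_text):
    """Return unique (experiment, run) pairs found row by row.

    The experiment is None when a row has a run but no experiment ID.
    """
    pairs = []
    for line in sdrf_text.splitlines():
        run = RUN_RE.search(line)
        if run is None:
            continue  # header row or a row without sequencing data
        experiment = EXPERIMENT_RE.search(line)
        pair = (experiment.group() if experiment else None, run.group())
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def ncbi_sra_url(experiment, run):
    """Rebuild the NCBI path, mirroring the example in the post."""
    return (
        "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/"
        f"{experiment[:3]}/{experiment[:6]}/{experiment}/{run}/{run}.sra"
    )


def ena_fastq_url(run):
    """Rebuild the EBI fastq path, mirroring the example in the post."""
    return f"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{run[:6]}/{run}/{run}.fastq.gz"
```

For the row containing SRX189776 and SRR576936, `find_run_pairs` yields that pair, and the two URL helpers reproduce the example paths.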

I would also like to do the same from GEO, but for that I need the GSE and GSM identifiers of the experiments, and I can't find them reliably either. This page seems useful, but it doesn't explain how to construct a query from scratch.
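For the GEO side, one option that might work is the NCBI E-utilities `esearch` endpoint queried against the `gds` (GEO DataSets) database. The sketch below only builds the search URL; the helper names are hypothetical, and the exact search term (organism field plus a free-text keyword, restricted to series entries) is an assumption that would need checking against real results. It also includes the mapping between ArrayExpress accessions of the form E-GEOD-<n>, used for experiments imported from GEO, and the corresponding GSE<n>.

```python
import re
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_series_search_url(organism, keyword, retmax=100):
    """Build an esearch URL for GEO series matching organism and keyword.

    The term syntax here is an assumption, not a verified recipe.
    """
    term = f'"{organism}"[Organism] AND "{keyword}" AND gse[Entry Type]'
    params = urlencode({"db": "gds", "term": term, "retmax": retmax})
    return f"{ESEARCH}?{params}"


def geo_series_from_arrayexpress(accession):
    """Map an E-GEOD-<n> accession to GSE<n>; return None otherwise."""
    match = re.fullmatch(r"E-GEOD-(\d+)", accession)
    return f"GSE{match.group(1)}" if match else None


print(geo_series_search_url("Escherichia coli", "ChIP-seq"))
```

The response to such a search is a list of internal record IDs rather than GSE accessions, so a second request (for example `esummary` on the same database) would still be needed to obtain the GSE and GSM identifiers.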

Overall, I'm completely lost among all the different types of identifiers and how they relate to one another...

Tags: sra, fastq, arrayexpress, geo