How to find sample information from SRR labels in GEO?
11.0 years ago
user ▴ 940

From GEO one can download SRR files (ending in .sra) of Illumina data that can be extracted as FASTQ with fastq-dump. How can the sample information for these SRR IDs be read programmatically from the GEO/SRA metadata? The project SRP005601 has a sample "SRR097786" that is not described anywhere in the SOFT/MINiML files, and those files are incredibly complicated. How can I find the information describing the sample label from GEO?

The only solution I have found is manual, through "Search SRA objects" in NCBI Trace. I type in each SRR* ID by hand and then click around until I find the sample information, one sample at a time. This is a terrible solution, so I was hoping to download the metadata and parse it from a CSV file.

geo illumina sra next-gen-sequencing • 8.3k views
11.0 years ago

SRR097786 is the run, not the sample (SRA is confusing). Assuming you're using R anyway:

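library(SRAdb)    # Bioconductor package that provides getSRAdbFile()
library(RSQLite)  # SQLite() driver and the DBI connection/query functions
sqlfile <- getSRAdbFile()  # downloads the SRA metadata SQLite file
sra_con <- dbConnect(SQLite(), sqlfile)
res <- dbGetQuery(sra_con, "select * from sra_ft where run_accession='SRR097786'")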

Almost anything you want to know is in the various columns, and you can tailor the query to return only the ones you need. You can also download the database file yourself and query it directly, if you prefer. If you only need to do this once, then just click around on SRA to find what you want. It's an unfortunately complicated site.

8.2 years ago
thomaskuilman ▴ 850

I agree Devon's solution is the handiest if you use R. However, it is missing some information (for instance, the number of spots for each sample) that you can get as follows:

wget -O SRP005601_metadata.csv ''
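
Once a metadata CSV like this is on disk, the per-run lookup asked about in the question can be sketched in shell. This is only a sketch under assumptions not stated in the thread: that the file has a header row with a column named Run, and that no field contains a quoted comma (plain awk splitting breaks on those). The function name lookup_run, the column names, and the placeholder rows below are all hypothetical and stand in for whatever the real file contains.

```shell
#!/bin/sh
# lookup_run FILE RUN_ID
# Print every column of the row whose "Run" column equals RUN_ID,
# one "column: value" pair per line. Prints nothing if there is no match.
# Assumption: simple CSV with a header row and no quoted commas.
lookup_run() {
    awk -F',' -v run="$2" '
        NR == 1 {
            # Remember the header names and locate the "Run" column.
            for (i = 1; i <= NF; i++) {
                name[i] = $i
                if ($i == "Run") runcol = i
            }
            next
        }
        runcol && $runcol == run {
            for (i = 1; i <= NF; i++) print name[i] ": " $i
        }
    ' "$1"
}

# Hypothetical stand-in for the downloaded file; the values are placeholders,
# not real metadata for any run.
example_csv=$(mktemp)
cat > "$example_csv" <<'EOF'
Run,spots,SampleName
RUN_A,100,sample_one
RUN_B,200,sample_two
EOF

lookup_run "$example_csv" RUN_B
```

With the real file you would call it as `lookup_run SRP005601_metadata.csv SRR097786`, provided the header really does contain a Run column. If the file has quoted fields, a proper CSV parser is a safer choice than awk.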
