How To Find Sample Information From SRR Labels In GEO?
8.4 years ago
user ▴ 870

From GEO one can download SRR* files (ending in .sra) of Illumina data, which can be extracted to FASTQ with fastq-dump. How can the sample information for these SRR* IDs be read programmatically from the GEO/SRA metadata? The project SRP005601 has a sample "SRR097786" that is not described anywhere in the SOFT/MINiML files, and those files are incredibly complicated. How can I find the information describing the sample label from GEO?

The only solution I have found is manual: in NCBI Trace, through http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=search_obj ("Search SRA objects"), I type in each SRR* ID and then click around until I find its sample information. This is a terrible solution, so I was hoping to download this metadata and parse it from a CSV file.

8.4 years ago

SRR097786 is the run, not the sample (SRA is confusing). Assuming you're using R anyway:

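library(SRAdb)
# getSRAdbFile() downloads and unpacks the SRAmetadb SQLite file (a large download) and returns its path
sqlfile <- getSRAdbFile()
sra_con <- dbConnect(SQLite(), sqlfile)
res <- dbGetQuery(sra_con, "select * from sra_ft where run_accession = 'SRR097786'")
dbDisconnect(sra_con)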


Almost anything you want to know is in the various columns, and you can select just the columns you need instead of "select *". You can also download the database file and query it directly, if you prefer. If you only need to do this once, you can just click around on SRA to find what you want; it's an unfortunately complicated site.
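To tailor the columns and get every run of a study in one go, rather than one SRR at a time, something along these lines should work. This is a minimal sketch, assuming SRAdb's denormalized sra table exposes study_accession, run_accession, sample_accession, experiment_title and sample_attribute (check with dbListFields(sra_con, "sra") first); the helper name runs_for_study is hypothetical.

```r
library(DBI)

# Hypothetical helper: one row per run in a study, with the sample it came from.
# Column names are assumed from the SRAdb schema; verify them with
# dbListFields(con, "sra") before relying on them.
runs_for_study <- function(con, study) {
  sql <- paste(
    "SELECT run_accession, sample_accession, experiment_title, sample_attribute",
    "FROM sra WHERE study_accession = ?",
    "ORDER BY run_accession"
  )
  # Bind the accession as a query parameter instead of pasting it into the SQL
  dbGetQuery(con, sql, params = list(study))
}
```

With the connection from above, runs_for_study(sra_con, "SRP005601") should return a data frame you can save with write.csv(..., row.names = FALSE), which gives the CSV the question asks for.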

5.6 years ago

I agree that Devon's solution is the handiest if you use R. However, it is missing some information (for instance, the number of spots for each run), which you can get as follows:

wget -O SRP005601_metadata.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRP005601'
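The runinfo file is plain CSV with one row per run, so mapping runs to samples takes only a few lines in R. This is a sketch, assuming the file has the Run, Sample, SampleName and spots columns that runinfo exports usually contain (check names() on your own file); the helper name run_sample_table is hypothetical. It also drops blank lines and repeated header rows, which concatenated runinfo output may contain.

```r
# Hypothetical helper: read an SRA runinfo CSV and keep the run-to-sample mapping.
# Column names (Run, Sample, SampleName, spots) are assumed; check names() on your file.
run_sample_table <- function(path) {
  ri <- read.csv(path, stringsAsFactors = FALSE, colClasses = "character")
  # Drop empty rows and any repeated header lines
  ri <- ri[!is.na(ri$Run) & nzchar(ri$Run) & ri$Run != "Run", ]
  out <- ri[, c("Run", "Sample", "SampleName", "spots")]
  # Spot counts were read as text; convert them to numbers
  out$spots <- as.numeric(out$spots)
  rownames(out) <- NULL
  out
}
```

For example, run_sample_table("SRP005601_metadata.csv") after the wget above. Reading every column as character first keeps a stray header row from silently turning numeric columns into text.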