Using the SRAdb R package, I am looking to query individuals who have both WGS and RNA-Seq data on the SRA. So far I believe I have been able to find specific samples with both WGS and RNA-Seq experiments; however, a single sample (identified by its sample_accession) is not exactly what I'm looking for.
For example, I'm interested in human data where one individual provided the WGS data and several tissues from that same individual provided the RNA-Seq data.
Is there any easy way to accomplish this using SQL and/or R?
EDIT
So far, this is my approach to identify samples that have both WGS and RNA-Seq data, using the local database SRAmetadb.sqlite:
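```r
library(DBI)
library(RSQLite)

# Connect to the local SRAmetadb file (downloaded with SRAdb::getSRAdbFile())
con <- dbConnect(SQLite(), "SRAmetadb.sqlite")

# One row per sample with at least one WGS and at least one RNA-Seq experiment.
# Non-grouped columns are aggregated with GROUP_CONCAT; selecting them bare
# would make SQLite report only one arbitrary experiment per sample.
query <- dbGetQuery(con, "
  SELECT
    sample.sample_accession,
    sample.scientific_name,
    GROUP_CONCAT(DISTINCT experiment.experiment_accession) AS experiment_accessions,
    GROUP_CONCAT(DISTINCT experiment.library_strategy)     AS library_strategies
  FROM sample
  JOIN experiment
    ON sample.sample_accession = experiment.sample_accession
  WHERE experiment.library_strategy IN ('WGS', 'RNA-Seq')
  GROUP BY sample.sample_accession
  HAVING COUNT(DISTINCT experiment.library_strategy) = 2
")
```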
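The `HAVING COUNT(DISTINCT experiment.library_strategy) = 2` filter can be mirrored in base R, which makes the grouping level an explicit parameter. This is a minimal sketch on made-up toy rows; the function name `has_both_strategies` is hypothetical and not part of SRAdb.

```r
# Toy (sample, strategy) pairs standing in for rows from the sample/experiment
# join; all accessions are made up for illustration.
runs <- data.frame(
  sample_accession = c("SRS1", "SRS1", "SRS2", "SRS3", "SRS3"),
  library_strategy = c("WGS", "RNA-Seq", "WGS", "RNA-Seq", "RNA-Seq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the values of `group_col` that have at least one
# experiment of every strategy in `strategies` (the SQL HAVING clause in R).
has_both_strategies <- function(df, group_col,
                                strategies = c("WGS", "RNA-Seq")) {
  df <- df[df$library_strategy %in% strategies, , drop = FALSE]
  n_distinct <- tapply(df$library_strategy, df[[group_col]],
                       function(s) length(unique(s)))
  names(n_distinct)[n_distinct == length(strategies)]
}

has_both_strategies(runs, "sample_accession")
```

Because the grouping column is a parameter, the same check can be rerun at the level of an individual once each row carries an individual identifier.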
With this approach, I can see that `query` contains 181 human samples that have at least one WGS experiment and at least one RNA-Seq experiment. That's great, but I suspect I'm missing data where the WGS and RNA-Seq experiments come from different samples of the same individual.
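Grouping by individual instead of by sample_accession needs a column that identifies the donor, and SRA does not guarantee one. Depending on the submitter, it may appear in a `sample` column such as `individual_name` or `anonymized_name` (an assumption; check with `DBI::dbListFields(con, "sample")`) or as a key inside `sample.sample_attribute`. The sketch below assumes attributes are stored as `key: value` pairs joined by ` || ` and uses a hypothetical `submitted_subject_id` key; the toy rows stand in for a query that also selects `sample.sample_attribute`.

```r
# Toy sample-level rows; accessions, donors, and the "submitted_subject_id"
# key are hypothetical, and the " || " separator is an assumption.
runs <- data.frame(
  sample_accession = c("SRS1", "SRS2", "SRS3", "SRS4", "SRS5"),
  library_strategy = c("WGS", "RNA-Seq", "RNA-Seq", "WGS", "RNA-Seq"),
  sample_attribute = c(
    "submitted_subject_id: donor-A || tissue: blood",
    "submitted_subject_id: donor-A || tissue: liver",
    "submitted_subject_id: donor-A || tissue: lung",
    "submitted_subject_id: donor-B || tissue: blood",
    "submitted_subject_id: donor-C || tissue: liver"
  ),
  stringsAsFactors = FALSE
)

# Extract the value of one "key: value" pair per attribute string (NA if absent).
get_attribute <- function(attr_strings, key) {
  vapply(strsplit(attr_strings, " || ", fixed = TRUE), function(pairs) {
    hit <- pairs[startsWith(pairs, paste0(key, ": "))]
    if (length(hit) == 0) NA_character_ else sub("^[^:]+: ", "", hit[1])
  }, character(1))
}

runs$individual <- get_attribute(runs$sample_attribute, "submitted_subject_id")

# Count distinct WGS and RNA-Seq samples per individual; split() drops rows
# whose individual is NA.
by_ind <- split(runs, runs$individual)
count_samples <- function(d, strategy)
  length(unique(d$sample_accession[d$library_strategy == strategy]))
summary_tab <- data.frame(
  individual       = names(by_ind),
  n_wgs_samples    = vapply(by_ind, count_samples, integer(1), strategy = "WGS"),
  n_rnaseq_samples = vapply(by_ind, count_samples, integer(1), strategy = "RNA-Seq"),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Individuals with WGS on one sample and RNA-Seq on other samples qualify here.
matched <- subset(summary_tab, n_wgs_samples >= 1 & n_rnaseq_samples >= 1)
matched
```

Requiring `n_rnaseq_samples >= 2` instead would approximate the "several tissues" case, assuming each tissue was submitted as its own sample.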