Query individuals with both WGS and RNA-Seq data on SRA
7.8 years ago

Using the SRAdb R package, I want to find individuals who have both WGS and RNA-Seq data in the SRA. So far I believe I can find specific samples that have both WGS and RNA-Seq experiments, but a single sample, identified by its sample_accession, is not exactly what I'm looking for.

For example, I'm looking for human data where WGS was generated from one person and RNA-Seq was generated from several tissues of that same person.

Is there any easy way to accomplish this using SQL and/or R?

EDIT

So far, this is my approach to identify samples that have both WGS and RNA-Seq data, using the local database, SRAmetadb.sqlite:

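library(DBI)
library(RSQLite)

# Open the local copy of the SRA metadata database.
con <- dbConnect(SQLite(), 'SRAmetadb.sqlite')

# Keep samples that have at least one WGS and at least one RNA-Seq experiment.
# Note: experiment_accession and library_strategy are not aggregated, so SQLite
# reports them from an arbitrary one of each sample's matching rows.
query <- dbGetQuery(con,
                    "SELECT
                       sample.sample_accession,
                       sample.scientific_name,
                       experiment.experiment_accession,
                       experiment.library_strategy
                     FROM sample
                     JOIN experiment ON
                       sample.sample_accession = experiment.sample_accession
                     WHERE experiment.library_strategy IN ('WGS', 'RNA-Seq')
                     GROUP BY sample.sample_accession
                     HAVING COUNT(DISTINCT experiment.library_strategy) = 2"
                    )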

With this approach, the result, query, contains 181 human samples that each have at least one WGS experiment and at least one RNA-Seq experiment. That's great, but I suspect I'm missing data that came from different samples of the same individual.
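
To make the sample-versus-individual distinction concrete, here is a rough sketch of what grouping by individual instead of by sample could look like. It rests on assumptions that would need checking against the real database (for example with dbListFields): that the sample table has an individual_name column that submitters actually filled in, and that the experiment table has a study_accession column, so that donor labels are only compared within one study. The toy in-memory tables and all of their rows (S1, donorA, P1, and so on) are made up for illustration and stand in for SRAmetadb.sqlite.

```r
library(DBI)
library(RSQLite)

# Toy in-memory stand-in for SRAmetadb.sqlite; every row here is hypothetical.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sample", data.frame(
  sample_accession = c("S1", "S2", "S3", "S4"),
  individual_name  = c("donorA", "donorA", "donorA", "donorB"),
  scientific_name  = "Homo sapiens",
  stringsAsFactors = FALSE
))
dbWriteTable(con, "experiment", data.frame(
  experiment_accession = c("X1", "X2", "X3", "X4"),
  sample_accession     = c("S1", "S2", "S3", "S4"),
  study_accession      = "P1",
  library_strategy     = c("WGS", "RNA-Seq", "RNA-Seq", "RNA-Seq"),
  stringsAsFactors = FALSE
))

# Group by (study, individual) rather than by sample, so that WGS and RNA-Seq
# experiments on different samples of the same donor end up in one group.
# Assumption: individual_name is populated; rows without it are skipped.
individuals <- dbGetQuery(con, "
  SELECT
    experiment.study_accession,
    sample.individual_name,
    COUNT(DISTINCT sample.sample_accession) AS n_samples,
    COUNT(DISTINCT CASE WHEN experiment.library_strategy = 'RNA-Seq'
                        THEN sample.sample_accession END) AS n_rnaseq_samples
  FROM sample
  JOIN experiment ON sample.sample_accession = experiment.sample_accession
  WHERE experiment.library_strategy IN ('WGS', 'RNA-Seq')
    AND sample.scientific_name = 'Homo sapiens'
    AND sample.individual_name IS NOT NULL
    AND sample.individual_name <> ''
  GROUP BY experiment.study_accession, sample.individual_name
  HAVING COUNT(DISTINCT experiment.library_strategy) = 2
")
dbDisconnect(con)
print(individuals)
```

On the toy rows only donorA comes back, because donorB has RNA-Seq but no WGS. Against the real database this would only find individuals whose submitters recorded a donor label in that column; where the label lives in free-text attributes instead, it would have to be parsed out first.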

SRA SRAdb • 2.1k views