Using GEOquery
17 months ago
Aryan ▴ 30

Hello,

I am trying to access data for all 404 samples in the GEO entry GSE120742 using GEOquery in R. My steps are:

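library(GEOquery)
# getGEO() returns a list of ExpressionSets, one per series matrix file
eData <- getGEO("GSE120742")
expr <- eData[[1]]
# exprs() is a function, so it is called with parentheses, not brackets
exprs(expr)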

Now, when I do this, I get "expr" as

$GSE120742_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 404 samples                          - Notice this
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM3409642 GSM3409643 ... GSM3410057 (404
    total)
  varLabels: title geo_accession ... tissue:ch1 (47 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 30464211 
Annotation: GPL16791 

And exprs(expr) as

GSM3409642 GSM3409643 GSM3409644 GSM3409645 GSM3409646
     #  ... (that is, just sample names and no numeric data)

I thought this might be because the experiment is a mix of ChIP-seq and RNA-seq data. Indeed, for other examples in the vignette, they are able to call exprs(...) to get a matrix with actual values (maybe their experiments are on a single platform?).

However, I am therefore not sure how I can get data for this experiment, for ALL 404 samples, without downloading the supplementary files. Any ideas?

ncbi geo data R • 1.5k views

Maybe you cannot do this in R. Note: Don't feed the trolls.


That's not an answer, nor helpful. Please put some effort.


I think this is enough effort from me. My effort has spurred your effort as this question has not been answered for three days, so that is all the effort that was required.


That isn't how the forum works.

If the thread hasn't obtained any attention in 3 days, that's just the nature of the beast. It is not a license to make flippant comments.


"There was no answer so I gave a wrong answer which made a responsible person step in and correct me, so I am right in what I did" is such a crappy stance to take.

17 months ago
ATpoint 81k

It is sequencing data; getGEO is for arrays. For the RNA-seq there seems to be a count table in the supplement at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120741, and for the ChIP-seq there are several files at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120742 that you can check. If they do not contain what you need, you will need to download the raw sequencing data and process it, e.g. via download links from sra-explorer.info (sra-explorer: find SRA and FastQ download URLs in a couple of clicks).

17 months ago

If all you want is the SRA run numbers, you can use Entrez Direct via its various interfaces. Just make sure to start from the right accession, which in this case is the project number PRJNA494338:

esearch -db sra -query PRJNA494338 | efetch -format runinfo > runinfo.csv

to list the SRR numbers you can do:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head

that prints:

SRR7949418
SRR7949419
SRR7949420
SRR7949421
SRR7949422
SRR7949423
SRR7949424
SRR7949425
SRR7949426
SRR7949427
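Since this project mixes RNA-seq and ChIP-seq runs, it may help to split the run list by assay before downloading anything. Below is a minimal sketch, assuming the run info file has a header row containing `Run` and `LibraryStrategy` columns and no quoted fields with embedded commas. The function name `runs_by_strategy` and the toy file contents are made up for illustration; they are not real records from PRJNA494338.

```shell
#!/bin/sh
# Toy stand-in for runinfo.csv. The accessions and strategies are invented.
toy=$(mktemp)
cat > "$toy" <<'EOF'
Run,spots,LibraryStrategy
SRR0000001,100,RNA-Seq
SRR0000002,200,ChIP-Seq
SRR0000003,300,RNA-Seq
EOF

# Print the Run accession of every row whose LibraryStrategy equals $2.
# Columns are located by header name, so their position does not matter.
# Rows that repeat the header (Run == "Run") are skipped.
runs_by_strategy() {
    awk -F, -v want="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $(col["Run"]) != "Run" && $(col["LibraryStrategy"]) == want {
            print $(col["Run"])
        }
    ' "$1"
}

runs_by_strategy "$toy" RNA-Seq
```

On the real file you would call it as `runs_by_strategy runinfo.csv RNA-Seq`, after checking with `head -1 runinfo.csv` that both column names are actually present.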

to download all files fire up GNU parallel:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head -1 | parallel fastq-dump --split-3 -F {} 

(I am adding a head -1 to keep people who just copy-paste things without understanding them from nuking themselves: it limits the download to the first run.)
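Before removing the `head -1`, one way to see what would actually be launched is to print the commands instead of running them. This is only a sketch: `plan_downloads` is a hypothetical helper name, and it uses plain `xargs` with `echo` so that nothing is downloaded. GNU parallel also has a `--dry-run` option that serves the same purpose.

```shell
#!/bin/sh
# Print, without executing, one fastq-dump command per accession read from stdin.
plan_downloads() {
    xargs -n 1 echo fastq-dump --split-3 -F
}

# Two accessions taken from the listing above, used here as example input.
printf '%s\n' SRR7949418 SRR7949419 | plan_downloads
```

With the real file the input would come from the same pipeline as before, e.g. `cat runinfo.csv | grep SRR | cut -f 1 -d , | plan_downloads`, which lets you count and inspect the commands before committing to the full download.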
