Question

Using GEOquery

17 months ago

Aryan ▴ 30

Hello,

I am trying to access data for all 404 samples in the GEO entry GSE120742 using GEOquery in R. My steps are:

library(GEOquery)
eData <- getGEO("GSE120742")
expr <- eData[[1]]
exprs(expr)

Now, when I do this, I get "expr" as

$GSE120742_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 404 samples                          - Notice this
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM3409642 GSM3409643 ... GSM3410057 (404
    total)
  varLabels: title geo_accession ... tissue:ch1 (47 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 30464211 
Annotation: GPL16791

And exprs(expr) as

GSM3409642 GSM3409643 GSM3409644 GSM3409645 GSM3409646
     #  ... (that is, just sample names and no numeric data)

I thought this might be because the experiment is a mix of ChIP-seq and RNA-seq data. Indeed, the other examples in the vignette can call exprs(...) and get a matrix with actual values (maybe because their experiments are on a single platform?).

I am therefore not sure how to get the data for this experiment, for ALL 404 samples, without downloading the supplementary files. Any ideas?


Maybe you cannot do this in R. Note: Don't feed the trolls.

ADD REPLY • link 17 months ago by M.O.L.S ▴ 100

That's not an answer, nor is it helpful. Please put in some effort.

ADD REPLY • link 17 months ago by ATpoint 81k

I think this is enough effort from me. My effort has spurred your effort as this question has not been answered for three days, so that is all the effort that was required.

ADD REPLY • link 17 months ago by M.O.L.S ▴ 100

That isn't how the forum works.

If the thread hasn't obtained any attention in 3 days, that's just the nature of the beast. It is not a license to make flippant comments.

ADD REPLY • link 17 months ago by Joe 21k

"There was no answer so I gave a wrong answer which made a responsible person step in and correct me, so I am right in what I did" is such a crappy stance to take.

ADD REPLY • link 17 months ago by Ram 43k

score 0 · Answer 1 · 2022-11-01

It is sequencing data; getGEO is meant for arrays, which is why the expression matrix comes back empty. For the RNA-seq there seems to be a count table in the supplement at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120741 and for the ChIP-seq there are several files at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120742 that you can check. If they do not contain what you need, you will have to download the raw sequencing data and process it, e.g. via download links from sra-explorer.info.
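
If you would rather pull the supplementary files from the command line than click through the GEO pages, here is a rough sketch of building the download location for a series. It assumes GEO's usual FTP layout, where a series sits under a folder named after its accession with the last three digits replaced by "nnn"; please check the printed URL in a browser before relying on it. The helper name geo_suppl_url is made up for this example. From within R, GEOquery's getGEOSuppFiles() should do a similar job.

```shell
#!/bin/sh
# geo_suppl_url ACCESSION
# Hypothetical helper: prints the supplementary-file directory for a GEO
# series, ASSUMING the usual layout .../geo/series/GSE120nnn/GSE120741/suppl/
geo_suppl_url() {
    acc=$1
    digits=${acc#GSE}                  # strip the "GSE" prefix
    if [ "${#digits}" -gt 3 ]; then
        stem="GSE${digits%???}nnn"     # drop the last three digits, append "nnn"
    else
        stem="GSEnnn"                  # accessions below GSE1000 share one folder
    fi
    printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/\n' "$stem" "$acc"
}

# Only prints the URLs; nothing is downloaded here.
geo_suppl_url GSE120741
geo_suppl_url GSE120742

# To actually fetch the files you could then run something like:
#   wget -r -nd -np "$(geo_suppl_url GSE120741)"
```

The download line is left commented out on purpose, so you can inspect the directory listing first and pick only the files you need.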

score 0 · Answer 2 · 2022-11-02

If all you want is the SRA run numbers, you can use Entrez Direct. Just make sure to start from the right accession, which in this case is the project number PRJNA494338:

esearch -db sra -query PRJNA494338 | efetch -format runinfo > runinfo.csv

To list the SRR numbers you can do:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head

That prints:

SRR7949418
SRR7949419
SRR7949420
SRR7949421
SRR7949422
SRR7949423
SRR7949424
SRR7949425
SRR7949426
SRR7949427
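
Since this project mixes RNA-seq and ChIP-seq runs, you may want only one kind. The grep/cut pipeline above relies on the run accession being the first column; a slightly more defensive sketch looks columns up by header name and filters on the library strategy. This assumes the runinfo file has columns named Run and LibraryStrategy and that no field contains a quoted comma. The toy file and its SRR0000001-style accessions are placeholders, not real runs from this project, and runs_by_strategy is a made-up helper name.

```shell
#!/bin/sh
# Toy stand-in for runinfo.csv with placeholder accessions (NOT real data).
cat > runinfo_toy.csv <<'EOF'
Run,spots,LibraryStrategy
SRR0000001,100,RNA-Seq
SRR0000002,200,ChIP-Seq
SRR0000003,300,RNA-Seq
EOF

# runs_by_strategy FILE STRATEGY
# Prints the Run column for rows whose LibraryStrategy equals STRATEGY.
# Column positions are taken from the header line, not hard-coded.
runs_by_strategy() {
    awk -F, -v want="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # map header name -> index
            next
        }
        # skip blank lines and repeated header lines, keep matching rows
        $(col["Run"]) != "" && $(col["Run"]) != "Run" &&
        $(col["LibraryStrategy"]) == want { print $(col["Run"]) }
    ' "$1"
}

runs_by_strategy runinfo_toy.csv RNA-Seq
# prints:
# SRR0000001
# SRR0000003
```

On the real file you would call it as runs_by_strategy runinfo.csv RNA-Seq, after checking the header to see how the strategies are actually spelled.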

To download all files, fire up GNU parallel:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head -1 | parallel fastq-dump --split-3 -F {}

(I am adding a head -1 to keep people who just copy-paste things without understanding them from nuking themselves.)
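
In the same spirit, one way to see what would be run before committing to hundreds of downloads is to print the commands instead of executing them. This is a minimal sketch using plain xargs and echo rather than GNU parallel; preview_downloads is a made-up helper name, and the two accessions are simply the first two from the listing above.

```shell
#!/bin/sh
# preview_downloads: reads run accessions on stdin and PRINTS one
# fastq-dump command per accession. Nothing is downloaded or executed.
preview_downloads() {
    xargs -n 1 echo fastq-dump --split-3 -F
}

printf 'SRR7949418\nSRR7949419\n' | preview_downloads
# prints:
# fastq-dump --split-3 -F SRR7949418
# fastq-dump --split-3 -F SRR7949419
```

Once the printed commands look right, you can swap the preview for the real parallel call and drop the head -1.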