Question

Using GEOquery

1

2.7 years ago

Aryan ▴ 40

Hello,

I am trying to access data for all 404 samples in the GEO entry GSE120742 using GEOquery in R. My steps are:

library(GEOquery) 
eData <- getGEO("GSE120742")
expr <- eData[[1]]
exprs(expr)

Now, when I do this, I get "expr" as

$GSE120742_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 404 samples                          - Notice this
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM3409642 GSM3409643 ... GSM3410057 (404
    total)
  varLabels: title geo_accession ... tissue:ch1 (47 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 30464211 
Annotation: GPL16791

And exprs(expr) as

GSM3409642 GSM3409643 GSM3409644 GSM3409645 GSM3409646
     #  ... (that is, just sample names and no numeric data)

I thought this might be because the experiment is a mix of ChIP-seq and RNA-seq data. Indeed, for other examples in the vignette, they are able to call exprs(...) and get a matrix with actual values (maybe their experiments are on a single platform?).

So I am not sure how I can get the data for this experiment, for ALL 404 samples, without downloading the supplementary files. Any ideas?

R geo ncbi • 2.8k views

ADD COMMENT • link updated 9 months ago by Ram 45k • written 2.7 years ago by Aryan ▴ 40

0

Maybe you cannot do this in R. Note: Don't feed the trolls.

ADD REPLY • link 2.7 years ago by M.O.L.S ▴ 100

1

That's not an answer, nor helpful. Please put some effort.

ADD REPLY • link 2.7 years ago by ATpoint 88k

0

I think this is enough effort from me. My effort has spurred your effort as this question has not been answered for three days, so that is all the effort that was required.

ADD REPLY • link 2.7 years ago by M.O.L.S ▴ 100

1

That isn't how the forum works.

If the thread hasn't obtained any attention in 3 days, that's just the nature of the beast. It is not a license to make flippant comments.

ADD REPLY • link 2.7 years ago by Joe 22k

1

"There was no answer so I gave a wrong answer which made a responsible person step in and correct me, so I am right in what I did" is such a crappy stance to take.

ADD REPLY • link 2.7 years ago by Ram 45k

0

I have also run into this problem. Nobody has answered this question yet.

ADD REPLY • link 9 months ago by Rohulla • 0

0

Have you even read my answer? I gave a full explanation on what the problem is and what the solutions can be. What is unclear?

ADD REPLY • link 9 months ago by ATpoint 88k

score 1 · Answer 1 · 2022-11-01

It is sequencing data; getGEO is for arrays. For the RNA-seq there seems to be a count table in the supplement at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120741, and for the ChIP-seq there are several files at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120742 that you can check. If these do not contain what you need, you will have to download the raw sequencing data and process it yourself, e.g. via download links from sra-explorer.info, which finds SRA and FastQ download URLs in a couple of clicks.

score 1 · Answer 2 · 2022-11-02

If all you want is the SRA numbers, then you can use Entrez Direct via its various interfaces. Just make sure to start from the right accession, which in this case is the project number PRJNA494338:

esearch -db sra -query PRJNA494338 | efetch -format runinfo > runinfo.csv

to list the SRR numbers you can do:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head

that prints:

SRR7949418
SRR7949419
SRR7949420
SRR7949421
SRR7949422
SRR7949423
SRR7949424
SRR7949425
SRR7949426
SRR7949427
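
The grep/cut pipeline above assumes the run accession is the first column and that every interesting line contains "SRR". As a hedged alternative, here is a minimal sketch that looks up the "Run" column by its header name instead. The file runinfo_sample.csv and its accessions are made up for illustration (a real runinfo.csv has many more columns), and the sketch assumes no field contains a quoted comma:

```shell
# Hypothetical miniature runinfo file, NOT real data from PRJNA494338.
cat > runinfo_sample.csv <<'EOF'
Run,ReleaseDate,LibraryStrategy
SRR0000001,2018-10-01,RNA-Seq
SRR0000002,2018-10-01,ChIP-Seq

Run,ReleaseDate,LibraryStrategy
SRR0000003,2018-10-01,RNA-Seq
EOF

# Find the "Run" column in the first line, then print that column for every
# data row, defensively skipping blank lines and repeated header lines.
runs=$(awk -F, '
  NR == 1 {
    for (i = 1; i <= NF; i++) if ($i == "Run") col = i
    next
  }
  NF > 0 && $col != "Run" && $col != "" { print $col }
' runinfo_sample.csv)

echo "$runs"
```

Pointing the same awk command at the real runinfo.csv should list every run accession, whatever prefix it carries.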

to download all files fire up GNU parallel:

cat runinfo.csv | grep SRR | cut -f 1 -d , | head -1 | parallel fastq-dump --split-3 -F {}

(I am adding head -1 to keep people who just copy-paste things without understanding them from nuking themselves.)
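
Since the project mixes RNA-seq and ChIP-seq, it may help to pick runs by assay before downloading anything. The sketch below filters on the LibraryStrategy column of the run table and only prints the fastq-dump commands as a dry run. The file runinfo_mixed.csv, its accessions, and the helper runs_for_strategy are hypothetical names used for illustration, and the plain comma splitting again assumes no quoted commas inside fields:

```shell
# Hypothetical miniature runinfo file, NOT real data from PRJNA494338.
cat > runinfo_mixed.csv <<'EOF'
Run,LibraryStrategy,LibraryLayout
SRR0000001,RNA-Seq,PAIRED
SRR0000002,ChIP-Seq,SINGLE
SRR0000003,RNA-Seq,PAIRED
EOF

# Print the Run accession of every row whose LibraryStrategy equals $1.
# $2 is the runinfo CSV file; both columns are located by header name.
runs_for_strategy() {
  awk -F, -v want="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Run") r = i
        if ($i == "LibraryStrategy") s = i
      }
      next
    }
    $s == want { print $r }
  ' "$2"
}

rna_runs=$(runs_for_strategy "RNA-Seq" runinfo_mixed.csv)

# Dry run: show the download commands instead of executing them.
for run in $rna_runs; do
  echo "fastq-dump --split-3 -F $run"
done
```

Once the printed commands look right, the list in rna_runs can be piped into parallel as in the one-liner above.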