Question: How to get information for a series of accession IDs?
Mo wrote (4.5 years ago):

Hello,

I have a series of GEO accession IDs and I am wondering whether there is a fast way to extract the information related to each of them, instead of checking them one by one on the GEO website. For example, I want to know whether each sample is RNA-seq or something else, which tissue it comes from, its characteristics, etc. The accessions are:

GSM1387801
GSM1387802
GSM1387803
GSM1387804
GSM1409334
GSM1409335

I also tried another approach: importing all samples for a platform ID into R, based on the previous question "Querying Ncbi Geo By Platform Id". However, I am afraid this does not lead to what I really want, and it does not seem to work properly. Take GPL17301 as the platform ID, which consists of 53 samples.

I did:

library(GEOquery)
gpl <- getGEO("GPL17301")
length(Meta(gpl)$series_id)
# It only showed 10?

RamRS wrote (4.5 years ago):

You can use the R package GEOquery to do this.

Ref: the answer to "Extract Expression Profiles Of Specific Region From Geo"

Protip: Always search existing questions before you start a question. Basic questions such as these are usually already addressed.


@Ram thanks for your message, but it seems you did not read my question. I am not looking to extract expression profiles. Please read my question carefully. If I can do this with GEOquery, can you please provide an example based on the given accession IDs? I read the manual but could not work out how to use it for this purpose.

(reply written 4.5 years ago by Mo, later modified)

In my experience, most APIs can be used to fetch records, and the data you need seems to be part of the record. APIs usually parse the record into formats that are accessible for programmatic analysis.

I have no experience with GEOquery; I was extrapolating from my experience and from common sense. I apologize if my answer was not specific to your query. I prefer showing people the path rather than taking them to their destination.

(reply written 4.5 years ago by RamRS)
A. Domingues wrote (4.5 years ago):

From your code I can see that gpl does not contain the information you are looking for:

> str(gpl)
Formal class 'GPL' [package "GEOquery"] with 2 slots
  ..@ dataTable:Formal class 'GEODataTable' [package "GEOquery"] with 2 slots
  .. .. ..@ columns:'data.frame':    0 obs. of  0 variables
  .. .. ..@ table  :'data.frame':    0 obs. of  0 variables
  ..@ header   :List of 14
  .. ..$ contact_country : chr "USA"
  .. ..$ contact_name    : chr ",,GEO"
  .. ..$ data_row_count  : chr "0"
  .. ..$ distribution    : chr "virtual"
  .. ..$ geo_accession   : chr "GPL17301"
  .. ..$ last_update_date: chr "Jun 17 2013"
  .. ..$ organism        : chr "Homo sapiens"
  .. ..$ sample_id       : chr [1:53] "GSM1166038" "GSM1166039" "GSM1166040" "GSM1166041" ...
  .. ..$ series_id       : chr [1:10] "GSE46876" "GSE48033" "GSE49477" "GSE50057" ...
  .. ..$ status          : chr "Public on Jun 17 2013"
  .. ..$ submission_date : chr "Jun 17 2013"
  .. ..$ taxid           : chr "9606"
  .. ..$ technology      : chr "high-throughput sequencing"
  .. ..$ title           : chr "Ion Torrent PGM (Homo sapiens)"

You can however retrieve all samples for that platform with:

library(GEOquery)

gpl <- getGEO("GPL17301")

head(gpl@header$sample_id)
[1] "GSM1166038" "GSM1166039" "GSM1166040" "GSM1166041" "GSM1166042"
[6] "GSM1166043"
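
The `str()` output above also hints at why the count in the question came out as 10: `series_id` lists the series (GSE) that use the platform, while `sample_id` lists the samples (GSM). A minimal sketch for comparing the two, where `count_ids` is a hypothetical helper name and not part of GEOquery:

```r
# Count how many accessions a GEO header stores under each "*_id" field.
# `header` is a named list such as the one returned by Meta(gpl).
count_ids <- function(header) {
  id_fields <- grep("_id$", names(header), value = TRUE)
  vapply(header[id_fields], length, integer(1))
}

# Usage (not run here; needs GEOquery and network access):
# count_ids(GEOquery::Meta(GEOquery::getGEO("GPL17301")))
```

Going by the `str()` output shown earlier, this should report 53 entries for `sample_id` and 10 for `series_id` on GPL17301.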

 

From there we can extract the information for each sample:

samples <- gpl@header$sample_id

gps <- getGEO(samples[1])

 

With another `str` I figured out where the information regarding the library prep (and anything else) is:

str(gps)

gps@header$library_strategy

[1] "RNA-Seq"

 

Two notes:

- It will take some digging because not all records follow the same rules. They should, but I have run into inconsistencies when looking for information in the past. Since these samples appear to have been submitted by the same entity, you might be lucky and not need to resort to `grep`s.

- `str()` is my best fRiend.
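
The single-sample lookup above can be repeated over a list of accessions. The following is a minimal sketch, not a tested pipeline: `header_field`, `header_to_row`, and `fetch_sample_info` are hypothetical helper names, and apart from `library_strategy` and `characteristics_ch1` (both shown in this thread) the field names are assumptions about what each GSM header contains. Fields a record lacks come back as `NA`.

```r
# Collapse one header field into a single string; NA when the record lacks it.
header_field <- function(header, field) {
  value <- header[[field]]
  if (is.null(value)) NA_character_ else paste(value, collapse = "; ")
}

# Build a one-row data frame from one GSM header (a named list).
header_to_row <- function(header, fields) {
  values <- lapply(setNames(fields, fields),
                   function(f) header_field(header, f))
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Fetch each accession with GEOquery and stack the rows.
# Needs the GEOquery package and network access; one request per accession.
fetch_sample_info <- function(accessions, fields) {
  rows <- lapply(accessions, function(acc) {
    gsm <- GEOquery::getGEO(acc)
    header_to_row(GEOquery::Meta(gsm), fields)
  })
  do.call(rbind, rows)
}

# Usage (not run here):
# accessions <- c("GSM1387801", "GSM1387802", "GSM1387803",
#                 "GSM1387804", "GSM1409334", "GSM1409335")
# fields <- c("geo_accession", "title", "library_strategy",
#             "source_name_ch1", "characteristics_ch1")
# sample_info <- fetch_sample_info(accessions, fields)
```

Keeping the header-to-row step separate from the download makes it easy to check the parsing on a hand-made list before fetching anything.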


@fridaymeetssunday Thanks for your example, but there is one problem. Going through the platform gives the RNA-Seq information for every sample on it, whereas I need it only for my own samples, not for all of them. I also want to know, for example, which tissue each sample comes from, plus some more information. My main question is how to extract all of this simply. If you have any clue, I would really appreciate your help.

(reply written 4.5 years ago by Mo)

Maybe I misunderstood what you want, but if you run `str()` as indicated in my example, you will see the structure of the information for that sample (and it should be the same for all samples). Then, as in my previous code, you will see that the tissue information can be obtained with `gps@header$characteristics_ch1`, or more specifically with `gps@header$characteristics_ch1[1]`, which tells you:

[1] "tissue: Pooled Tumor"

Other information you need can be traced with `str()`.

 

(reply written 4.5 years ago by A. Domingues)
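
Entries in `characteristics_ch1` have the form "key: value", as in the `tissue: Pooled Tumor` example above, so they can be split into a named vector and looked up by key. A minimal sketch, assuming every useful entry contains a colon; `parse_characteristics` is a hypothetical helper name:

```r
# Split "key: value" strings into a named character vector.
# Entries without a colon are dropped, since they have no usable key.
parse_characteristics <- function(x) {
  x <- x[grepl(":", x, fixed = TRUE)]
  keys <- trimws(sub(":.*$", "", x))        # text before the first colon
  values <- trimws(sub("^[^:]*:", "", x))   # text after the first colon
  setNames(values, keys)
}

# Usage, with the string shown above:
# parse_characteristics("tissue: Pooled Tumor")[["tissue"]]
```

Records that use a different key (for example "cell type" instead of "tissue") would need their own lookup, which is where the inconsistencies mentioned earlier tend to show up.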

@fridaymeetssunday thanks! This works for only one sample, though. Doing it sample by sample is the same as using the website. I am looking for an automatic way to extract all the information for all samples at once.

(reply written 4.5 years ago by Mo)

I saw your post on the Bioconductor forum and you are almost there. I suggest you replace the `lapply` with a `for` loop. Now it is time to put your R skills to use.

(reply written 4.5 years ago by A. Domingues)

@fridaymeetssunday Why should I use a loop? If the data is huge, a loop is the worst thing you can do, since it takes more time to run; people normally avoid loops :-p Sean Davis suggested using GEOmetadb, but I have no idea how it works, and it is a complicated package with no clear example (at least to me)! I feel like I should write something myself.

(reply written 4.5 years ago by Mo)
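
For the GEOmetadb route mentioned above, the general idea is to download the GEO metadata once as an SQLite file and query it locally instead of fetching each record. This is a rough sketch under stated assumptions: the table name `gsm` and the column names are recalled from the GEOmetadb schema and should be checked with `DBI::dbListFields(con, "gsm")`, the SQLite file is assumed to be still downloadable, and `build_gsm_query` and `query_gsm` are hypothetical helper names.

```r
# Build a parameterised query for n accessions (one "?" placeholder each).
# The default column names are assumptions about the GEOmetadb "gsm" table.
build_gsm_query <- function(n, columns = c("gsm", "title", "source_name_ch1",
                                           "characteristics_ch1")) {
  stopifnot(n >= 1)
  placeholders <- paste(rep("?", n), collapse = ", ")
  sprintf("SELECT %s FROM gsm WHERE gsm IN (%s)",
          paste(columns, collapse = ", "), placeholders)
}

# Run the query against a local GEOmetadb SQLite file.
# Needs the DBI and RSQLite packages.
query_gsm <- function(sqlite_path, accessions) {
  con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_path)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, build_gsm_query(length(accessions)),
                  params = as.list(accessions))
}

# Usage (not run here; the download is large):
# sqlite_path <- GEOmetadb::getSQLiteFile()
# sample_info <- query_gsm(sqlite_path, c("GSM1387801", "GSM1387802"))
```

Compared with one `getGEO()` call per accession, this trades a large one-time download for fast local lookups, which matters mostly when the accession list is long.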