Question: How to get information for a series of accession IDs?
5.4 years ago by
Mo920 wrote:


I have a series of accessions and I am wondering whether there is a fast way to extract the information related to each of them. For example, I want to extract the information related to the following:

I would like to do this instead of checking them one by one on GEO. For example, I want to know whether they are RNA-seq or something else, the tissue, the characteristics, etc.


I also tried another approach: first importing all samples for a platform ID into R, based on the previous question "Querying Ncbi Geo By Platform Id". However, I am afraid that this does not lead me to what I really want, and it does not seem to work properly. Let's take GPL17301 as my platform ID, which consists of 53 samples.

I did:


library(GEOquery)
gpl <- getGEO("GPL17301")


# It only showed 10 ????



python unix linux R • 1.5k views
modified 11 months ago by taserope0 • written 5.4 years ago by Mo920
5.4 years ago by
Houston, TX
RamRS28k wrote:

You can use the R package GEOquery to do this.

Ref: A: Extract Expression Profiles Of Specific Region From Geo

Protip: Always search existing questions before you start a question. Basic questions such as these are usually already addressed.


@Ram Thanks for your message, but it seems you did not read my question. I am not looking to extract expression profiles. Please read my question carefully. If I can do this with GEOquery, can you please provide an example based on the given accession IDs? I read the manual, but I could not work out how to use it for this purpose.

modified 5.4 years ago • written 5.4 years ago by Mo920

In my experience, most APIs can be used to fetch records, and the data you need seems to be part of the record. APIs usually parse the record into accessible formats for programmable analysis.

I have no experience with GEOquery, but I was extrapolating from my experience and from common sense. I apologize if my answer was not specific to your query. I prefer showing people the path to taking them to their destination.

written 5.4 years ago by RamRS28k
5.4 years ago by
Dresden, Germany
A. Domingues2.3k wrote:

From your code I can see that gpl does not contain the information you are looking for:

> str(gpl)
Formal class 'GPL' [package "GEOquery"] with 2 slots
  ..@ dataTable:Formal class 'GEODataTable' [package "GEOquery"] with 2 slots
  .. .. ..@ columns:'data.frame':    0 obs. of  0 variables
  .. .. ..@ table  :'data.frame':    0 obs. of  0 variables
  ..@ header   :List of 14
  .. ..$ contact_country : chr "USA"
  .. ..$ contact_name    : chr ",,GEO"
  .. ..$ data_row_count  : chr "0"
  .. ..$ distribution    : chr "virtual"
  .. ..$ geo_accession   : chr "GPL17301"
  .. ..$ last_update_date: chr "Jun 17 2013"
  .. ..$ organism        : chr "Homo sapiens"
  .. ..$ sample_id       : chr [1:53] "GSM1166038" "GSM1166039" "GSM1166040" "GSM1166041" ...
  .. ..$ series_id       : chr [1:10] "GSE46876" "GSE48033" "GSE49477" "GSE50057" ...
  .. ..$ status          : chr "Public on Jun 17 2013"
  .. ..$ submission_date : chr "Jun 17 2013"
  .. ..$ taxid           : chr "9606"
  .. ..$ technology      : chr "high-throughput sequencing"
  .. ..$ title           : chr "Ion Torrent PGM (Homo sapiens)"

You can, however, list the samples for that platform (first six shown) with:


head(gpl@header$sample_id)

[1] "GSM1166038" "GSM1166039" "GSM1166040" "GSM1166041" "GSM1166042"
[6] "GSM1166043"


From there we can extract the information for each sample:

samples <- gpl@header$sample_id

gps <- getGEO(samples[1])


With another `str` I figured out where the information regarding the library prep (and anything else) is:



[1] "RNA-Seq"


Two notes:

- It will take some digging because not all records follow the same rules. They should, but I did experience some inconsistencies trying to find information in the past. Since this appears to have been done for the same entity you might be lucky and won't need to resort to `greps`.

- `str()` is my best fRiend.

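For records that do not all follow the same layout, a small helper can search a sample header for a keyword instead of reading the `str()` output by eye. The following is only a sketch under stated assumptions: `find_fields` is a hypothetical helper, the header is a mock list, and the field name `library_strategy` is an assumption (the command that printed "RNA-Seq" is not shown in the thread). With a real record you would pass `gps@header` instead of the mock.

```r
# Return the header fields whose name or content matches `pattern`.
# `find_fields` is a hypothetical helper, not part of GEOquery.
find_fields <- function(header, pattern) {
  hit <- vapply(names(header), function(nm) {
    # Search both the field name and every value stored under it
    candidates <- c(nm, as.character(unlist(header[[nm]])))
    any(grepl(pattern, candidates, ignore.case = TRUE))
  }, logical(1))
  header[hit]
}

# Mock header: "library_strategy" is an assumed field name;
# the other values are taken from the output shown in the thread.
header <- list(
  geo_accession       = "GSM1166038",
  library_strategy    = "RNA-Seq",
  characteristics_ch1 = "tissue: Pooled Tumor"
)

names(find_fields(header, "rna-seq"))
names(find_fields(header, "tissue"))
```

This keeps the "digging" step explicit: the pattern is matched case-insensitively, so inconsistent capitalisation between records is less of a problem.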

@fridaymeetssunday Thanks for your example; however, there is one problem. The platform record tells me it is RNA-Seq, but I need this for each sample, not for the platform as a whole. I also want to know, for example, which tissue each sample comes from, plus some more information. My main question is how to extract all of this easily. If you have any clue, I would really appreciate your help.

written 5.4 years ago by Mo920

Maybe I misunderstood what you want, but if you run `str()` as indicated in my example, you will see the structure of the information for that sample (and it should be the same for all samples). Then, as in my previous code, you will see that the tissue information can be obtained with `gps@header$characteristics_ch1`. More specifically, `gps@header$characteristics_ch1[1]` tells you:

[1] "tissue: Pooled Tumor"

Other information you need can be traced with `str()`.


written 5.4 years ago by A. Domingues2.3k
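Since the characteristics come back as "key: value" strings, they can be split into named values so that a field such as the tissue can be looked up by name. This is a minimal sketch: `parse_characteristics` is a hypothetical helper, and only the first string below comes from the output above (the second entry is made up for illustration).

```r
# Split GEO-style "key: value" strings into a named character vector.
# `parse_characteristics` is a hypothetical helper, not part of GEOquery.
parse_characteristics <- function(x) {
  # Everything before the first colon is the key
  keys <- trimws(sub(":.*$", "", x))
  # Everything after the first colon is the value (later colons are kept)
  values <- trimws(sub("^[^:]*:", "", x))
  stats::setNames(values, keys)
}

# First entry is from the thread; the second is a made-up example
ch <- c("tissue: Pooled Tumor", "cell type: hypothetical")
parsed <- parse_characteristics(ch)

parsed[["tissue"]]
```

With a real sample, `ch` would be `gps@header$characteristics_ch1`. Strings without a colon are returned with the whole string as both key and value, so irregular records remain visible rather than being dropped.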

@fridaymeetssunday Thanks! This is only for one sample, though. If I do it this way, it is the same as using the website. I am thinking of an automatic way to extract all the information for all samples instead.

written 5.4 years ago by Mo920

I saw your post in the Bioconductor forum and you are almost there. I suggest you replace the `lapply` with a `for` loop. Now it is time to put your R skills to use.

written 5.4 years ago by A. Domingues2.3k
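The loop suggested above could look roughly like the sketch below, which builds one row per sample. Several things here are assumptions: `collect_sample_info` is a hypothetical helper, `library_strategy` is an assumed header field name, and the second mock record is made up. The `fetch_header` argument is injectable so the function can be tried offline; its default calls `GEOquery::getGEO()` and needs network access.

```r
# Build a data frame with one row per sample accession.
# `collect_sample_info` is a hypothetical helper, not part of GEOquery.
collect_sample_info <- function(ids,
                                fetch_header = function(id) GEOquery::getGEO(id)@header) {
  rows <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    h <- fetch_header(ids[i])
    rows[[i]] <- data.frame(
      sample = ids[i],
      # paste() collapses multi-valued fields and yields "" when a field is absent
      library_strategy = paste(h$library_strategy, collapse = "; "),
      characteristics  = paste(h$characteristics_ch1, collapse = "; "),
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, rows)
}

# Offline demonstration with mock headers (second record is made up)
mock <- list(
  GSM1166038 = list(library_strategy = "RNA-Seq",
                    characteristics_ch1 = "tissue: Pooled Tumor"),
  GSM1166039 = list(library_strategy = "RNA-Seq",
                    characteristics_ch1 = "tissue: hypothetical")
)
info <- collect_sample_info(names(mock), fetch_header = function(id) mock[[id]])
print(info)

# Real use (downloads every sample record):
#   info <- collect_sample_info(gpl@header$sample_id)
```

Whether this is written as a `for` loop or with `lapply` makes little practical difference here, because the time is dominated by downloading each record rather than by the iteration itself.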

@fridaymeetssunday Why should I use a loop? If the data is huge, a loop is the worst thing you can do, since it takes more time to run; normally people avoid loops :-p Sean Davis suggested using GEOmetadb, but I have no idea how it works, and it is a complicated package with no exact example (at least for me)! I feel like I should write something myself.

written 5.4 years ago by Mo920
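For the GEOmetadb route mentioned above, the idea is to query a local SQLite copy of the GEO metadata instead of downloading records one by one. The sketch below is hedged: `samples_for_platform` is a hypothetical helper, the table and column names (`gsm`, `gpl`, `title`, `source_name_ch1`, `characteristics_ch1`) are assumed to match the GEOmetadb `gsm` table, and the in-memory database with one partly made-up row only stands in for the real file so the query can be tried offline.

```r
library(DBI)

# Return basic metadata for every sample on one platform.
# Table and column names are assumptions about the GEOmetadb schema.
samples_for_platform <- function(con, gpl) {
  dbGetQuery(
    con,
    "SELECT gsm, title, source_name_ch1, characteristics_ch1
       FROM gsm
      WHERE gpl = ?",
    params = list(gpl)
  )
}

# Offline stand-in for the real database; title and source are made up
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gsm", data.frame(
  gsm                 = "GSM1166038",
  title               = "hypothetical title",
  gpl                 = "GPL17301",
  source_name_ch1     = "hypothetical source",
  characteristics_ch1 = "tissue: Pooled Tumor",
  stringsAsFactors    = FALSE
))
res <- samples_for_platform(con, "GPL17301")
dbDisconnect(con)
print(res)

# With the real database (a large download):
#   sqlfile <- GEOmetadb::getSQLiteFile()
#   con <- dbConnect(RSQLite::SQLite(), sqlfile)
#   res <- samples_for_platform(con, "GPL17301")
```

A single SQL query returns all samples for the platform at once, which avoids both the per-sample downloads and the loop.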