Question: Finding Gene Symbols for Probes in Raw Data
samet wrote (5 months ago):

Dear all,

I'm working with the GSE23561 dataset on the GPL10775 platform.

GEO provides raw data files and a normalized series matrix file for this dataset. By mapping the ID_REFs in the normalized series matrix to the IDs in the platform table, I can get the gene symbol (Symbol v12) for any probe.
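As a minimal sketch of the lookup described above, the snippet below joins series-matrix rows to platform rows by ID value rather than by row position. It assumes both tables are tab-delimited and that the columns are literally named `ID`, `Symbol v12`, and `ID_REF`. The inline tables, the `sample_1`/`sample_2` columns, and the `GENE_X`/`GENE_Y` symbols are toy placeholders, not real GEO content; only the pairing of ID 6 with HPRT1 comes from this thread.

```python
import csv
import io

# Toy stand-ins for the platform table and the normalized series matrix.
# Everything except "ID 6 -> HPRT1" is a hypothetical placeholder.
PLATFORM_TSV = "ID\tSymbol v12\n5\tGENE_X\n6\tHPRT1\n7\tGENE_Y\n"
MATRIX_TSV = "ID_REF\tsample_1\tsample_2\n6\t0.12\t-0.40\n7\t1.05\t0.98\n"


def load_symbols(platform_text):
    """Return {platform ID: gene symbol} from a tab-delimited platform table."""
    reader = csv.DictReader(io.StringIO(platform_text), delimiter="\t")
    return {row["ID"]: row["Symbol v12"] for row in reader}


def annotate(matrix_text, symbols):
    """Return the matrix rows, each with a 'symbol' field looked up by ID_REF."""
    reader = csv.DictReader(io.StringIO(matrix_text), delimiter="\t")
    rows = []
    for row in reader:
        # The join key is the ID_REF value, never the row position.
        row["symbol"] = symbols.get(row["ID_REF"])
        rows.append(row)
    return rows


symbols = load_symbols(PLATFORM_TSV)
annotated = annotate(MATRIX_TSV, symbols)
for row in annotated:
    print(row["ID_REF"], row["symbol"])
```

Because the lookup is keyed on the ID value, it keeps working even if the two tables list probes in different orders.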

However, I need to construct a non-normalized series matrix, and I don't know how to get ID_REF values (and therefore gene symbols) for the probes in the raw data. In the raw files, probes are not ordered by ID_REF as they are in the normalized series matrix.

A piece of raw data:

[Image: a piece of raw data]

Here, the ID_REF of the 6th row is not equal to 6. So when I use row numbers directly as ID_REFs, the gene symbol appears as MAR6, but it is actually HPRT1. I don't know what the IDs in the raw data stand for, or whether they can be used to get ID_REFs.

Any suggestion is appreciated. Thanks!

genomax wrote (5 months ago):

Annotation information for this platform is available in this file. It comes from the platform page at NCBI: scroll down and click the "View Full Table" link.


Yes, I'm already using it to get gene symbols for the probes in the normalized series matrix (by mapping ID_REFs to IDs in the platform table). But I need to get symbols for the probes in the non-normalized dataset, which does not have ID_REF values.

written 5 months ago by samet

The file above should be for the platform (Human 50K Exonic Evidence-Based Oligonucleotide array; technology type: spotted oligonucleotide) and should contain everything on the array. Does it not?

written 5 months ago by genomax

Yes, it does. The problem is that although the raw data files and the platform table have the same number of rows (50400 each), the order is not identical. For example, the 6th row contains the MAR6 gene in the raw data tables but HPRT1 in the platform table. This means that the gene symbol of the probe with ID_REF = 6 is HPRT1, since it corresponds to ID = 6 in the platform table. So I need to find the ID_REF value for each row in the raw data to be able to use the platform info. Right?

written 5 months ago by samet
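The order mismatch described in the comment above can be checked mechanically. This is a minimal sketch, assuming the raw file carries a per-row gene name that can be compared with the platform's symbol column. The function name `positional_mismatches` and the `GENE_A` to `GENE_E` entries are hypothetical placeholders; only the row 6 pair (MAR6 in the raw data, HPRT1 in the platform table) comes from this thread.

```python
def positional_mismatches(raw_names, platform_symbols):
    """Return 1-based row numbers where the two tables disagree at the same position."""
    return [
        row_number
        for row_number, (raw, plat) in enumerate(
            zip(raw_names, platform_symbols), start=1
        )
        if raw != plat
    ]


# Toy data: rows 1-5 are placeholders, row 6 reproduces the reported conflict.
raw_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "MAR6"]
platform_symbols = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "HPRT1"]

mismatches = positional_mismatches(raw_names, platform_symbols)
print(mismatches)
```

Any non-empty result means row position cannot stand in for ID_REF, so a real shared key between the raw files and the platform table is needed before the annotation can be applied.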