Question: multiple genes with the same ENTREZ ID
3 days ago, nicolegu6616 wrote:

Hi! I'm a beginner in bioinformatics, and I'm trying to replicate the results of a paper titled "TAZ Expression as a Prognostic Indicator in Colorectal Cancer".

Currently, I'm working with GSE14333 from the GEO database.

[Image: Figure 1 from the paper, a scatter plot]

To make Figure 1, I looked up the Entrez Gene ID of each gene ("AXL", "WWTR1", "YAP1" and "CTGF") and searched for it in data@featureData@data$ENTREZ_GENE_ID. Several rows of the expression matrix matched the same Entrez Gene ID. For example:

ID // GB_ACC // ... // Gene Symbol

213342_at // AI745185 // ... // YAP1

224894_at // BF247906 // ... // YAP1

224895_at // AA557632 // ... // YAP1

YAP1 matched 3 rows, WWTR1 3 rows, AXL 2 rows, and CTGF 1 row.

Each YAP1 row seems to be distinct, and each has a different expression level in the expression matrix. How, then, can I make the scatter plot above? Should I pick only one row when there are several, or can I just take the average expression level of all of them?

I hope the Target Descriptions below help identify each of the YAP1 rows:

[1] "gb:AI745185 /DB_XREF=gi:5113473 /DB_XREF=wg10a05.x1 /CLONE=IMAGE:2364656 /FEA=FLmRNA /CNT=46 /TID=Hs.8939.0 /TIER=Stack /STK=13 /UG=Hs.8939 /LL=10413 /UG_GENE=YAP65 /UG_TITLE=yes-associated protein 65 kDa /FL=gb:NM_006106.1"

[2] "gb:BF247906 /DB_XREF=gi:11163848 /DB_XREF=601858274F1 /CLONE=IMAGE:4068810 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=51 /UG=Hs.84520 /UG_TITLE=ESTs"

[3] "gb:AA557632 /DB_XREF=gi:2328109 /DB_XREF=nl11g07.s1 /CLONE=IMAGE:1030044 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=9 /UG=Hs.84520 /UG_TITLE=ESTs"

I'm stuck here. Please give me a hand.
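On the question of averaging: one common option is to collapse all probes that map to the same gene into a per-sample mean, which leaves one value per gene and sample that can be plotted gene against gene. This is a minimal base-R sketch, assuming the figure plots two genes' expression against each other across samples. Only the three YAP1 probe IDs come from the question; every expression value, the probe "hypo_1_at" and the gene "GENE_B" are hypothetical.

```r
# Hypothetical toy data: log2-scale expression, 4 probes x 5 samples.
# The three YAP1 probe IDs are from the question; every value, the probe
# "hypo_1_at" and the gene "GENE_B" are made up for illustration.
expr <- matrix(
  c(7.1, 7.4, 6.9,  8.0, 7.6,   # 213342_at
    9.2, 9.8, 9.0, 10.1, 9.5,   # 224894_at
    8.7, 9.1, 8.5,  9.6, 9.0,   # 224895_at
    5.0, 5.6, 4.8,  6.2, 5.5),  # hypo_1_at
  nrow = 4, byrow = TRUE,
  dimnames = list(c("213342_at", "224894_at", "224895_at", "hypo_1_at"),
                  paste0("S", 1:5)))
probe_gene <- c("YAP1", "YAP1", "YAP1", "GENE_B")  # gene for each row of expr

# Average every probe that maps to the same gene, sample by sample.
rows_by_gene <- split(seq_len(nrow(expr)), probe_gene)
gene_means <- t(sapply(rows_by_gene,
                       function(i) colMeans(expr[i, , drop = FALSE])))

# With one value per gene and sample, two genes can be compared across samples.
r <- cor(gene_means["YAP1", ], gene_means["GENE_B", ])
# plot(gene_means["YAP1", ], gene_means["GENE_B", ],
#      xlab = "YAP1", ylab = "GENE_B")
```

Whether averaging is appropriate depends on the probes: the two EST-derived YAP1 probes may not behave like 213342_at, so checking how well the probes correlate with each other before averaging can be worthwhile.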


The "_at" identifiers are probe IDs from a microarray experiment, not Entrez IDs. You typically summarize probe IDs into a single value per gene; please read up on microarray analysis. How did you process these data?

Written 3 days ago by ATpoint
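One way to summarize probe IDs into a single value per gene, as suggested above, is to keep one representative probe per gene, for example the probe with the highest mean expression across samples (WGCNA's collapseRows() offers this idea as method = "MaxMean"). This is a minimal base-R sketch with hypothetical toy values; only the YAP1 probe IDs come from the question, and "hypo_1_at" and "GENE_B" are made up.

```r
# Hypothetical toy data (made-up values): probes x samples.
expr <- matrix(
  c(7.1, 7.4, 6.9,  8.0, 7.6,   # 213342_at
    9.2, 9.8, 9.0, 10.1, 9.5,   # 224894_at
    8.7, 9.1, 8.5,  9.6, 9.0,   # 224895_at
    5.0, 5.6, 4.8,  6.2, 5.5),  # hypo_1_at (hypothetical probe)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("213342_at", "224894_at", "224895_at", "hypo_1_at"),
                  paste0("S", 1:5)))
probe_gene <- c("YAP1", "YAP1", "YAP1", "GENE_B")  # GENE_B is hypothetical

# For each gene, keep the probe with the highest mean expression across samples.
probe_means <- rowMeans(expr)
best_probe <- sapply(split(rownames(expr), probe_gene),
                     function(p) p[which.max(probe_means[p])])
collapsed <- expr[best_probe, , drop = FALSE]
rownames(collapsed) <- names(best_probe)  # relabel rows by gene symbol
```

Picking the most highly expressed probe is one heuristic among several (others keep the most variable probe), and none of them is guaranteed to match what the paper's authors did.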

First, I obtained a gene expression matrix (rows: "_at" probe IDs, columns: samples). To replicate the paper, I tried to find out which probe IDs correspond to AXL, WWTR1, YAP1 and CTGF. In the raw data, data@featureData@data contains a table describing each row, and I found the Entrez IDs in that table. I also looked up each gene on Wikipedia to get its Entrez ID, and then selected the rows with that Entrez ID. But each Entrez ID matched one or more probe IDs.

Written 3 days ago by nicolegu6616
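Selecting the rows for one Entrez ID, as described, can be sketched as follows. It assumes that an annotation field may list several IDs separated by " /// " (GEO platform annotations may do this for probes that map to more than one gene), so the sketch splits the field before matching instead of comparing the whole string with ==. The YAP1 probes and Entrez ID 10413 come from the question; the other rows, ID 99999, and the helper name probes_for_entrez() are hypothetical.

```r
# Hypothetical annotation table shaped like data@featureData@data: one row per
# probe. The three YAP1 probes and Entrez ID 10413 come from the question;
# "hypo_1_at", "hypo_2_at" and ID 99999 are made up for illustration.
fdata <- data.frame(
  ID = c("213342_at", "224894_at", "224895_at", "hypo_1_at", "hypo_2_at"),
  ENTREZ_GENE_ID = c("10413", "10413", "10413", "99999 /// 10413", ""),
  stringsAsFactors = FALSE)

# Hypothetical helper: return the probe IDs whose ENTREZ_GENE_ID field lists
# `entrez_id`. Splitting on " /// " also catches probes annotated to several
# genes, which an exact == comparison on the whole field would miss.
probes_for_entrez <- function(annot, entrez_id) {
  ids <- strsplit(as.character(annot$ENTREZ_GENE_ID), " /// ", fixed = TRUE)
  hit <- vapply(ids, function(x) entrez_id %in% x, logical(1))
  as.character(annot$ID[hit])
}

yap1_probes <- probes_for_entrez(fdata, "10413")
```

Whether to keep probes that are annotated to several genes (hypo_1_at here) is a judgment call; some analyses drop them.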

The first sentence is the key. How did you do that?

Written 3 days ago by ATpoint

With exprs(data). The raw data contain a normalized gene expression matrix, gene information (descriptions of the rows), and clinical data (descriptions of the columns).

Written 3 days ago by nicolegu6616
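Since exprs(data) works, data is presumably a Biobase ExpressionSet (for example, one element of the list returned by GEOquery's getGEO() for a series; how it was actually loaded here is an assumption). In that case fData(data) is the accessor for data@featureData@data, pData(data) returns the sample (clinical) table, and the rows of exprs(data) are in the same order as the rows of fData(data). This sketch builds a tiny hypothetical ExpressionSet to show this; the values are made up, and the block is skipped if Biobase is not installed.

```r
# Sketch only: builds a tiny, hypothetical ExpressionSet (values are made up).
# Requires the Bioconductor package Biobase; the block is skipped otherwise.
if (requireNamespace("Biobase", quietly = TRUE)) {
  library(Biobase)
  expr <- matrix(c(7.1, 9.2, 8.7, 7.4, 9.8, 9.1), nrow = 3,
                 dimnames = list(c("213342_at", "224894_at", "224895_at"),
                                 c("S1", "S2")))
  fdf <- data.frame(ENTREZ_GENE_ID = rep("10413", 3),
                    row.names = rownames(expr))
  eset <- ExpressionSet(assayData = expr,
                        featureData = AnnotatedDataFrame(fdf))

  m   <- exprs(eset)  # probe-by-sample expression matrix
  ann <- fData(eset)  # same table as eset@featureData@data
  smp <- pData(eset)  # sample (column) annotations; empty in this toy object
  # Matrix rows and feature-table rows are in the same order, so a logical
  # index built on ann can be applied directly to m.
  stopifnot(identical(rownames(m), rownames(ann)))
  yap1 <- m[ann$ENTREZ_GENE_ID == "10413", , drop = FALSE]
}
```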