Question

multiple genes with the same ENTREZ ID

3.5 years ago

nicolegu6616 • 0

Hi! I'm a beginner in bioinformatics and am trying to replicate the results of the paper "TAZ Expression as a Prognostic Indicator in Colorectal Cancer" (https://www.researchgate.net/publication/235393359_TAZ_Expression_as_a_Prognostic_Indicator_in_Colorectal_Cancer).

Currently, I'm working with the GSE14333 dataset from GEO.

[Figure 1 from the paper: scatter plot (image not shown)]

To make Figure 1, I looked up the genes "AXL", "WWTR1", "YAP1" and "CTGF" by their Entrez IDs in data@featureData@data$ENTREZ_GENE_ID. For some genes, several rows of the expression matrix match the same Entrez Gene ID. For example:

ID // GB_ACC // ... // Gene Symbol

213342_at // AI745185 // ... // YAP1

224894_at // BF247906 // ... // YAP1

224895_at // AA557632 // ... // YAP1

YAP1 matched with 3 rows, WWTR1 with 3 rows, AXL with 2 rows, and CTGF with 1 row.
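The lookup described above can be sketched in base R. This is a minimal illustration, not the exact code used: `annot` is a toy stand-in for `data@featureData@data`, holding only the three YAP1 rows quoted above plus one made-up row (`dummy_at`). The Entrez ID 10413 is taken from the `/LL=10413` field of the Target Description.

```r
# Toy stand-in for the feature annotation table (assumption: the real table
# has many more rows and columns). "dummy_at" is a made-up filler row.
annot <- data.frame(
  ID = c("213342_at", "224894_at", "224895_at", "dummy_at"),
  ENTREZ_GENE_ID = c("10413", "10413", "10413", "0"),
  "Gene Symbol" = c("YAP1", "YAP1", "YAP1", "DUMMY"),
  check.names = FALSE,        # keep the space in "Gene Symbol"
  stringsAsFactors = FALSE
)

# Entrez IDs of the genes of interest (only YAP1 shown here)
entrez_of_interest <- c(YAP1 = "10413")

# Rows whose Entrez ID matches exactly
hits <- annot[annot$ENTREZ_GENE_ID %in% entrez_of_interest, ]

# Group the matching row IDs by gene symbol and count them
probes_by_gene <- split(hits$ID, hits[["Gene Symbol"]])
probe_counts <- lengths(probes_by_gene)

print(probes_by_gene)
print(probe_counts)
```

One caveat: in some annotation tables a single row lists several IDs in one field, in which case an exact `%in%` match would miss it. Whether that applies to the genes here would need to be checked against the real table.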

Each row for YAP1 appears to be distinct, and each has a different expression level in the expression matrix. How can I make the scatter plot above in that case? Should I pick only one row when there are several? Or can I just take the average expression level across all of them?

I hope the Target Description field helps to tell the rows apart. For YAP1:

[1] "gb:AI745185 /DB_XREF=gi:5113473 /DB_XREF=wg10a05.x1 /CLONE=IMAGE:2364656 /FEA=FLmRNA /CNT=46 /TID=Hs.8939.0 /TIER=Stack /STK=13 /UG=Hs.8939 /LL=10413 /UG_GENE=YAP65 /UG_TITLE=yes-associated protein 65 kDa /FL=gb:NM_006106.1"

[2] "gb:BF247906 /DB_XREF=gi:11163848 /DB_XREF=601858274F1 /CLONE=IMAGE:4068810 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=51 /UG=Hs.84520 /UG_TITLE=ESTs"

[3] "gb:AA557632 /DB_XREF=gi:2328109 /DB_XREF=nl11g07.s1 /CLONE=IMAGE:1030044 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=9 /UG=Hs.84520 /UG_TITLE=ESTs"
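To compare these descriptions field by field, the `/KEY=value` pairs can be pulled into a small table. The sketch below uses only base R. `get_field` is a hypothetical helper written for this illustration, and the assumption that fields are separated by " /" is inferred from the three strings above.

```r
# The three Target Description strings quoted above, named by their row ID
target_desc <- c(
  "213342_at" = "gb:AI745185 /DB_XREF=gi:5113473 /DB_XREF=wg10a05.x1 /CLONE=IMAGE:2364656 /FEA=FLmRNA /CNT=46 /TID=Hs.8939.0 /TIER=Stack /STK=13 /UG=Hs.8939 /LL=10413 /UG_GENE=YAP65 /UG_TITLE=yes-associated protein 65 kDa /FL=gb:NM_006106.1",
  "224894_at" = "gb:BF247906 /DB_XREF=gi:11163848 /DB_XREF=601858274F1 /CLONE=IMAGE:4068810 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=51 /UG=Hs.84520 /UG_TITLE=ESTs",
  "224895_at" = "gb:AA557632 /DB_XREF=gi:2328109 /DB_XREF=nl11g07.s1 /CLONE=IMAGE:1030044 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=9 /UG=Hs.84520 /UG_TITLE=ESTs"
)

# Hypothetical helper: return the value of the first "/KEY=value" field,
# or NA if the key is absent. Assumes fields are separated by " /".
get_field <- function(desc, key) {
  fields <- strsplit(desc, " /", fixed = TRUE)[[1]][-1]  # drop leading "gb:..." token
  keys <- sub("=.*$", "", fields)
  vals <- sub("^[^=]*=", "", fields)
  hit <- match(key, keys)
  if (is.na(hit)) NA_character_ else vals[hit]
}

keys_of_interest <- c("FEA", "UG", "LL", "UG_GENE")

# One row per description, one column per field
summary_tab <- t(sapply(target_desc, function(d) {
  sapply(keys_of_interest, function(k) get_field(d, k))
}))

print(summary_tab)
```

In this table, 213342_at is annotated as a full-length mRNA (FEA=FLmRNA) from cluster Hs.8939, while the other two rows are annotated as ESTs from cluster Hs.84520 and carry no /LL or /UG_GENE field.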

I'm stuck here. Any help would be appreciated.

RNA-Seq • 686 views

IDs ending in "_at" are probe IDs from a microarray experiment, not Entrez IDs. You typically summarize the probes for a gene into a single value per gene; please read up on microarray analysis. How did you process these data?

Reply • 3.5 years ago by ATpoint • 82k
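As a rough illustration of summarizing several probes into one value per gene, here is a minimal base R sketch of two common conventions: averaging the probes of a gene, or keeping the probe with the highest mean expression. Neither is known to be the method used in the paper. The expression values are made up, and `probe2gene` and `dummy_at` are hypothetical stand-ins for the real annotation.

```r
# Made-up expression matrix: rows are probes, columns are samples
expr <- matrix(
  c(5, 6, 4,
    8, 9, 7,
    8, 9, 10,
    6, 6, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("213342_at", "224894_at", "224895_at", "dummy_at"),
                  c("S1", "S2", "S3"))
)

# Hypothetical probe-to-gene mapping (would come from the annotation table)
probe2gene <- c("213342_at" = "YAP1", "224894_at" = "YAP1",
                "224895_at" = "YAP1", "dummy_at" = "DUMMY")
genes <- probe2gene[rownames(expr)]

# Option A: average all probes of a gene, per sample
sums <- rowsum(expr, group = genes)            # per-gene column sums
counts <- table(genes)[rownames(sums)]         # number of probes per gene
gene_mean <- sums / as.vector(counts)

# Option B: keep, per gene, the probe with the highest mean expression
probe_avg <- rowMeans(expr)
best_probe <- vapply(
  split(names(probe_avg), genes),
  function(ids) ids[which.max(probe_avg[ids])],
  character(1)
)
gene_best <- expr[best_probe, , drop = FALSE]
rownames(gene_best) <- names(best_probe)

print(gene_mean)
print(best_probe)
print(gene_best)
```

The two options can give noticeably different gene-level values when the probes disagree, so whichever rule is chosen should be applied consistently to every gene and reported alongside the result.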

First, I obtained a gene expression matrix (rows: "_at" probe IDs, columns: samples). To replicate the paper, I tried to find out which probe IDs correspond to "AXL", "WWTR1", "YAP1" and "CTGF". In the raw data, data@featureData@data contains a table of descriptions for each row, and that table includes Entrez IDs. I also looked up the genes one by one on Wikipedia to get the Entrez ID for each gene name. I then selected the rows carrying each Entrez ID, but a single Entrez ID can match more than one probe ID.

Reply • 3.5 years ago by nicolegu6616 • 0

The first sentence is the key. How did you do that?

Reply • 3.5 years ago by ATpoint • 82k

With this code: "exprs(data)". The raw data contain a (normalized) gene expression matrix, gene information (descriptions of the rows), and clinical data (descriptions of the columns).

Reply • 3.5 years ago by nicolegu6616 • 0