multiple genes with the same ENTREZ ID
3.5 years ago

Hi! I'm a beginner in bioinformatics and I'm trying to replicate a result from the paper "TAZ Expression as a Prognostic Indicator in Colorectal Cancer" (https://www.researchgate.net/publication/235393359_TAZ_Expression_as_a_Prognostic_Indicator_in_Colorectal_Cancer).

Currently, I'm working with the GSE14333 dataset from GEO.

[Image: scatter plot from the paper]

To make Figure 1, I looked up the genes "Axl", "WWTR1", "YAP1" and "CTGF" by their Entrez IDs in data@featureData@data$ENTREZ_GENE_ID. For several of them, more than one row of the expression matrix matches the same Entrez Gene ID. For example:

ID // GB_ACC // ... // Gene Symbol

213342_at // AI745185 // ... // YAP1

224894_at // BF247906 // ... // YAP1

224895_at // AA557632 // ... // YAP1

YAP1 matched with 3 rows, WWTR1 with 3 rows, AXL with 2 rows, and CTGF with 1 row.

It seems that each row for YAP1 is somehow distinct, and each has a different expression level in the expression matrix. How, then, can I make the scatter plot above? Should I pick only one row when there are several? Or can I just take the average expression level across all of them?

I hope these Target Descriptions help to identify each of them in the case of YAP1.

[1] "gb:AI745185 /DB_XREF=gi:5113473 /DB_XREF=wg10a05.x1 /CLONE=IMAGE:2364656 /FEA=FLmRNA /CNT=46 /TID=Hs.8939.0 /TIER=Stack /STK=13 /UG=Hs.8939 /LL=10413 /UG_GENE=YAP65 /UG_TITLE=yes-associated protein 65 kDa /FL=gb:NM_006106.1"

[2] "gb:BF247906 /DB_XREF=gi:11163848 /DB_XREF=601858274F1 /CLONE=IMAGE:4068810 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=51 /UG=Hs.84520 /UG_TITLE=ESTs"

[3] "gb:AA557632 /DB_XREF=gi:2328109 /DB_XREF=nl11g07.s1 /CLONE=IMAGE:1030044 /FEA=EST /CNT=137 /TID=Hs.84520.0 /TIER=Stack /STK=9 /UG=Hs.84520 /UG_TITLE=ESTs"

I'm stuck here. Please give me a hand.

RNA-Seq • 686 views

IDs ending in "_at" are Probe IDs from a microarray experiment, not Entrez IDs. You typically summarize the Probe IDs that map to a gene into a single value per gene; please read up on microarray analysis. How did you process these data?
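To make "a single value per gene" concrete, here is a minimal sketch of two common ways to collapse probes: averaging them, or keeping the probe with the highest mean expression. It is plain Python rather than a Bioconductor call, the expression values are made-up toy numbers, and the function names `collapse_mean` and `collapse_max_mean` are hypothetical. Which strategy the paper used is not known from this thread.

```python
from collections import defaultdict
from statistics import mean

# Toy data (made-up numbers): probe ID -> expression in three samples.
expr = {
    "213342_at": [7.1, 7.4, 6.9],
    "224894_at": [9.8, 10.1, 9.5],
    "224895_at": [9.0, 9.3, 8.8],
}
# All three probes are annotated to YAP1 in the question.
probe_to_gene = {probe_id: "YAP1" for probe_id in expr}


def collapse_mean(expr, probe_to_gene):
    """Average all probes of a gene, sample by sample."""
    rows_by_gene = defaultdict(list)
    for probe_id, values in expr.items():
        rows_by_gene[probe_to_gene[probe_id]].append(values)
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


def collapse_max_mean(expr, probe_to_gene):
    """Keep, per gene, the single probe with the highest mean across samples."""
    best = {}
    for probe_id, values in expr.items():
        gene = probe_to_gene[probe_id]
        if gene not in best or mean(values) > mean(expr[best[gene]]):
            best[gene] = probe_id
    return {gene: (probe_id, expr[probe_id]) for gene, probe_id in best.items()}


if __name__ == "__main__":
    print(collapse_mean(expr, probe_to_gene))
    print(collapse_max_mean(expr, probe_to_gene))
```

Averaging treats every probe as equally informative, while the max-mean rule keeps one real probe's profile. Either way, it is worth stating in the methods which rule was applied.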


First, I obtained a gene expression matrix (rows: "_at" Probe IDs, columns: samples). To replicate the paper, I tried to find out which Probe IDs correspond to "Axl", "WWTR1", "YAP1" and "CTGF". In the raw data, data@featureData@data contains a table of descriptions for each row, and that table includes Entrez IDs. I also looked up the genes one by one on Wikipedia to map each gene name to its Entrez ID. I then selected the rows that carry each Entrez ID. But each gene, looked up by its Entrez ID, matched one or more Probe IDs.
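The row lookup described above can be sketched as follows. This is a plain-Python illustration, not the actual R indexing: the helper name `probes_for_entrez` is hypothetical, the last two annotation rows are made up, and the " /// " separator for probes annotated to several genes is an assumption about the annotation table. The Entrez ID 10413 for YAP1 is taken from the `/LL=10413` field in the Target Description.

```python
# Annotation rows as (probe ID, ENTREZ_GENE_ID field). The first three come
# from the question; the last two are made-up rows that show edge cases.
annotation = [
    ("213342_at", "10413"),
    ("224894_at", "10413"),
    ("224895_at", "10413"),
    ("0000001_at", "104130"),         # made up: contains "10413" as a substring
    ("0000002_at", "999 /// 10413"),  # made up: probe annotated to two genes
]


def probes_for_entrez(annotation, entrez_id, sep="///"):
    """Return probe IDs whose Entrez field lists exactly this ID.

    The field is split on `sep` (assumed separator for multi-gene probes) and
    compared as whole tokens, so 10413 does not match 104130.
    """
    hits = []
    for probe_id, field in annotation:
        ids = [part.strip() for part in field.split(sep)]
        if entrez_id in ids:
            hits.append(probe_id)
    return hits


if __name__ == "__main__":
    print(probes_for_entrez(annotation, "10413"))
```

Comparing whole tokens instead of substrings avoids picking up unrelated genes. Probes annotated to more than one gene may be worth excluding before summarizing.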


The first sentence is the key. How did you do that?


With this code: "exprs(data)". The raw data contain a (normalized) gene expression matrix, gene information (descriptions of the rows), and clinical data (descriptions of the columns).
