Question: (Closed) CMSclassifier for colorectal cancer dataset
12 months ago, bio94 wrote:

Could someone suggest how I should prepare my data for use with the CMSclassifier R package? I am currently subtyping colorectal cancer datasets using the CMS and CRIS classifiers (both R packages). I run into the following error every time I run this command:

CMSdata <- probesToEntrez(data, data_gpl, entrez = "Entrez.Gene")
Error in rowsum.data.frame(data, partition, na.rm = T) : non-numeric data frame in rowsum

Any help with this would be greatly appreciated. Many thanks.

written 12 months ago by bio94

Can you post what data and data_gpl contain?

head(data)
head(data_gpl)

Since it is a rowsum error, I assume you have non-numeric entries in data. How did you import the data?
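One quick way to test that assumption is to ask R which columns are numeric. The sketch below uses a small made-up data frame (`toy`, with hypothetical probe and sample names) shaped like a matrix imported with the IDs in its first column. It only illustrates the failure mode and is not the poster's actual data.

```r
# Toy data frame shaped like an expression matrix whose first column holds IDs.
# The probe and sample names are made up for illustration.
toy <- data.frame(
  X  = c("probe_a", "probe_b", "probe_c"),
  S1 = c(6.8, 10.0, 13.1),
  S2 = c(8.5, 11.5, 13.4)
)

# Which columns are numeric? Any FALSE here would make rowsum() fail.
numeric_cols <- sapply(toy, is.numeric)
print(numeric_cols)

# rowsum() on the whole data frame stops with an error because of column X.
group <- c("g1", "g1", "g2")
res_bad <- tryCatch(rowsum(toy, group), error = function(e) e)
print(inherits(res_bad, "error"))

# Keeping only the numeric columns lets rowsum() run.
res_ok <- rowsum(toy[, numeric_cols, drop = FALSE], group)
print(res_ok)
```

On the real data, `sapply(data, is.numeric)` or `str(data)` should show at a glance whether an ID column is sitting among the sample columns.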

Make sure data is a data frame of log2-scaled gene expression profile (GEP) values, with samples in columns, genes in rows, and row names corresponding to Entrez IDs.

The Entrez IDs should be the row names, not the first column.
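The layout described above can be checked before calling the classifier. The following is a minimal sketch: `check_layout` is a hypothetical helper (not part of CMSclassifier), the example values are made up, and the log2 test is only a rough heuristic that assumes log2-scale intensities stay below about 25.

```r
# Hypothetical helper: returns a named TRUE/FALSE vector, one entry per check.
check_layout <- function(x) {
  num <- is.data.frame(x) && all(vapply(x, is.numeric, logical(1)))
  # Default row names are just "1", "2", ...; treat those as "no IDs set".
  explicit <- !identical(rownames(x), as.character(seq_len(nrow(x))))
  c(
    is_data_frame   = is.data.frame(x),
    all_numeric     = num,
    # Entrez Gene IDs are plain integers, so row names should be digits only.
    entrez_rownames = explicit && all(grepl("^[0-9]+$", rownames(x))),
    # Heuristic (assumption): log2-scale values rarely exceed ~25.
    looks_log2      = num && max(as.matrix(x), na.rm = TRUE) < 25
  )
}

# Made-up example in the expected layout: Entrez IDs as row names.
good <- data.frame(S1 = c(7.9, 11.2), S2 = c(8.4, 10.8),
                   row.names = c("10594", "826"))
# Made-up example with the IDs left in the first column.
bad <- data.frame(X = c("200000_s_at", "200001_at"), S1 = c(6.8, 10.0))

print(check_layout(good))
print(check_layout(bad))
```

A matrix that still has probe IDs as row names will fail the `entrez_rownames` check, which is expected until the probes have been mapped to Entrez IDs.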

written 12 months ago by pbpanigrahi

Hi! Thanks so much for the quick response. I used the following code to import the data:

data <- read.table("GSE103479_Normalised_matrix.txt", header = TRUE, sep = "\t")

data_gpl <- read.table("GPL23985-23107.txt", header=TRUE, sep="\t")

I get the following on using head(data) and head(data_gpl):

> head(data)
            X S0849.00001.CEL S0849.00002.CEL S0849.00003.CEL S0849.00004.CEL S0849.00005.CEL
1 200000_s_at        6.796621        8.542676        8.256315        8.855615        7.914242
2   200001_at       10.048543       11.527307       11.370632       11.074517       10.510881
3   200002_at       13.079470       13.435510       12.819662       12.959815       13.231900
4 200003_s_at       12.324434       13.307174       12.542603       13.333878       12.697148
5   200004_at       10.256021       10.770011        9.872877       11.326367       11.220951
6   200005_at       10.012955       10.171843       10.173556       10.466301       10.316237
  S0849.00009a.CEL S0849.00010.CEL S0849.00011.CEL S0849.00012.CEL S0849.00013.CEL S0849.00014a.CEL
1         8.384417        7.728501         8.50463        8.792167        7.744168         8.481143
2        10.818284       11.657260        10.98738       11.343113        9.962316        11.729346
3        12.765878       12.069908        12.80063       12.915003       12.974710        12.991459
4        12.643501       11.838882        12.62687       13.073763       12.533959        13.004222
5        11.084293       10.628889        10.07673       10.476392       10.870931        10.880598
6        10.976519        9.445513        10.60698       10.468189       10.339754         9.439836
  S0849.00015.CEL S0849.00016a.CEL S0849.00017.CEL S0849.00018.CEL S0849.00019.CEL S0849.00020.CEL
1        7.909230         7.375004        8.390722        8.020519        7.873334        8.269778
2       11.203797        10.262192       10.752031       10.778888       10.700052       10.962991
3       13.146691        12.910918       12.851626       12.847498       12.721756       13.013380
4       12.618398        12.265095       12.660638       12.358775       12.260782       13.071409
5       10.778762        10.669658       11.021922       10.917207       11.219187       11.011195
6        9.303267        10.177099       10.430164       10.909031       10.441081       10.725297

...........


> head(data_gpl)
           ID         SPOT_ID              UniGene.ID
1 200000_s_at     NM_006445.3               Hs.181368
2   200001_at ENST00000457326               Hs.515371
3   200002_at ENST00000493018               Hs.182825
4 200003_s_at  NM_001136136.1               Hs.652114
5   200004_at  NM_001042559.2 Hs.183684 /// Hs.736508
6   200005_at ENST00000462641                Hs.55682
                                             Gene.Title Gene.Symbol
1                          pre-mRNA processing factor 8       PRPF8
2                              calpain, small subunit 1      CAPNS1
3                                 ribosomal protein L35       RPL35
4                                 ribosomal protein L28       RPL28
5   eukaryotic translation initiation factor 4 gamma, 2      EIF4G2
6 eukaryotic translation initiation factor 3, subunit D       EIF3D
                                                                            Ensembl Entrez.Gene
1 ENSG00000174231 /// ENSG00000274442 /// OTTHUMG00000090553 /// OTTHUMG00000191163       10594
2                                            ENSG00000126247 /// OTTHUMG00000160811         826
3                                            ENSG00000136942 /// OTTHUMG00000020659       11224
4                                            ENSG00000108107 /// OTTHUMG00000171991        6158
5                                            ENSG00000110321 /// OTTHUMG00000165823        1982
6                                            ENSG00000100353 /// OTTHUMG00000150599        8664
modified 12 months ago • written 12 months ago by bio94

The first column of data contains the probe IDs, which is why you were getting the error. To resolve this, try the following:

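# Read the expression matrix; the first column holds the probe IDs
data <- read.table("GSE103479_Normalised_matrix.txt", header = TRUE, sep = "\t")
rownames(data) <- data[, 1]  # move the probe IDs into the row names
data <- data[, -1]           # drop the ID column so only numeric columns remain
# Read the platform annotation and index it by probe ID as well
data_gpl <- read.table("GPL23985-23107.txt", header = TRUE, sep = "\t")
rownames(data_gpl) <- data_gpl[, 1]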

Let me know if this doesn't solve it.
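As a possible alternative, `read.table()` can assign the row names at import time through its `row.names` argument. The sketch below reads a tiny made-up tab-separated string (via the `text` argument) instead of the real files, so the IDs and values are purely illustrative.

```r
# Tab-separated text standing in for the real file; the values are made up.
txt <- "ID\tS1\tS2\nprobe_a\t6.8\t8.5\nprobe_b\t10.0\t11.5\n"

# row.names = 1 tells read.table() to use the first column as row names,
# so no ID column is left among the numeric sample columns.
expr <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)

print(rownames(expr))
print(sapply(expr, is.numeric))
```

Note that `read.table()` stops with an error if the values in that column are not unique, since row names cannot be duplicated.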

written 12 months ago by pbpanigrahi

Whoa! Everything works perfectly now. I've been struggling with this for the past few days.

I really appreciate your help. Thank you so much.

written 12 months ago by bio94

You can upvote the answer if you find it useful. Also, please mark the question as resolved.

written 12 months ago by pbpanigrahi

You can mark the query as resolved. Thanks.

written 12 months ago by pbpanigrahi

Hello paul.elliot1994!

We believe that this post does not fit the main topic of this site.

Resolved

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 12 months ago by bio94