Question

PAM50 subtype classification by genefu

5.9 years ago

vinayjrao

Hello,

I'm analyzing TCGA breast cancer data to classify the samples into their respective subtypes, and then to check whether the genes in our study show a subtype-specific expression pattern. It was suggested that I use genefu for this. At the subtype-classification step:

PAM50Base <- molecular.subtyping(sbt.model = "pam50", data = data, annot = annot, do.mapping = FALSE)

I get the following error:

Error in intrinsic.cluster.predict(sbt.model = pam50.robust, data = data, :
no probe in common -> annot or mapping parameters are necessary for the mapping process!

In the command, annot is the annotation file, in the format:

probe EntrezGene.ID Gene.ID Gene.Symbol

data is the input expression file, in the format:

Gene.Symbol Sample1 Sample2 ... Sample1092

Both files are tab-delimited. Has anyone done this before, and is the file formatting correct?

P.S. I have tried using both Gene.Symbol and probe as the identifier column in the data file, but both give the same error.

Edit: Should my data file also contain the EntrezGene.ID column?

Thank you.

RNA-Seq R bioconductor genefu pam50

updated 5.9 years ago by Matina • written 5.9 years ago by vinayjrao

Hi vinayjrao,

Try using do.mapping=TRUE in the molecular.subtyping() function.

Matina

5.9 years ago by Matina

Hi Matina,

Thank you for the suggestion, but do.mapping=TRUE gave me the following error: Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds. Could you also explain how do.mapping affects the run?

Thanks.

5.9 years ago by vinayjrao

Hi vinayjrao,

I think the problem is with the data matrix: the molecular.subtyping function expects a matrix of samples (rows) x genes (columns). From your description above, your data matrix is genes x samples, right? Try transposing the matrix. Let me know if that solves your problem!

From the genefu vignette for do.mapping: TRUE if the mapping through Entrez Gene ids must be performed (in case of ambiguities, the most variant probe is kept for each gene)
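The "most variant probe is kept for each gene" rule quoted above can be sketched in base R as follows. This is a hypothetical helper (keep_most_variant_probe is not a genefu function) and not genefu's own implementation; it assumes a samples x probes matrix plus a vector giving the gene symbol of each column, and the toy values are made up. Something like this may help collapse duplicate gene symbols yourself before calling molecular.subtyping with do.mapping = FALSE.

```r
# Hypothetical helper (not part of genefu): for each gene symbol, keep only
# the probe (column) with the highest variance across samples.
keep_most_variant_probe <- function(x, genes) {
  # x:     numeric matrix, samples (rows) x probes (columns)
  # genes: gene symbol of each column of x
  stopifnot(is.matrix(x), length(genes) == ncol(x))
  v <- apply(x, 2, var, na.rm = TRUE)         # variance of each probe
  ord <- order(v, decreasing = TRUE)          # most variant probe first
  keep <- sort(ord[!duplicated(genes[ord])])  # first hit per gene, original column order
  out <- x[, keep, drop = FALSE]
  colnames(out) <- genes[keep]                # name columns by gene symbol
  out
}

# Toy data (made-up values): probes p1 and p2 both map to ESR1, p3 maps to ERBB2
x <- cbind(p1 = c(1, 2, 3), p2 = c(1, 5, 9), p3 = c(4, 4, 5))
rownames(x) <- c("S1", "S2", "S3")
collapsed <- keep_most_variant_probe(x, genes = c("ESR1", "ESR1", "ERBB2"))
collapsed  # the ESR1 column comes from p2 (variance 16 vs 1 for p1)
```

Ranking probes by variance and then taking the first occurrence of each symbol avoids an explicit loop over genes.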

Matina

5.9 years ago by Matina

Dear Matina,

I tried transposing the data matrix. It still gave me the same error as the first time. Would it be helpful if I shared the script with you?

Thanks.

Edit: I was looking into the column names, and there was an error on my part. Transposing the data helped. Thank you very much for all the help :)

5.9 years ago by vinayjrao

Great! Happy to help! I'll post it as an answer then.

5.9 years ago by Matina

@vinayjrao I am getting the same error. Could you elaborate on how transposing the data solved your problem? Thanks.

4.0 years ago by israawadalla

In my case, I got the same error message when the column names were Entrez Gene IDs. After I converted the column names to gene symbols, the molecular.subtyping function worked.

3.6 years ago by goknurginer

In my case, the problem was resolved when I converted the column names (of the transposed matrix) to gene symbols. I hope this helps anyone who still gets the same error message after transposing the matrix.
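A minimal base-R sketch of that conversion, assuming an annotation table with the EntrezGene.ID and Gene.Symbol columns described in the question. The three-gene annotation and the expression values are toy data for illustration only; check the ID-to-symbol mapping against your own annotation file.

```r
# Toy annotation table (only the two columns needed here)
annot <- data.frame(
  EntrezGene.ID = c(2099, 2064, 4288),
  Gene.Symbol   = c("ESR1", "ERBB2", "MKI67"),
  stringsAsFactors = FALSE
)

# Transposed (samples x genes) matrix whose column names are Entrez Gene IDs.
# matrix() fills column by column: "2099" = 8.1, 2.3; "2064" = 5.0, 11.2; ...
data_sxg <- matrix(c(8.1, 2.3, 5.0, 11.2, 3.4, 6.8), nrow = 2,
                   dimnames = list(c("Sample1", "Sample2"),
                                   c("2099", "2064", "4288")))

# Look up each column's Entrez ID in annot and replace it with the gene symbol
idx <- match(colnames(data_sxg), as.character(annot$EntrezGene.ID))
if (anyNA(idx)) warning(sum(is.na(idx)), " column(s) have no matching Entrez ID in annot")
colnames(data_sxg)[!is.na(idx)] <- annot$Gene.Symbol[idx[!is.na(idx)]]
colnames(data_sxg)
```

Columns whose ID is missing from the annotation keep their original name and trigger a warning, so unmapped genes are easy to spot.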

3.6 years ago by goknurginer

Hi vinayjrao,

I have exactly the same problem as you, and I cannot resolve it by transposing the data. Could you please share your annotation file with me? I don't understand what I am doing wrong.

Best, Linnea

5.4 years ago by linnea.pettersson

score 1 · Accepted Answer · 2018-05-10

Hi vinayjrao,

I think the problem is with the data matrix: the molecular.subtyping function expects a matrix of samples (rows) x genes (columns). From your description above, your data matrix is genes x samples, right? Try transposing the matrix.
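A minimal base-R sketch of the transposition, assuming the tab-delimited layout described in the question (a Gene.Symbol column followed by one column per sample). The inline toy file and its values are made up; with real data you would pass the file path to read.delim instead. As the later comments in this thread suggest, the column names of the transposed matrix should be gene symbols. The commented-out last line is simply the original call with the transposed matrix substituted.

```r
# Toy file in the question's format: one row per gene, Gene.Symbol first,
# then one column per sample (made-up values)
txt <- "Gene.Symbol\tSample1\tSample2\tSample3
ESR1\t8.1\t2.3\t7.7
ERBB2\t5.0\t11.2\t4.9
MKI67\t3.4\t6.8\t2.1"

expr <- read.delim(text = txt, check.names = FALSE)  # tab separator is the default

# Move gene symbols into row names, then transpose:
# genes x samples  ->  samples (rows) x genes (columns)
mat <- as.matrix(expr[, -1])
rownames(mat) <- as.character(expr$Gene.Symbol)
data_sxg <- t(mat)

dim(data_sxg)       # 3 samples x 3 genes
colnames(data_sxg)  # gene symbols

# PAM50Base <- genefu::molecular.subtyping(sbt.model = "pam50", data = data_sxg,
#                                          annot = annot, do.mapping = FALSE)
```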

From the genefu vignette for do.mapping: TRUE if the mapping through Entrez Gene ids must be performed (in case of ambiguities, the most variant probe is kept for each gene)