PAM50 subtype classification by genefu
6.4 years ago
vinayjrao ▴ 250

Hello,

I'm analyzing TCGA breast cancer data to classify the samples into their respective subtypes, and then to check whether the genes in our study show a subtype-specific expression pattern. To do this, I was advised to use genefu. At the step of classifying the subtypes -

PAM50Base <- molecular.subtyping(sbt.model = "pam50", data = data, annot = annot, do.mapping = FALSE)

I get an error -

Error in intrinsic.cluster.predict(sbt.model = pam50.robust, data = data, :
  no probe in common -> annot or mapping parameters are necessary for the mapping process!

In the command, annot is the file used for annotation and is of the format -

probe      EntrezGene.ID      Gene.ID      Gene.Symbol

Data refers to the input file, which is of the format -

Gene.Symbol      Sample1      SAMPLE2    ...    Sample 1092

Both files are tab-delimited. Has anyone done this before, and is the file formatting correct?

P.S. I have tried using both Gene.Symbol and probe as the identifier column in the data file, but both give the same error.

Edit: Should my data file also contain the EntrezGene.ID column?

Thank you.
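
For anyone hitting the same "no probe in common" message, a quick check of where the gene identifiers actually sit in the matrix can help. The sketch below is only an illustration in base R: the toy matrix, the name `pam50_symbols`, and the five-gene list are assumptions made for the example (a small subset of PAM50 symbols, not the full signature), and it assumes the function looks for identifiers in the column names of `data`.

```r
# Toy expression matrix in the layout described above: genes in rows, samples in columns.
expr <- matrix(
  c(5.1, 6.3,
    2.2, 2.9,
    7.4, 7.0,
    1.5, 1.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ESR1", "ERBB2", "MKI67", "KRT5"), c("Sample1", "Sample2"))
)

# Hypothetical subset of the PAM50 gene symbols, used only for this illustration.
pam50_symbols <- c("ESR1", "ERBB2", "MKI67", "KRT5", "FOXA1")

# Count how many signature genes appear along each dimension of the matrix.
overlap <- c(
  rows    = length(intersect(rownames(expr), pam50_symbols)),
  columns = length(intersect(colnames(expr), pam50_symbols))
)
print(overlap)
```

Here all matches are found in the row names and none in the column names, which would suggest that the matrix is oriented genes x samples.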

RNA-Seq R bioconductor genefu pam50 • 6.0k views

Hi vinayjrao,

Try using do.mapping=TRUE in the molecular.subtyping() function.

Matina


Hi Matina,

Thank you for the suggestion, but do.mapping=T gave me the following error: Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds. Could you also explain how do.mapping affects the run?

Thanks.


Hi vinayjrao,

I think the problem is with the data matrix; the molecular.subtyping function expects a matrix of samples (rows) x genes (columns). As far as I can see above, your data matrix is genes x samples, right? Try transposing the matrix. Let me know if that solves your problem!

From the genefu vignette for do.mapping: TRUE if the mapping through Entrez Gene ids must be performed (in case of ambiguities, the most variant probe is kept for each gene)

Matina
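
A minimal sketch of the transposition, assuming the input is a tab-delimited table with a Gene.Symbol column followed by one column per sample, as described in the question. The inline toy table and the names `raw`, `expr`, and `data_t` are made up for the example; a real file would be read with a file path instead of `text =`.

```r
# Toy stand-in for the tab-delimited input: Gene.Symbol first, then one column per sample.
raw <- read.delim(
  text = "Gene.Symbol\tSample1\tSample2\nESR1\t5.1\t6.3\nERBB2\t2.2\t2.9\nMKI67\t7.4\t7.0",
  check.names = FALSE, stringsAsFactors = FALSE
)

# Move the gene symbols into the row names and keep only the numeric expression values.
expr <- as.matrix(raw[, -1])
rownames(expr) <- raw$Gene.Symbol

# Transpose so that samples are rows and genes are columns.
data_t <- t(expr)
print(dim(data_t))
print(colnames(data_t))
```

The transposed matrix (samples x genes, with gene identifiers as column names) is what would then be passed as the data argument of molecular.subtyping().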


Dear Matina,

I tried transposing the data matrix. It still gave me the same error as the first time. Would it be helpful if I shared the script with you?

Thanks.

Edit: I was looking into the column names, and there was an error on my part. Transposing the data helped. Thank you very much for all the help :)


Great! Happy to help! I'll post it as an answer then.


@vinayjrao I am having the same error. Can you elaborate on how transposing the data solved your problem? Thanks


In my case, I encountered the same error message when the column names were Entrez Gene IDs. When I converted the column names to gene symbols, the molecular.subtyping function worked.


In my case, the problem was resolved when I converted column names into gene symbols (in the transposed matrix). Hope that helps some of you who encounter the same error message after transposing the matrix.
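
One possible way to do that renaming in base R is sketched below. It assumes a samples x genes matrix whose column names are Entrez Gene IDs and an annotation table with probe, EntrezGene.ID and Gene.Symbol columns; the toy values and the names `data_t`, `data_sym`, and `annot_sym` are illustrative only.

```r
# Toy samples x genes matrix whose columns are Entrez Gene IDs.
data_t <- matrix(
  c(5.1, 2.2, 7.4,
    6.3, 2.9, 7.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Sample1", "Sample2"), c("2099", "2064", "4288"))
)

# Toy annotation table with three of the columns described in the question.
annot <- data.frame(
  probe         = c("ESR1", "ERBB2", "MKI67"),
  EntrezGene.ID = c(2099, 2064, 4288),
  Gene.Symbol   = c("ESR1", "ERBB2", "MKI67"),
  stringsAsFactors = FALSE
)
rownames(annot) <- annot$probe

# Look up the symbol for each Entrez ID column and drop columns without a match.
idx <- match(colnames(data_t), as.character(annot$EntrezGene.ID))
data_sym <- data_t[, !is.na(idx), drop = FALSE]
colnames(data_sym) <- annot$Gene.Symbol[idx[!is.na(idx)]]

# Keep the annotation rows in the same order as the matrix columns.
annot_sym <- annot[colnames(data_sym), , drop = FALSE]
print(colnames(data_sym))
```

With the column names of the matrix and the row names of the annotation table agreeing, both objects could then be passed on as data and annot.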


Hi vinayjrao,

I have exactly the same problem as you, and I have not managed to resolve it by transforming the data. Could you please share the annotation file with me? I don't understand what I am doing wrong.

Best, Linnea

6.4 years ago
Matina ▴ 250

Hi vinayjrao,

I think the problem is with the data matrix; the molecular.subtyping function expects a matrix of samples (rows) x genes (columns). As far as I can see above, your data matrix is genes x samples, right? Try transposing the matrix.

From the genefu vignette for do.mapping: TRUE if the mapping through Entrez Gene ids must be performed (in case of ambiguities, the most variant probe is kept for each gene)
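
To illustrate what "the most variant probe is kept for each gene" means, here is a rough base R sketch. It is not genefu's internal code, only one way the idea could be expressed; the toy matrix, the probe names, and the `entrez` mapping are assumptions made for the example.

```r
# Toy samples x probes matrix in which two probes map to the same gene.
expr <- matrix(
  c(1.0, 5.0, 3.0,
    1.2, 9.0, 3.1,
    0.9, 1.0, 2.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Sample", 1:3), c("probeA", "probeB", "probeC"))
)

# Hypothetical probe-to-gene mapping: probeA and probeB share Entrez Gene ID 2099.
entrez <- c(probeA = 2099, probeB = 2099, probeC = 2064)

# Variance of each probe across samples.
probe_var <- apply(expr, 2, var)

# For each Entrez Gene ID, keep the name of the probe with the largest variance.
keep <- vapply(
  split(names(entrez), entrez),
  function(p) p[which.max(probe_var[p])],
  character(1)
)

# Reduced matrix with one probe per gene.
expr_unique <- expr[, keep, drop = FALSE]
print(keep)
```

In this toy case probeB is retained for gene 2099 because it varies more across samples than probeA.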
