Question: Possible methodology or R package for simulating a microarray dataset with both gene and continuous clinical features
2.2 years ago by
svlachavas560 wrote:

Dear Community,

Using R and the caret package, I have performed feature selection on a microarray gene expression dataset (60 samples in total: 30 cancer and 30 control) with respect to a binary categorical outcome (disease status). My final selected subset comprises both gene features and continuous clinical variables. The initial dataset was produced by merging two Affymetrix microarray datasets with a similar phenotype and correcting for batch effects. It is also paired: each patient has both a cancer and a control sample.

Moreover, besides a simple initial inspection of my combined feature set with cross-validation, I would also like to perform an initial validation on an independent dataset, by testing the classifier trained on my initial dataset with these features. The major problem with simply selecting a microarray dataset from GEO and/or other repositories is that the clinical PET features have only been measured in the same patients from whom the microarrays were produced (an important novelty that I would like to test somehow). So I cannot obtain any external samples or datasets with these clinical features.

Thus, is there any package or methodology that I could implement in R to simulate my dataset with only these 41 features, and then use this "synthetic" dataset for external validation of the classifier built on my initial training dataset?
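
To make the idea concrete, here is a minimal sketch of the kind of simulation I have in mind. It is only one possible approach, not an established method for this problem: it fits a multivariate normal distribution to each class and draws new samples with `MASS::mvrnorm`. The objects `train_x` and `train_y` are random placeholders standing in for the real 60 x 41 feature matrix and disease labels, the helper `simulate_class` and its `ridge` value are hypothetical, and the paired design is ignored.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)

# Placeholder standing in for the real training data (hypothetical values):
# 60 samples x 41 selected features, 30 cancer and 30 control.
n_per_class <- 30
p <- 41
train_x <- matrix(rnorm(2 * n_per_class * p), ncol = p,
                  dimnames = list(NULL, paste0("feat", seq_len(p))))
train_y <- factor(rep(c("cancer", "control"), each = n_per_class))

# Draw n synthetic samples for one class from a multivariate normal
# fitted to that class. With 30 samples and 41 features the sample
# covariance is singular, so a small ridge (an assumed value) is added
# to the diagonal to keep it positive definite.
simulate_class <- function(x, n, ridge = 0.1) {
  mu <- colMeans(x)
  sigma <- cov(x) + diag(ridge, ncol(x))
  sim <- mvrnorm(n = n, mu = mu, Sigma = sigma)
  colnames(sim) <- colnames(x)
  sim
}

# Simulate 100 samples per class and stack them into one synthetic set.
n_sim <- 100
sim_list <- lapply(levels(train_y), function(cl) {
  simulate_class(train_x[train_y == cl, , drop = FALSE], n = n_sim)
})
synthetic_x <- do.call(rbind, sim_list)
synthetic_y <- factor(rep(levels(train_y), each = n_sim),
                      levels = levels(train_y))

dim(synthetic_x)
```

With a fitted caret model (say an object called `fit`, which is an assumption here), the synthetic set could then be scored with `predict(fit, newdata = synthetic_x)` and compared against `synthetic_y`. One concern with this approach is that the synthetic samples inherit the means and correlations of the training data, so I am not sure how far such a test can stand in for a truly independent cohort.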


