Question: How to apply multidimensional models with a low number of samples? (What should I do about the train-test split?)
5 months ago, galapacheco wrote:

I've been working with huge datasets ever since I started in bioinformatics, but now I'm facing a problem with a new dataset that has very few samples.

I have 5 groups (Control and 4 diseases) whose frequencies vary, described by a set of 10 features corresponding to the $\log_2(1 + 2^{-\Delta C_T})$ values of gene expression (I had to use a pseudocount to "nullify" my zeros, preventing them from becoming NA or -Inf).
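For reference, the transform described above can be sketched as follows. This is only an illustration in Python, not the code actually used; the function names `relative_expression` and `log2_pseudocount` and the example values are hypothetical.

```python
import math

def relative_expression(delta_ct):
    """Relative expression 2^(-delta_ct) from a delta-CT value."""
    return 2.0 ** (-delta_ct)

def log2_pseudocount(rel_expr):
    """log2(1 + x): a relative expression of 0 maps to 0.

    Without the pseudocount, log2(0) is undefined (R returns -Inf,
    Python's math.log2 raises ValueError).
    """
    return math.log2(1.0 + rel_expr)

# Hypothetical inputs: a zero expression value and two delta-CT values.
values = [0.0, relative_expression(0.0), relative_expression(3.0)]
transformed = [log2_pseudocount(v) for v in values]
print(transformed)
```

The pseudocount only changes the result noticeably for small values; for large relative expression, log2(1 + x) is close to log2(x).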

Yet I only have a maximum of 20 entries per group (and a minimum of 3, because some features are full of NAs). My plan is to combine some of these features with clinical values in order to fit a model; I have a complete dataset of 87 of them with very few NAs.

But I'm stuck on:

a) How do I split the data into train and test sets to fit my models with so little data?
b) How can I do feature selection with my first 8 gene features? I ran some ANOVAs (despite the data not being normal, and the dataset being full of extreme outliers detected by 'identify_outliers()' and easily visible in boxplots) and some MANOVAs, which gave very few "significant" features (despite the data not fulfilling the assumptions, such as normality).
c) Should I use a multinomial logistic regression? By the rule of thumb I need about 10 samples per feature, but I don't know any other multiclass models that assign a probability.
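
To make (a) concrete: one commonly used alternative to a single train-test split on small data is stratified k-fold cross-validation, where every sample is used for testing exactly once and each fold keeps roughly the same group proportions. Below is a minimal sketch in plain Python, assuming the group labels are available as a list. The function name `stratified_folds`, the group names, and the group sizes are hypothetical, and this is not necessarily the right choice for this dataset.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, spreading every
    class across the folds as evenly as possible (round-robin)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # carry the round-robin position over between classes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

# Hypothetical group sizes, from 20 down to 3 samples per group.
labels = ["Control"] * 20 + ["DiseaseA"] * 12 + ["DiseaseB"] * 3
folds = stratified_folds(labels, k=3)

for i, test_idx in enumerate(folds):
    # Everything outside the current fold is the training set.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

With a smallest group of 3, k cannot usefully exceed 3 here, otherwise some test folds would contain no sample from that group.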

Any recommendations?
