5.7 years ago

Chaimaa
▴
260

Hello guys,

I'm looking for a more powerful method than the classical method **elastic-net** that can select a more *significant list of genes.*

I have already used the **elastic-net** method with alpha=0.5 to select a set of significant genes from a genomic dataset, but now I'm looking for a more powerful method that can do the same thing, as suggested by one reviewer.
I have read some papers but still can't decide.

I appreciate any help!

Just be careful on the wording...

'significant list of genes' should be 'statistically significant list of genes'. Also, in which way are they statistically significant? What is the null hypothesis? How will you test the null hypothesis and derive a p-value?

How do you define 'more powerful method'?

I would take the variables from the elastic-net model and then test them independently in a standard regression model. Ultimately, I would build a final predictor model of the best variables and test it via ROC analysis, like I do here: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
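The first two steps of that workflow (elastic net, then a standard regression on the retained genes) might look roughly like the sketch below. It is only an illustration on simulated data: the matrix `X`, the outcome `y`, the gene names, and the choice of `lambda.min` are all assumptions, not the data or settings used in this thread.

```r
# Sketch only: simulated data stand in for the real expression matrix.
library(glmnet)

set.seed(1)
n <- 200
p <- 50
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", 1:p)))
# In this simulation the outcome depends on the first three genes only
y <- rbinom(n, 1, plogis(1.5 * X[, 1] - 1.5 * X[, 2] + 1.5 * X[, 3]))

# Elastic net (alpha = 0.5) with a cross-validated penalty
cvfit <- cv.glmnet(X, y, family = "binomial", alpha = 0.5)

# Genes with non-zero coefficients at lambda.min (intercept dropped)
coefs <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")

# Refit the retained genes in a standard logistic regression
dat <- data.frame(y = y, X[, selected, drop = FALSE])
fit <- glm(y ~ ., data = dat, family = binomial(link = "logit"))
summary(fit)$coefficients
```

The fitted probabilities from `fit` could then be passed to an ROC routine (for example from the pROC package, which is an assumption here) to evaluate the final model.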

@Kevin Blighe Yes, Kevin, you're right that I should say 'statistically significant', but since I didn't use a p-value metric (p < 0.05), I can only say 'specific genes' in my case.

I first applied the elastic net, then a cutoff of 20 to determine these specific genes, but one of the reviewers suggested that I use another method, because elastic net is an old and classical method.

By 'a more powerful method' I mean a newer method that outperforms elastic net and can do the same job.

Yeh, but, the latest methods (assuming s/he means AI, machine learning, etc) are invariably not better than the 'classical' methods. I would take issue with the reviewer's comments. I have already given you a general workflow that you could try.

@Kevin Blighe Kevin Bro, in my case I have 2 variables, X (a matrix of size `m*n`) and Y (a vector of size `m*1`), and those were my inputs to the elastic net. Now I would like to find another method to do the same job with these inputs too. I checked some papers and still can't decide.

@Kevin Blighe If I use elastic-net with the glmnet package, does this make sense? I don't have 'subtype' as you mention here, "A: How to exclude some of breast cancer subtypes just by looking at gene expressio"; I have a vector of values 0 and 1 instead. Is that OK?

Yes, you just need 0 or 1 for the outcome variable. Please take a look at all of the parameters for the functions in the glmnet package.

@Kevin Blighe, Hi Kevin, last question please: do you recommend any other methods than elastic-net and lasso?

You could try Random Forest.

Sure, thanks Kevin!

@Kevin Blighe, I have some more questions, and I hope you don't mind! I performed my analysis using the MATLAB glmnet package, and now I have turned to following your process mentioned here: "A: Multinomial elastic net implementation on microarray dataset". First of all, my data has 2 labels: a matrix X (`219*25172`) and a vector Y (`219*1`), i.e. 219 samples and 25172 genes, and Y has only 2 values, 0 or 1. I first tried to read the X and Y files into R. Y opened properly, but not X, using this command:

But it showed an object of size (`218*25175`) instead of (`219*25172`)!!

And in MATLAB, I found around 303 genes. How can I use glm or lm in this case, and with the binomial or gaussian family in the cvglmnet function with alpha=0.5 in my case?

What is the meaning of these 2 things in your code?

My data (X) looks like this

Hey, both methods should not be expected to produce the same results. The likely reason is that there is different filtering between the R and MATLAB versions.

`family=binomial(link="logit")` - this instructs `glm()` that the model is a binomial logistic regression

`Terms=c(2:4)` - this is used with the Wald test to produce a Wald p-value using 1 or more terms combined. For example: `Terms=c(2:4)` will test `x1` + `x2` + `x3` (2nd to 4th terms in the model, with the intercept being the 1st term) against the `y` variable.

@Kevin Blighe Yes Kevin, that's why I have turned to using your R code now, but I'm unable to read my data into R. Please check my line to open the file: `X<-fread("pathological_data.txt",sep="\t", stringsAsFactors=FALSE, header=TRUE)`

If I also find 300 genes using R, can I still use glm() to pick the best predictors? When we have more than 30 genes we have to test them separately, so how can we test 300 genes separately?

If we have 0 or 1 in the Y label, that means it's binomial, right?

Thanks for your valuable suggestions!
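To make the two arguments explained above concrete, here is a small base-R sketch on simulated data. The variable names `x1`, `x2`, `x3` and the effect sizes are made up for illustration. The Wald statistic is computed by hand; the `Terms` argument looks like that of `wald.test()` in the aod package, but that is an assumption about the code being discussed.

```r
# Sketch only: simulated predictors and a 0/1 outcome.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

# family = binomial(link = "logit") makes glm() fit a logistic regression
model <- glm(y ~ x1 + x2 + x3, data = dat,
             family = binomial(link = "logit"))

# Terms 2:4 are x1, x2 and x3 (term 1 is the intercept)
terms <- 2:4
b <- coef(model)[terms]
V <- vcov(model)[terms, terms]

# Joint Wald statistic b' V^-1 b, compared with a chi-squared
# distribution with one degree of freedom per tested term
W <- as.numeric(t(b) %*% solve(V) %*% b)
p_wald <- pchisq(W, df = length(terms), lower.tail = FALSE)
p_wald
```

If the aod package is installed, `aod::wald.test(b = coef(model), Sigma = vcov(model), Terms = 2:4)` should report the same joint test.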

Did you look at the contents of X? Did you try to use `read.table()` instead of `fread()`?

Yes, 0 and 1 indicates binomial logistic regression.

With 300, the idea is to first use glmnet to reduce this to a lower number, and then use stepwise regression. You could also test each of the 300 genes separately and choose only the genes that have p<0.05. Take a look at my function, just released on Bioconductor: 3.1 Perform the most basic logistic regression analysis
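Testing each gene separately, as suggested above, can be sketched in base R as follows. This is not the Bioconductor function mentioned in the comment, just a plain loop on simulated data; the dimensions, gene names, and the optional multiple-testing adjustment are assumptions added for illustration.

```r
# Sketch only: 219 samples and 300 simulated "genes".
set.seed(1)
n <- 219
p <- 300
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", 1:p)))
y <- rbinom(n, 1, plogis(1.2 * X[, 1] - 1.2 * X[, 2]))

# One logistic regression per gene; keep that gene's Wald p-value
pvals <- apply(X, 2, function(g) {
  fit <- glm(y ~ g, family = binomial(link = "logit"))
  summary(fit)$coefficients["g", "Pr(>|z|)"]
})

# Genes passing the nominal threshold mentioned in the thread
passing <- names(pvals)[pvals < 0.05]

# Optional: Benjamini-Hochberg adjustment, since 300 tests are run at once
padj <- p.adjust(pvals, method = "BH")
head(sort(padj))
```

With 300 tests, roughly 15 genes would be expected to pass p<0.05 by chance alone, which is why an adjusted p-value may be worth reporting alongside the raw one.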

@Kevin Blighe Sorry for the multiple questions, Kevin. I tried both `read.table()` and `fread()`, but neither of them can fully read my large data; they only read the first 16384 columns. My data is too large, and Excel can show only 16384 columns instead of 25172 columns. So in MATLAB I used `importdata`, but I don't know which command in R can read the full data.

No, my original data has 25172 genes, and after using glmnet I get 300 genes.

I really want to try your code; it is clear and complete and can make my biological interpretation more significant.

Sorry, didn't read all the comments, but `fread` should read just fine, see example with 30000 columns:

@zx8754 Thanks zx8754, but could you please explain these lines, like the sample(1:10), 90000?

My data is 219*25172, and I'm not very familiar with R. Then I can try it.

I am just creating example data, similar to what you have, to show that `fread` can read files with 30000 columns.

Please share your example data, so we can reproduce your problem.
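The example code referred to above was not preserved in this copy of the thread. A reconstruction along the same lines might look like the sketch below; it is a guess based on the `sample(1:10)` and `90000` mentioned in the question, and the temporary file and object names are assumptions.

```r
# Sketch only: write a wide tab-separated file and read it back.
library(data.table)

set.seed(1)
# 90000 random integers between 1 and 10, arranged as 3 rows x 30000 columns
m <- matrix(sample(1:10, 90000, replace = TRUE), ncol = 30000)

tmp <- tempfile(fileext = ".txt")
fwrite(as.data.table(m), tmp, sep = "\t")

# Read the file back with the same kind of call used in the question
X <- fread(tmp, sep = "\t", header = TRUE)
dim(X)  # 3 rows, 30000 columns
```

If a file reads with fewer columns than expected, one thing worth checking is whether it was ever opened and re-saved in a spreadsheet program, since that can truncate it to 16384 columns before R sees it.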

@zx8754 Hi, here are some rows and columns from my data

To clarify, provide example data that `fread` fails to read.

@zx8754 What do you mean by example data? I shared some rows and columns from among the 219 rows and 25172 columns. I tried your code but it doesn't work.

If we can't reproduce your problem, it is hard to guess the problem and the solutions. With your supplied example data `fread` works fine; we need example data where `fread` fails.

So, extract these from your file and then input to R?

@Kevin Blighe But I want to try glmnet in R to extract genes from the 25172. I got those 300 genes by using MATLAB.

Why do you want to repeat it in R?

Because I want to apply your next steps of glm and the Wald test, which require X and Y as inputs. Honestly, I don't know how to do those steps in MATLAB.

Another option is to use R on a cluster, where the larger datatypes may be supported - these nuances are not my area of expertise.

Another option is to transpose the data in BASH / Shell, and then read the transposed data into R

@Kevin Blighe Great, Kevin! Thanks a lot! One more thing: I have been working on a small project concerning cancer evolution with CNA and clinical data over 4 pathological stages, and I'm planning to prepare a manuscript after finalizing the biological interpretations at each stage. I wonder if you could join us as a co-author, because the majority of the source code for the CNA analysis came from your posts here. Good luck!

Sure thing. You can contact me from GitHub