How to validate a list of genes?
5.9 years ago
Chaimaa ▴ 260

Hello everyone, how can I validate a list of genes identified by a regularization method?

I applied a regularization method to identify a list of significant genes, and I now want to know how to validate that list.

Note that these genes are enriched in several significant pathways in DAVID. I would really appreciate any help!

genes • 1.7k views
5.9 years ago

You could run your regularization method with background or "random" data, to get a list of genes that are called significant from background data (whatever that is, in your case).

Once you have that, you could use a hypergeometric distribution or Fisher's exact test to compare counts of significant data from observed and background data sets.

Testing (with the correct test, depending on what you use to determine significance) gives you a p-value that you can threshold for significance/insignificance.
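As a rough sketch of the comparison described above, the counts of selected genes from the observed and background runs can be placed in a 2x2 table and passed to `fisher.test()`. The counts below are hypothetical and only illustrate the mechanics; how the background data are generated (for example, by permuting the outcome labels before re-running the method) is an assumption that depends on your data.

```r
# Hypothetical counts, for illustration only
n_genes        <- 2000  # genes tested in each run
sel_observed   <- 50    # genes selected from the real data
sel_background <- 12    # genes selected from the background (e.g. permuted) data

# 2x2 contingency table: rows = data set, columns = selected or not
counts <- matrix(
  c(sel_observed,   n_genes - sel_observed,
    sel_background, n_genes - sel_background),
  nrow = 2, byrow = TRUE,
  dimnames = list(data = c("observed", "background"),
                  call = c("selected", "not_selected"))
)

# One-sided test: are genes selected more often in the observed data?
ft <- fisher.test(counts, alternative = "greater")
ft$p.value   # p-value to compare against your chosen threshold
ft$estimate  # conditional estimate of the odds ratio
```

A small p-value here only says that the method selects more genes from the real data than from the background; it does not say which individual genes are true positives.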


@Alex Reynolds Thanks for your response.

I have already obtained my significant genes using that method and a cutoff metric.

Now I want to validate these genes. My question is: how do I validate them?


Can anyone else help me with this question, please?


@Kevin Blighe Sorry about that. My main question is this: I obtained a list of genes by applying the elastic net method, and I now want to validate these genes. How can I do that?


With the output from the elastic net regression, you can just build a new model (glm or lm) with your final list and then check its ability to predict the end-point via r-squared shrinkage and ROC analysis. There are many other metrics that can be applied. I go over some of this in the posts mentioned above.


@Kevin Blighe What do you mean by end-point, please?


By end-point, I mean the dependent (y) variable. For example, in this formula, glm(condition ~ gene1 + covariate), condition is the end-point.
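To make the formula concrete, here is a minimal sketch on simulated data. The column names `condition`, `gene1` and `covariate` mirror the formula above; the simulated values and the choice of a binomial family (a binary end-point) are assumptions for illustration.

```r
set.seed(1)
n <- 100

# Simulated predictors (hypothetical expression values and a covariate)
data <- data.frame(
  gene1     = rnorm(n),
  covariate = rnorm(n)
)

# Simulated binary end-point whose probability depends on gene1
data$condition <- rbinom(n, 1, plogis(1.5 * data$gene1))

# condition is the end-point (dependent variable); the rest are predictors
model <- glm(condition ~ gene1 + covariate, data = data, family = binomial)
summary(model)$coefficients
```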


@Kevin Blighe, I read all the materials you recommended, but I still can't find an answer to my question of how to validate a list of genes. Suppose I applied the elastic net method to my data and got 50 genes as output: how do I validate these 50 genes?


Chief, take a look at what I do here: A: How to exclude some of breast cancer subtypes just by looking at gene expressio

When you applied the elastic net regression, I presume that you cross-validated it at the same time using cv.glmnet?
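For reference, a cross-validated elastic net fit with `cv.glmnet` might look roughly like the sketch below. The simulated matrix `x`, the outcome `y`, the mixing value `alpha = 0.5` and the use of `lambda.1se` are all assumptions for illustration, not the settings used in the question.

```r
library(glmnet)

set.seed(42)
n <- 120  # samples
p <- 200  # genes

# Simulated expression matrix with hypothetical gene names
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))

# Simulated binary outcome driven by the first two genes
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2]))

# Elastic net (0 < alpha < 1) with 10-fold cross-validation over lambda
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5, nfolds = 10)

# Genes with non-zero coefficients at the more conservative lambda
co <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- setdiff(rownames(co)[co[, 1] != 0], "(Intercept)")
selected
```

Using `s = "lambda.min"` instead usually keeps more genes; which of the two is appropriate is a judgment call.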

Essentially, with your 50 genes, build a 'final' model via glm() or lm() with all of your genes and then test that via ROC analysis. You can also derive sensitivity / specificity / precision / accuracy:

Assume that your final model is called model, and your data is data:

library(ROCR)

# Sensitivity and specificity at every predicted-probability cut-off
pred <- prediction(predict(model, type="response"), data$Condition)
ss <- performance(pred, "sens", "spec")

# Cut-off that maximises sensitivity + specificity
print("Predicted probability cut-off:")
ss@alpha.values[[1]][which.max(ss@x.values[[1]]+ss@y.values[[1]])]
predicted.prob <- round(ss@alpha.values[[1]][which.max(ss@x.values[[1]]+ss@y.values[[1]])], 2)

# Sensitivity / Specificity
print("Sensitivity and specificity, weighed equally, are:")
max(ss@x.values[[1]]+ss@y.values[[1]])/2
sens.spec <- round(max(ss@x.values[[1]]+ss@y.values[[1]])/2, 2)

# Determine cost function
cost.perf <- performance(pred, measure="cost")
cost <- pred@cutoffs[[1]][which.min(cost.perf@y.values[[1]])]

# Precision
prec.perf <- performance(pred, measure="prec")
ind <- which.max(slot(prec.perf, "y.values")[[1]])
prec <- slot(prec.perf, "y.values")[[1]][ind]
prec.cutoff <- slot(prec.perf, "x.values")[[1]][ind]
print(c(precision=prec, cutoff=prec.cutoff))

# Accuracy
acc.perf <- performance(pred, measure="acc")
ind <- which.max(slot(acc.perf, "y.values")[[1]])
acc <- slot(acc.perf, "y.values")[[1]][ind]
acc.cutoff <- slot(acc.perf, "x.values")[[1]][ind]
print(c(accuracy=acc, cutoff=acc.cutoff))
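The metrics above are computed on the same data the model was fitted to, so they tend to be optimistic. One possible complement, sketched here in base R on simulated data, is to fit the model on a training subset and compute the AUC on held-out samples. The data, the 70/30 split and the helper `auc()` are hypothetical and not part of the original answer; ideally the gene selection itself would also be done on the training samples only.

```r
set.seed(7)
n <- 200

# Simulated data with two hypothetical genes and a binary Condition
data <- data.frame(gene1 = rnorm(n), gene2 = rnorm(n))
data$Condition <- rbinom(n, 1, plogis(1.5 * data$gene1 - 1 * data$gene2))

# Hold out 30% of the samples for testing
test_idx <- sample(n, size = 60)
train <- data[-test_idx, ]
test  <- data[test_idx, ]

# Fit on the training samples only, predict on the held-out samples
model <- glm(Condition ~ gene1 + gene2, data = train, family = binomial)
prob  <- predict(model, newdata = test, type = "response")

# AUC via the rank-sum (Mann-Whitney) identity; labels must be 0/1
auc <- function(score, label) {
  r     <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

held_out_auc <- auc(prob, test$Condition)
held_out_auc
```

If an independent data set with the same genes is available, predicting on that instead of a random split is a stronger form of validation.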

Note that 50 genes may be too many for a final model. You may consider reducing this number via stepwise regression (see lecture notes 3, here: https://github.com/kevinblighe/Rtutorials ) or by choosing a higher threshold from elastic net regression.
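As a minimal sketch of the stepwise reduction mentioned above, base R's `step()` can perform backward elimination by AIC on a fitted `glm`. The simulated six-gene data set below is hypothetical, and this is not necessarily the procedure used in the linked lecture notes.

```r
set.seed(3)
n <- 150

# Simulated data: six hypothetical genes, only two related to the outcome
data <- data.frame(matrix(rnorm(n * 6), nrow = n,
                          dimnames = list(NULL, paste0("gene", 1:6))))
data$Condition <- rbinom(n, 1, plogis(1.5 * data$gene1 - 1.5 * data$gene2))

# Full model with every gene, then backward elimination by AIC
full    <- glm(Condition ~ ., data = data, family = binomial)
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)                 # genes retained in the reduced model
c(full = AIC(full), reduced = AIC(reduced))
```

Stepwise selection on the same data can itself overfit, so the reduced model is best re-checked on held-out samples.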
