Discriminant gene analysis
10 weeks ago

I want to obtain the top 5 discriminant genes in each direction (positive and negative) after a feature selection process. Is the code below the proper way to do this?

# New data.frame with genes that have passed both (Fold and rawp) tests
true.genes <- subset(gene.info, Fold.Test & rawp.Test)
cat(sprintf("Total number of genes that pass both (rawp and Fold) tests: %d\n", nrow(true.genes))) # 5241

# Write these genes with their corresponding values to an output .csv file
write.table(true.genes, file="TrueGenes.csv", sep=",", col.names=NA, qmethod="double")

# Order the genes by raw p-value (ascending, most significant first) and store them in a data.frame
# Note: dat.filtered is still in log2 scale
best.genes    <- order(rawp.pass)
best.genes.df <- data.frame(index = best.genes, rawp = rawp.pass[best.genes])
top.genes.matrix <- dat.filtered[best.genes, ]

# Feature selection via svmRFE, which relies on the e1071 library
library(e1071)
t.dat  <- t(top.genes.matrix)
svm.df <- data.frame(label, t.dat)
ranked.list <- svmRFE(svm.df, k=10, halve.above=100)

# Write the rankings to an output .txt file so that it can be read in later if needed
output <- data.frame(RankedOrder = ranked.list)
write.table(output, file = "RankedList.txt")
# Row names carry over from top.genes.matrix when subsetting, so they need no reassignment
top.ranked.genes <- top.genes.matrix[ranked.list, ]

# Create a new genes.info data.frame for the ranked genes
top.genes.info <- gene.info[rownames(top.ranked.genes), ]
tg <- top.genes.info$pvalue[top.genes.info$pvalue < thresh]  # note: tg is not used in the steps below
top.genes.info <- top.genes.info[rownames(top.genes.info) %in% rownames(ann), ]
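# head() and tail() have no na.omit argument; na.last = NA drops rows with a missing p-value instead
top.genes.info <- top.genes.info[order(top.genes.info$pvalue, na.last = NA), ]
top5    <- head(top.genes.info, n = 5L)  # 5 smallest p-values
bottom5 <- tail(top.genes.info, n = 5L)  # 5 largest p-values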
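
Sorting by p-value ranks genes by significance, so head() and tail() return the most and least significant genes, not the top genes in each direction. If "positive and negative direction" means up- and down-regulated, one possible approach is to split on the sign of the fold change first and then take the most significant genes within each group. The sketch below is only an illustration under assumptions: the column name log2FC is hypothetical (the post does not show which column of gene.info holds the signed fold change), and toy.info is made-up data standing in for top.genes.info.

```r
# Toy stand-in for top.genes.info; the values and the log2FC column name are hypothetical.
toy.info <- data.frame(
  pvalue = c(0.001, 0.020, 0.003, 0.040, 0.0005, 0.010, NA, 0.002),
  log2FC = c(2.1, -1.5, -3.0, 0.8, 1.2, -0.4, 2.5, -2.2),
  row.names = paste0("gene", 1:8)
)

# Drop genes with a missing p-value, then sort by significance (smallest p-value first).
toy.info <- toy.info[!is.na(toy.info$pvalue), ]
toy.info <- toy.info[order(toy.info$pvalue), ]

# Split by direction of change and keep the most significant genes in each group.
n.top    <- 2L  # would be 5L for the real data
top.up   <- head(toy.info[toy.info$log2FC > 0, ], n = n.top)
top.down <- head(toy.info[toy.info$log2FC < 0, ], n = n.top)

print(rownames(top.up))
print(rownames(top.down))
```

The same split could be applied to the svmRFE ranking instead of the p-value ordering, by keeping the rows in ranked order and taking the first n genes of each sign.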

