Hello.
I have a bunch of microarray data in an array and I want to run the kNN algorithm on it. For simplicity, let's say I have a table of expression levels for 100 genes under 4 different treatments:
| Gene name | Treat 1 | Treat 2 | Treat 3 | Treat 4 |
|-----------|---------|---------|---------|---------|
| Gene 1    | 0.343   | 0.343   | 0.343   | 4.533   |
| Gene 2    | 0.353   | 1.343   | 0.443   | 0.343   |
| Gene 3    | 0.343   | 0.335   | 0.343   | 0.343   |
| ...       | ...     | ...     | ...     | ...     |
| Gene 100  | 5.343   | 0.323   | 0.343   | 0.243   |
I will use 70% of the rows (genes 1 to 70) for the training set and the remaining 30% (genes 71 to 100) for the test set:
train_set = data[1:70,]
test_set = data[71:100,]
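To make the shapes concrete, here is a minimal sketch that builds a placeholder matrix of the same size (100 genes × 4 treatments) and splits it the same way. The random `rnorm()` values are an assumption standing in for the real expression levels.

```r
# Placeholder data: 100 genes (rows) x 4 treatments (columns).
# Random values stand in for the real expression levels.
set.seed(1)
data <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4,
               dimnames = list(paste("Gene", 1:100),
                               paste("Treat", 1:4)))

# Same split as in the question: first 70 rows train, last 30 rows test.
train_set <- data[1:70, ]
test_set  <- data[71:100, ]

dim(train_set)  # 70 4
dim(test_set)   # 30 4
```

Both subsets keep all 4 treatment columns, so each training row is one gene described by 4 values.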
I also need to create a vector with the class labels of the training set:
train_labels = c("Treat 1", "Treat 2", "Treat 3", "Treat 4")
and then run `knn()` from the `class` package:
library(class)
knn(train = train_set, test = test_set, cl = train_labels, k = 10)
The thing is that there are only 4 training labels, while the training set consists of 70 rows, so I think this is going to produce an error.
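One way to check this concern: `class::knn()` expects one label per training row, so `length(cl)` has to equal `nrow(train)`. The sketch below uses placeholder random data of the same shape (an assumption, not the real data) and catches the error raised by the 4-label call instead of letting it stop the script.

```r
library(class)  # provides knn()

# Placeholder data with the same shape as the table: 100 genes x 4 treatments.
set.seed(1)
data <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
train_set <- data[1:70, ]
test_set  <- data[71:100, ]

# One label per treatment (4 labels) vs. 70 training rows.
train_labels <- c("Treat 1", "Treat 2", "Treat 3", "Treat 4")
print(c(labels = length(train_labels), rows = nrow(train_set)))

# knn() stops before classifying because cl and train have different lengths;
# tryCatch() returns the error message as a string instead.
result <- tryCatch(
  knn(train = train_set, test = test_set, cl = train_labels, k = 10),
  error = function(e) conditionMessage(e)
)
print(result)  # prints the error message from knn()
```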
What is the right way to approach this? Should I transpose my initial matrix?
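A sketch of what each orientation implies for the labels, using placeholder random data. In option A the genes stay as rows, so `cl` needs one label per gene; the `gene_class` vector with values "A"/"B" is hypothetical and only illustrates the required length. In option B, `t()` turns the treatments into rows and the genes into features, so the 4 treatment labels line up with 4 rows.

```r
# Placeholder data: 100 genes (rows) x 4 treatments (columns).
set.seed(1)
data <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4,
               dimnames = list(paste("Gene", 1:100), paste("Treat", 1:4)))

# Option A: genes as rows. cl needs one label per gene; "A"/"B" is a
# hypothetical gene category invented for illustration.
gene_class <- rep(c("A", "B"), length.out = 100)
length(gene_class[1:70]) == nrow(data[1:70, ])  # TRUE

# Option B: transpose so treatments are rows and the 100 genes are features.
samples <- t(data)
dim(samples)                                    # 4 100
length(paste("Treat", 1:4)) == nrow(samples)    # TRUE
# Caveat: option B leaves only 4 rows, one per treatment, so every class has
# a single example; a 70/30 split and k = 10 are not meaningful without replicates.
```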
Thank you