I am trying to use expression array data (not RNA-sequencing data) to predict ethnicity. Is this possible, and are there any good methods? Many thanks!
Why did you use the RNA-seq tag for your post if your question is about microarray data?
Sorry about that. I couldn't find the microarray tag.
I've changed your tags to better reflect your question.
It is possible. Any multi-class classification algorithm would be applicable, such as multinomial logistic regression or random forest.
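As a rough illustration of the multinomial logistic regression option, here is a minimal from-scratch sketch in plain Python. Everything in it is an assumption made for the example: the data are synthetic (each simulated class has one "gene" shifted upward by an arbitrary 3.0), and the names `make_synthetic_data`, `fit_softmax_regression`, `predict`, and `run_demo` are hypothetical helpers, not part of any library. In practice you would more likely use an established implementation, for example `LogisticRegression` or `RandomForestClassifier` from scikit-learn, on your normalised array matrix.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_synthetic_data(n_per_class=40, n_genes=5, n_classes=3, seed=0):
    """Simulate log-expression values. Purely synthetic, NOT real array data:
    samples of class k get an arbitrary +3.0 shift in gene k."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[k] += 3.0
            X.append(row)
            y.append(k)
    return X, y


def class_scores(W, row):
    """Linear score per class; the last weight in each row is the bias."""
    xb = row + [1.0]
    return [sum(w * x for w, x in zip(weights, xb)) for weights in W]


def fit_softmax_regression(X, y, n_classes, lr=0.1, epochs=200):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n_weights = len(X[0]) + 1
    W = [[0.0] * n_weights for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_weights for _ in range(n_classes)]
        for row, label in zip(X, y):
            probs = softmax(class_scores(W, row))
            xb = row + [1.0]
            for k in range(n_classes):
                # Gradient of the cross-entropy loss: (p_k - indicator) * x
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, x in enumerate(xb):
                    grad[k][j] += err * x
        for k in range(n_classes):
            for j in range(n_weights):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, row):
    """Return the index of the class with the highest score."""
    scores = class_scores(W, row)
    return max(range(len(scores)), key=scores.__getitem__)


def run_demo(seed=0):
    """Train on 75% of the simulated samples, report held-out accuracy."""
    X, y = make_synthetic_data(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, test = idx[:cut], idx[cut:]
    W = fit_softmax_regression([X[i] for i in train],
                               [y[i] for i in train], n_classes=3)
    correct = sum(predict(W, X[i]) == y[i] for i in test)
    return correct / len(test)


accuracy = run_demo()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy here only reflects the signal that was deliberately built into the simulated data, so it says nothing about how well ethnicity can be predicted from real arrays. With real data, evaluate on held-out samples (or with cross-validation) and watch for batch effects that coincide with the class labels.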
Awesome! I will try those algorithms. Thanks for your reply and have a good day.
Just be careful about the interpretation, as ethnicities are not regarded as having differences at the gene expression level. I feel that you will likely find differences in mitochondrial genes and other genes that are mostly unstudied, like non-coding RNAs. The availability of these transcripts will depend on the array type that you're using, of course.
Ethnic differences at the genetic level, however, are well documented and need no introduction.
The method is not the key problem. The main problem is that it is hard to find data to train the model. Another problem is that I think it will be quite hard to distinguish ethnicity with expression array data.
That was my first thought, too, upon reading this post. Different ethnicities are not regarded as differing based on gene expression. You may find ethnicity-specific expression differences in a disease situation, though.