3.2 years ago
Rob
Hi friends, how should I randomly split my RNA-seq data from 500 patients into a 2/3 training set and a 1/3 validation set? I tried selecting randomly in Excel, but the result contains repeated patients in my sets. How can I sample randomly without getting duplicate patients?
Excel is not a good tool for advanced statistics. Please use ML libraries in R/Python to split data into training and test sets. A little bit of Googling will turn up existing functions in scikit-learn and R that split your data without much manual work.
If it is only about the splitting, in R you can generate random numbers, here 167 random numbers between 1 and 500 without duplicates:
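The same idea can be sketched in Python using only the standard library. This is a rough illustration, not the snippet from the answer above: `random.sample` draws without replacement, so no patient can land in a set twice. The patient IDs and the seed are placeholders I made up for the example.

```python
import random

# Hypothetical patient IDs; replace with your real sample names.
patients = [f"patient_{i:03d}" for i in range(1, 501)]

# Arbitrary seed (an assumption), only so the split can be reproduced.
rng = random.Random(42)

# One third of 500, rounded, gives the 167 validation patients.
n_validation = round(len(patients) / 3)

# random.sample draws WITHOUT replacement, so no ID is picked twice.
validation = rng.sample(patients, n_validation)

# Everything not drawn for validation goes to the training set.
validation_set = set(validation)
training = [p for p in patients if p not in validation_set]

print(len(training), len(validation))
```

The two lists are disjoint by construction, so each patient appears in exactly one set. The resulting IDs can then be used to subset the columns of the expression matrix.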
As _r_am suggests, please get familiar with proper programming languages. I guarantee you that you do not want to load the expression profile of 500 patients into Excel.
Thanks, ATpoin. This worked great.