HT-Seq count data
4.4 years ago
Rob ▴ 170

Hi friends, how should I randomly split my RNA-seq data from 500 patients into a training set (2/3) and a validation set (1/3)? I tried selecting patients at random in Excel, but the result contains repeated patients in my sets. How can I sample randomly without getting duplicate patients?

RNA-Seq • 915 views

Excel is not a good tool for advanced statistics. Please use ML libraries in R or Python to split the data into training and test sets. A quick Google search will turn up existing functions in scikit-learn and R that split your data without much manual work.


If it is only about the splitting, in R you can generate random numbers directly. Here are 167 random numbers between 1 and 500 without duplicates:

> sample(seq(1,500), round(500*(1/3)), replace = FALSE)
  [1] 317 335  60 479 136  16  12 366 303 325 245  78 478 307 127 425 500 469 360 446 130 257 463 419  35 198  99
 [28] 170 113 102 364 165 302 294 215 481 367 129 449  90  73 251 296 137 347 409 394 187  10  39 106 428 281 447
 [55] 451 298 101 125 395 224 291 402 228 464 167 162 240 359  32  43 435 169 321 339  66 380 260  48 311 377 285
 [82] 135 470 404 107 178 158 429 152 221 495  79 386 286  36 255 183  71 383 494  21 230 319 476 490 145 493 387
[109] 314  41 416  63 100 310 141 406 334 121  85   3 272 282  87 427 287  40  94 212 206  53 412 258 229 144 370
[136] 358 203 234  30 168 332 309 156 241  15 437 163  64 474 242 181 398  17 442 210 346 443 320 188 403 108  31
[163]  56  27  11 460 329
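To turn those random indices into the two sets, one possible sketch in base R follows. The `patients` vector and its `patient_` labels are hypothetical placeholders for your own sample names, and the seed value is arbitrary; it is only there so the split can be reproduced.

```r
# Hypothetical patient IDs; replace with the sample names of your own data.
patients <- paste0("patient_", seq_len(500))

set.seed(42)  # arbitrary seed so the same split can be reproduced later

# Draw one third of the positions without replacement for validation.
n_validation <- round(length(patients) / 3)
validation_idx <- sample(seq_along(patients), n_validation, replace = FALSE)

validation_set <- patients[validation_idx]
training_set <- patients[-validation_idx]  # every patient that was not drawn

# Sanity checks: expected sizes, no patient in both sets, no duplicates.
stopifnot(
  length(validation_set) == 167,
  length(training_set) == 333,
  length(intersect(training_set, validation_set)) == 0,
  !anyDuplicated(c(training_set, validation_set))
)
```

Because the indices are drawn with `replace = FALSE` and the training set is simply everything that was not drawn, no patient can appear twice or in both sets. If your counts sit in a matrix with one column per patient (an assumption about your layout), the same two vectors can be used to subset its columns.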

As _r_am suggests, please get familiar with proper programming languages. I guarantee you that you do not want to load the expression profile of 500 patients into Excel.


Thanks ATpoin, this worked great.
