Question: HT-Seq count data
14 days ago by Rob (30):

Hi friends. How should I randomly split my RNA-seq data from 500 patients into a training set (2/3) and a validation set (1/3)? I tried selecting patients at random in Excel, but the result contains repeated patients in my sets. How can I sample randomly without getting duplicate patients?

rna-seq • 92 views

Excel is not a good tool for advanced statistics. Please use ML libraries in R/Python to split the data into training and test sets. A little googling will turn up existing functions in scikit-learn and R that can split your data without much manual work.

written 14 days ago by _r_am (32k)

If it is only about the splitting, in R you can draw random numbers without duplicates. Here, 167 random numbers between 1 and 500:

> sample(seq(1,500), round(500*(1/3)), replace = FALSE)
  [1] 317 335  60 479 136  16  12 366 303 325 245  78 478 307 127 425 500 469 360 446 130 257 463 419  35 198  99
 [28] 170 113 102 364 165 302 294 215 481 367 129 449  90  73 251 296 137 347 409 394 187  10  39 106 428 281 447
 [55] 451 298 101 125 395 224 291 402 228 464 167 162 240 359  32  43 435 169 321 339  66 380 260  48 311 377 285
 [82] 135 470 404 107 178 158 429 152 221 495  79 386 286  36 255 183  71 383 494  21 230 319 476 490 145 493 387
[109] 314  41 416  63 100 310 141 406 334 121  85   3 272 282  87 427 287  40  94 212 206  53 412 258 229 144 370
[136] 358 203 234  30 168 332 309 156 241  15 437 163  64 474 242 181 398  17 442 210 346 443 320 188 403 108  31
[163]  56  27  11 460 329
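
To go from such indices to the two sets, here is a minimal sketch in base R. It assumes the data is a genes x patients count matrix; the `counts` object below is a simulated, hypothetical stand-in (as are the names `val_idx` and `train_idx`), so swap in your own table. The training set is simply every patient that was not drawn for validation, so no patient can end up in both sets.

```r
# Hypothetical stand-in for the real data: a genes x patients count matrix
# (20 simulated genes, 500 patients). Replace with your own count table.
set.seed(42)  # fix the seed so the split is reproducible
n_patients <- 500
counts <- matrix(rpois(20 * n_patients, lambda = 10), nrow = 20,
                 dimnames = list(paste0("gene", 1:20),
                                 paste0("patient", seq_len(n_patients))))

# Draw 1/3 of the patient (column) indices without replacement for validation
val_idx <- sample(seq_len(n_patients), round(n_patients / 3), replace = FALSE)

# Every patient not drawn goes to training, so the two sets cannot overlap
train_idx <- setdiff(seq_len(n_patients), val_idx)

validation <- counts[, val_idx, drop = FALSE]
training   <- counts[, train_idx, drop = FALSE]

dim(training)    # 20 333
dim(validation)  # 20 167
```

Setting the seed is optional, but without it the split changes on every run, which makes later results hard to reproduce.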

As _r_am suggests, please get familiar with proper programming languages. I guarantee you that you do not want to load the expression profile of 500 patients into Excel.

modified 14 days ago • written 14 days ago by ATpoint (44k)

Thanks, ATpoint. This worked great.

written 9 days ago by Rob (30)