ICGC exp_array data: duplicate measurements?
5.0 years ago
pulyakhina

Hi everyone,

I'm working with publicly available ICGC expression data, specifically the file "exp_array.ALL-US.tsv".

I noticed that a few genes seem to have duplicate measurements: for the same gene (e.g., NM_015092), there are two rows that are identical (same donor ID, same sample ID, same analysis ID, etc.) except for the normalized expression value:

DO2 ALL-US  SP2 SA4 ... ... RefSeq  NM_015092   2275.018 ...
DO2 ALL-US  SP2 SA4 ... ... RefSeq  NM_015092   1587.806 ...


When I save each of these two rows to a separate file, remove the normalized expression value, and compare the files with diff, there is no difference, so the remaining fields really are identical.
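The manual diff check can be automated across the whole file by grouping rows on every column except the expression value and reporting any group that occurs more than once. This is a minimal sketch, assuming the file is tab-separated with a header row and that the value column is named `normalized_expression_value` (an assumption: check your own header). `find_duplicate_rows` is a hypothetical helper, and the toy rows below are simplified, with only four columns.

```python
import csv
import io
from collections import defaultdict

# Assumption: name of the value column; verify against the file's header.
VALUE_COL = "normalized_expression_value"

def find_duplicate_rows(lines, value_col=VALUE_COL):
    """Group tab-separated rows by every column except value_col.

    Returns {key: [values...]} for keys seen more than once, where key is
    the tuple of all other fields and values keep file order.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    v = header.index(value_col)  # position of the value column
    groups = defaultdict(list)
    for row in reader:
        if not row:  # skip blank lines
            continue
        key = tuple(row[:v] + row[v + 1:])  # all fields except the value
        groups[key].append(row[v])
    return {k: vals for k, vals in groups.items() if len(vals) > 1}

# Toy input with simplified columns; GENE_X is a made-up placeholder.
toy = io.StringIO(
    "icgc_donor_id\ticgc_sample_id\tgene_id\tnormalized_expression_value\n"
    "DO2\tSA4\tNM_015092\t2275.018\n"
    "DO2\tSA4\tNM_015092\t1587.806\n"
    "DO2\tSA4\tGENE_X\t10.0\n"
)
dups = find_duplicate_rows(toy)
print(dups)
# {('DO2', 'SA4', 'NM_015092'): ['2275.018', '1587.806']}
```

On the real file, pass an open handle instead, e.g. `with open("exp_array.ALL-US.tsv", newline="") as fh: dups = find_duplicate_rows(fh)`.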

Does anyone know why this happened? And if I want to use the data, which value (or which combination of values) should I use? Thank you in advance!
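The file alone does not say which value is "right". One possible explanation, which is an assumption not confirmed by the rows shown here, is that several array probes map to the same RefSeq ID, so each row would be a separate probe measurement. If that is the case, one common (but not authoritative) way to get a single number per gene and sample is to aggregate the duplicates. A minimal sketch with a hypothetical `collapse` helper:

```python
import statistics

def collapse(values, how="mean"):
    """Collapse duplicate expression values (strings or numbers) into one.

    Hypothetical helper: `how` selects the aggregation; which one is
    appropriate depends on why the duplicates exist.
    """
    nums = [float(x) for x in values]
    if how == "mean":
        return statistics.mean(nums)
    if how == "median":
        return statistics.median(nums)
    if how == "max":
        return max(nums)
    raise ValueError(f"unknown aggregation: {how!r}")

# The two NM_015092 values from the question.
pair = ["2275.018", "1587.806"]
print(collapse(pair))             # mean of the two values
print(collapse(pair, how="max"))  # 2275.018
```

Other conventions, such as keeping the probe with the highest average expression across all samples, need the full expression matrix rather than a single pair of rows.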

Kind regards,

Irina

ICGC expression duplicates