I'm working with publicly available ICGC expression data, namely file "exp_array.ALL-US.tsv".
I noticed that a few genes seem to have duplicate measurements: for the same gene (e.g., NM_015092), there are two lines that are identical in every field (same donor ID, same sample ID, same analysis ID, etc.) except for the normalized expression value:
DO2 ALL-US SP2 SA4 ... ... RefSeq NM_015092 2275.018 ...
DO2 ALL-US SP2 SA4 ... ... RefSeq NM_015092 1587.806 ...
When I save each of these two lines to a separate file, remove the normalized expression value, and run diff on the two files, there is no difference, so the rest of each line really is identical.
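For anyone who wants to run the same check across the whole file rather than on one pair of lines, here is a minimal Python sketch. It assumes the file is tab-separated with a header row and that the expression column is named normalized_expression_value; that column name is an assumption and should be adjusted to match the actual header. The function name find_conflicting_duplicates is just an illustrative helper. It keeps one key per distinct row in memory, so it may need adapting for a very large file.

```python
import csv
from collections import defaultdict

# Assumed column name; change it to match the header of the actual file.
VALUE_COLUMN = "normalized_expression_value"


def find_conflicting_duplicates(lines, value_column=VALUE_COLUMN):
    """Return groups of rows that agree on every column except value_column.

    `lines` is any iterable of tab-separated text lines whose first line is
    the header. The result maps each repeated key (all other fields, in
    column order) to the list of expression values seen for it.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    value_index = header.index(value_column)

    groups = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Key is the row with the expression value removed.
        key = tuple(row[:value_index] + row[value_index + 1:])
        groups[key].append(row[value_index])

    # Keep only keys that occur more than once.
    return {key: values for key, values in groups.items() if len(values) > 1}


if __name__ == "__main__":
    with open("exp_array.ALL-US.tsv", newline="") as handle:
        for key, values in find_conflicting_duplicates(handle).items():
            print(key, values)
```

Each printed entry is one set of otherwise identical lines together with all the expression values reported for it, which makes it easy to count how many genes and samples are affected.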
Does anyone know why this happened? And if I wish to use the data, which value (or which combination of values) should I use? Thank you in advance!