2.9 years ago
465336766
I am currently analyzing a set of RNA-seq data, and I was asked how to verify the authenticity of a data set downloaded from a database. Is the data you get true? Do the samples match their labels?
Can you elaborate? What do you mean by authenticity? Something like md5sums? What is "true"?
I mean, for example, when you download data from a database that includes a treatment group and a control group, how do you know that the samples really are treatment and control, and that the labels were not switched or otherwise mixed up? Thank you.
You don't. If there are known biological markers, e.g. in a cancer vs. normal comparison where the cancer is known to express certain genes and normal tissue does not, then you can check for that. Otherwise the metadata are usually whatever the authors provide (or don't). A proper analysis always includes some QC, e.g. a PCA for RNA-seq to see whether the clustering indicates a label switch, such as some normals clustering with the cancers and vice versa. Combined with checks of individual gene expression, this might tell you that something is odd, and you could then contact the authors (in case they respond). But this is all very custom; I doubt there is a simple automated procedure for things like that.
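To make the PCA check a bit more concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the expression matrix is simulated (12 samples, 200 genes, already on a log scale), two labels are deliberately swapped, and the helper names `pca_scores` and `flag_suspect_labels` are made up. With real data you would start from normalized, log-transformed counts and still inspect the PCA plot by eye; the nearest-centroid rule is just one crude way to turn "clusters with the wrong group" into a list of samples to look at.

```python
import numpy as np

def pca_scores(log_expr, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    centered = log_expr - log_expr.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def flag_suspect_labels(scores, labels):
    """Return indices of samples that sit closer to another group's
    centroid (in PC space) than to the centroid of their labeled group."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    idx = np.arange(len(labels))
    suspects = []
    for i in idx:
        others = idx != i  # leave sample i out of every centroid
        dists = {
            g: np.linalg.norm(scores[i] - scores[others & (labels == g)].mean(axis=0))
            for g in groups
        }
        nearest = min(dists, key=dists.get)
        if nearest != labels[i]:
            suspects.append(int(i))
    return suspects

# --- Simulated data (an assumption, not a real data set) ---
rng = np.random.default_rng(0)
n_per_group, n_genes = 6, 200
log_expr = 5.0 + rng.normal(0.0, 0.3, size=(2 * n_per_group, n_genes))
log_expr[n_per_group:, :40] += 3.0  # 40 genes are up in the true treated samples

# Labels as they might appear in the database, with two samples swapped.
labels = ["control"] * n_per_group + ["treated"] * n_per_group
labels[2], labels[9] = "treated", "control"

scores = pca_scores(log_expr)
suspects = flag_suspect_labels(scores, labels)
print("Samples whose labels look inconsistent with the clustering:", suspects)
```

A flagged sample is only a hint that something is odd, not proof of a swap; batch effects or genuine biological outliers can produce the same pattern, so a follow-up check of known marker genes is still worthwhile.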
Thank you so much, Sir. Now I know what I probably need to do.