Question

Normalization of two different datasets: TCGA vs GTEx

3.2 years ago

Taktak31 • 0

I'm using TCGA and GTEx to look for differentially expressed lncRNAs (starting from raw files). What are the best ways to normalize? DESeq2 and edgeR? Also, if I want to look at lncRNAs on specific chromosomes, how should I approach normalization?

tcga gtex normalization • 886 views

updated 3.1 years ago by i.sudbery 20k • written 3.2 years ago by Taktak31 • 0

score 0 · Answer 1 · 2021-08-13

The real answer to this is that you can't. TCGA and GTEx are different experiments and they aren't really comparable in a rigorous way on a quantitative level. At best you might find leads that would need extensive lab validation. No reviewer who knows what they are doing will accept a paper based solely on a DE comparison between TCGA and GTEx.

If you were insistent on doing this, then I can see two approaches:

1. Most TCGA datasets have at least some normal controls. Set up a design table with two columns: Disease (filled with Cancer and Normal) and Dataset (filled with TCGA and GTEx). Set your design formula to '~ Dataset + Disease'. Whether you use edgeR or DESeq2 doesn't matter; they are more or less the same. Note that this will only work if your TCGA data has at least some normals in it, since otherwise Disease and Dataset are completely confounded.
2. Transform the data with limma voom, then apply limma's quantile between-array normalization (normalizeBetweenArrays with method = "quantile"). This is a super harsh normalization.
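
To make the two ideas above a bit more concrete, here is a rough, language-neutral sketch in plain Python. It is only an illustration, not the actual analysis, which you would run in R with DESeq2, edgeR or limma. The sample counts are made up, and the helper names (design_matrix, rank, quantile_normalize) are hypothetical. The first part shows why the '~ Dataset + Disease' design is only estimable when TCGA contributes some normals. The second part shows what quantile normalization does to each sample, with ties broken by position for simplicity.

```python
from fractions import Fraction

def design_matrix(samples):
    """Rows of [intercept, Dataset == TCGA, Disease == Cancer] for (dataset, disease) pairs."""
    return [[1, int(ds == "TCGA"), int(dz == "Cancer")] for ds, dz in samples]

def rank(rows):
    """Matrix rank via exact Gaussian elimination (no floating-point tolerance needed)."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of the
    sorted values at each rank. Ties are broken by position (a simplification)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    means = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    out = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        new = [0.0] * n
        for rnk, i in enumerate(order):
            new[i] = means[rnk]
        out.append(new)
    return out

# Made-up sample sheets: GTEx normals vs TCGA tumours, with and without TCGA normals.
without_normals = [("GTEx", "Normal")] * 3 + [("TCGA", "Cancer")] * 3
with_normals = without_normals + [("TCGA", "Normal")] * 2

# The design has 3 coefficients, so it needs rank 3 to be estimable.
rank_without = rank(design_matrix(without_normals))
rank_with = rank(design_matrix(with_normals))

# Two made-up samples (columns) with three genes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

Without TCGA normals the design matrix is rank deficient, so the Disease effect cannot be separated from the Dataset effect. After quantile normalization every sample has exactly the same set of values and only the gene ordering within each sample survives, which is why it is such a harsh normalization.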

But just to reiterate - I wouldn't trust the results that came out of either of these. I'd treat this as lead prioritisation only.