I am following the TCGAbiolinks tutorial for conducting differential expression analysis on TCGA data ("TCGAanalyze: Analyze data from TCGA" section). I have 2 questions about it.
1) I don't understand the following: when dealing with legacy=TRUE data (platform = "Illumina HiSeq", file.type = "results"), they normalize to correct for gene length (TCGAanalyze_Normalization with the default parameter), but when dealing with legacy=FALSE data (workflow.type = "HTSeq - Counts"), they normalize to correct for GC content (method = "gcContent"). What is the reason for this difference? Do you have any explanation?
2) If I want to use the TCGAanalyze_DEA function with pipeline=limma, should I use the same normalization methods as for pipeline=edgeR? If not, which one should I use for the legacy=TRUE and legacy=FALSE data, respectively?
I hope you can help. Thanks in advance!