I would highly recommend the HTSeq counts, actually, because these are raw counts. The FPKM method of normalisation has come under criticism in recent years and is no longer recommended by some sources. The main issue with FPKM normalisation is that each sample is scaled only by its own library size and gene lengths; there is no cross-sample normalisation at all. Comparing FPKM values across samples is therefore akin to comparing multiple batches without doing any batch correction.
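A minimal Python sketch of the standard FPKM formula shows the problem. The toy counts and gene lengths are invented for illustration: gene B has exactly the same raw count in both samples, but gene A is massively induced in sample 2, and because each sample is scaled only by its own total, gene B's FPKM drops in sample 2 even though nothing about gene B changed.

```python
def fpkm(counts, lengths_bp):
    """FPKM for one sample: reads per kilobase of gene per million mapped reads.

    counts: raw read counts per gene; lengths_bp: gene lengths in base pairs.
    The only scaling is by this sample's own total count, so nothing here
    ties the sample to any other sample.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


# Hypothetical toy data for genes A, B, C.
lengths = [1000, 2000, 1500]   # gene lengths (bp)
sample1 = [100, 500, 400]      # raw counts, sample 1
sample2 = [10000, 500, 400]    # gene A induced; B and C unchanged

f1 = fpkm(sample1, lengths)
f2 = fpkm(sample2, lengths)

# Gene B looks ~10.9-fold lower in sample 2 purely because gene A soaked up reads.
print(f1[1] / f2[1])  # ≈ 10.9
```

Count-based methods such as those in DESeq2 and edgeR instead estimate a per-sample scaling factor using all samples together, which is what protects against this kind of composition effect.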
Use HTSeq counts and load these into DESeq2 or edgeR for downstream analyses.
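A minimal Python sketch for collecting per-sample HTSeq-count output into one genes-by-samples matrix, which is the kind of raw-count input DESeq2 (`DESeqDataSetFromMatrix`) and edgeR (`DGEList`) expect. It assumes the standard two-column, tab-separated htseq-count format, in which summary rows such as `__no_feature` and `__ambiguous` start with a double underscore; the function names are hypothetical, not part of any library.

```python
import gzip
from pathlib import Path


def parse_htseq_counts(lines):
    """Parse htseq-count output: one 'gene_id<TAB>count' pair per line.

    Summary rows starting with '__' (e.g. __no_feature, __ambiguous)
    are skipped because they are not genes.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def read_htseq_file(path):
    """Read a plain or gzip-compressed htseq-count file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return parse_htseq_counts(handle)


def merge_samples(per_sample):
    """Combine {sample: {gene: count}} into a genes-by-samples matrix.

    Returns (gene_ids, sample_ids, rows), where rows[i][j] is the raw count
    of gene i in sample j. Genes absent from a sample are filled with 0.
    """
    sample_ids = sorted(per_sample)
    gene_ids = sorted(set().union(*(per_sample[s] for s in sample_ids)))
    rows = [[per_sample[s].get(g, 0) for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, rows
```

The resulting matrix can be written out as a tab-separated file and read into R. Keep the values as raw integers: both packages apply their own normalisation and should not be given pre-normalised data.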
I have recently analysed an entire TCGA RNAseq dataset (>500 samples) and I used HTSeq counts. They work very well.
Update May 2, 2018:
The TCGA states that "To facilitate cross-sample comparison and differential expression analysis, the GDC also provides Upper Quartile normalized FPKM (UQ-FPKM) values and raw mapping count." - https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization/high-level-data-generation/rna-seq-quantification
In my experience with FPKM-UQ values, a non-parametric test (e.g., the Wilcoxon rank-sum / Mann-Whitney U test) should be used instead of a standard t-test when comparing across samples. Fold-change calculations, however, do not appear to work that well on the FPKM-UQ scale.
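A minimal Python sketch of a gene-wise two-sided Mann-Whitney U (Wilcoxon rank-sum) test between two groups of FPKM-UQ values. It uses the tie-corrected normal approximation without continuity correction and does not compute exact small-sample p-values; in practice `scipy.stats.mannwhitneyu` or R's `wilcox.test` would be the usual choice, and the helper names here are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U statistic for group x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u1, p
```

Because the test uses only ranks within each gene, it makes no assumption that FPKM-UQ values are normally distributed, which is the usual reason to prefer it over a t-test on this scale.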
My original advice still stands, i.e., it is better to obtain the raw HTSeq counts (where available) and re-process those using an up-to-date normalisation method, like TMM (edgeR) or the geometric-mean-based median-of-ratios method (DESeq2). Some TCGA datasets are only available as RSEM counts, which can also be used and input to DESeq2 via tximport.
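To make the DESeq2 approach concrete, here is a minimal Python sketch of median-of-ratios size factors: build a pseudo-reference sample from each gene's geometric mean across all samples, then take, for each sample, the median of its counts' ratios to that reference. This is an illustration of the idea, not DESeq2's code; genes with a zero in any sample are dropped (their geometric mean is zero), and edgeR's TMM (`calcNormFactors`) is a different method not shown here.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors, one per sample.

    matrix: list of rows (genes) of raw counts, one column per sample.
    1. Per gene, the log geometric mean across samples is the pseudo-reference.
    2. Per sample, compute log(count) - log(reference) for each usable gene.
    3. The size factor is exp(median of those log-ratios).
    """
    n_samples = len(matrix[0])
    usable, log_ref = [], []
    for row in matrix:
        if all(c > 0 for c in row):  # zero anywhere -> geometric mean is 0
            usable.append(row)
            log_ref.append(sum(math.log(c) for c in row) / n_samples)
    if not usable:
        raise ValueError("every gene has a zero count in at least one sample")
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical example: sample 2 is sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
print(size_factors(counts))  # ≈ [0.707, 1.414]
```

Dividing each sample's counts by its size factor gives counts that are comparable across samples. DESeq2 does this internally, so it should still be given the raw counts.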