Question

Download data from the TCGA for gene expression analyses in R

0

3.7 years ago

Lucy ▴ 10

I'm trying to download data from the TCGA for gene expression analyses in R, but I'm not sure whether I should use FPKM, FPKM-UQ, or counts. When the dataset is in counts, I suppose it's raw data, isn't it? So what's the best unit for comparing multiple datasets? I'm planning to use limma or DESeq2 for the gene expression analysis, and I found that DESeq2 needs count (non-normalised?) data. Is that correct? And what's the best package and working strategy?

RNA-Seq • 1.9k views

ADD COMMENT • link 3.6 years ago by Lucy ▴ 10

0

I need help again, please. I need to know which column in my table corresponds to the FPKM-UQ values. Thank you for your help!

seqnames  start      end        width  strand  ensembl_gene_id  external_gene_name  original_ensembl_gene_id
chrX      100627108  100639991  12884  -       ENSG00000000003  TSPAN6              ENSG00000000003.13

ADD REPLY • link updated 3.6 years ago by GenoMax 142k • written 3.6 years ago by Lucy ▴ 10

1

No value in that output relates to FPKM-UQ. Those columns are gene annotation only (coordinates and identifiers).

ADD REPLY • link 3.6 years ago by Kevin Blighe 87k

0

I need a table with only the FPKM-UQ values and the gene names. How do I identify the FPKM-UQ values in an S4 object? Thank you.

ADD REPLY • link 3.6 years ago by Lucy ▴ 10

1

Sorry, I have no information about which data you have or what you are aiming to do.

ADD REPLY • link 3.6 years ago by Kevin Blighe 87k

0

It's cancer data. I intend to compare FPKM-UQ expression values between normal tissue and primary tumor.

ADD REPLY • link 3.6 years ago by Lucy ▴ 10

1

I see, but what data have you retrieved so far? If you are relatively new to programming, I would suggest using TCGAbiolinks in R / Bioconductor. If you have no programming experience, then perhaps use cBioPortal.

ADD REPLY • link 3.6 years ago by Kevin Blighe 87k

0

I used TCGAbiolinks. Yes, I am a beginner in programming. I used this code:

library(TCGAbiolinks)

# Query the GDC for FPKM-UQ gene expression files from the TCGA-BRCA project
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - FPKM-UQ")

ADD REPLY • link updated 3.6 years ago by Kevin Blighe 87k • written 3.6 years ago by Lucy ▴ 10

1

Sure thing. In that case, be sure to follow the extensive tutorials: https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html

For now, you will want section 3, "Downloading and preparing files for analysis".
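
As a rough sketch of what that download-and-prepare step might look like for the query posted above: the function name `fetch_fpkm_uq` is hypothetical, the `shortLetterCode` sample column is an assumption about what the prepared object contains, and the workflow name is copied from the query above and may not be offered by newer GDC releases. It needs the TCGAbiolinks and SummarizedExperiment packages and network access, so treat it as an outline rather than a tested recipe.

```r
# Sketch only: requires the Bioconductor packages TCGAbiolinks and
# SummarizedExperiment, plus network access to the GDC.
fetch_fpkm_uq <- function(project = "TCGA-BRCA") {
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = "HTSeq - FPKM-UQ"  # copied from the query above
  )
  TCGAbiolinks::GDCdownload(query)       # download the matching files
  se <- TCGAbiolinks::GDCprepare(query)  # assemble a SummarizedExperiment (an S4 object)

  # The expression values live in the assay: a genes x samples matrix.
  expr <- SummarizedExperiment::assay(se)

  # Gene annotation (the table with seqnames, start, end, ...) lives in the row data.
  genes <- SummarizedExperiment::rowData(se)$external_gene_name

  # Assumption: the sample annotation has a 'shortLetterCode' column,
  # e.g. "TP" for primary tumor and "NT" for solid tissue normal.
  group <- SummarizedExperiment::colData(se)$shortLetterCode

  list(expr = expr, genes = genes, group = group)
}
```

Calling `res <- fetch_fpkm_uq()` and then `res$expr[, res$group == "TP"]` would, under those assumptions, give the primary tumor columns of the expression matrix. The point is that the values are in the assay, not in the annotation table.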

ADD REPLY • link 3.6 years ago by Kevin Blighe 87k

score 2 · Answer 1 · 2020-08-31

2

3.7 years ago

Kevin Blighe 87k

If you plan to use limma / voom or DESeq2, then the best approach is to obtain the raw counts and then follow the guidance for these programs.

You cannot, in any easy fashion, take FPKM expression units and re-process them using either of these programs.
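
To illustrate why, here is a minimal base R sketch of how FPKM and FPKM-UQ are derived from raw counts. The counts and gene lengths are made up, and the upper quartile is taken over all genes in the toy matrix (my understanding is that the GDC pipeline restricts the scaling to protein-coding genes, so this is a simplification).

```r
# Toy raw counts: 4 genes x 2 samples (made-up numbers)
counts <- matrix(c(100, 200, 300, 400,
                   150, 250, 350, 450),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("normal", "tumor")))

# Made-up gene lengths in base pairs, one per row of 'counts'
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 1500, gene4 = 500)

# FPKM: divide each sample by its total count, each gene by its length, scale by 1e9
lib_size <- colSums(counts)
fpkm <- sweep(counts, 2, lib_size, "/") / gene_length * 1e9

# FPKM-UQ: same idea, but scale each sample by its 75th percentile count
upper_q <- apply(counts, 2, quantile, probs = 0.75)
fpkm_uq <- sweep(counts, 2, upper_q, "/") / gene_length * 1e9

# The results are scaled, non-integer values; the per-sample totals and
# upper quartiles needed to get back to counts are not part of the output.
print(fpkm_uq)
```

DESeq2 and limma / voom model the raw counts and do their own normalisation, which is why they expect the unscaled integer matrix as input.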

Kevin

ADD COMMENT • link 3.7 years ago by Kevin Blighe 87k

1

Thank you very much

ADD REPLY • link 3.7 years ago by Lucy ▴ 10