Download data from the TCGA for gene expression analyses in R
3.7 years ago
Lucy ▴ 10

I'm trying to download data from the TCGA for gene expression analyses in R, but I'm not sure whether I should use FPKM, FPKM-UQ or counts. When the dataset is in counts, I suppose it's raw data, isn't it? So what's the best unit for comparing multiple datasets? I'm planning to use limma or DESeq2 for the gene expression analysis, and I found that DESeq2 needs count (non-normalised?) data. Is that correct? And what's the best package and working strategy?

RNA-Seq • 1.9k views

Please, I need help again. I need to know which column in my table corresponds to the FPKM-UQ values. Thank you for your help!

seqnames  start      end        width  strand  ensembl_gene_id  external_gene_name  original_ensembl_gene_id
chrX      100627108  100639991  12884  -       ENSG00000000003  TSPAN6              ENSG00000000003.13

No value in that output relates to FPKM-UQ; it only lists gene coordinates and identifiers.

I need a table with only the FPKM-UQ values and the genes. How can I identify the FPKM-UQ values in an S4 matrix? Thank you.

Sorry, I have no information about which data you have or what you are aiming to do.

Cancer data. I intend to compare the FPKM-UQ expression values in normal tissue and primary tumor.

I see, but what data have you retrieved so far? If you are relatively new to programming, I would suggest using TCGAbiolinks in R / Bioconductor. If you have no programming experience, then perhaps use cBioPortal.

I used TCGAbiolinks. Yes, I am a beginner in programming. I used this code:

library(TCGAbiolinks)

# Query the GDC for TCGA-BRCA gene expression files quantified as FPKM-UQ
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - FPKM-UQ")

Sure thing. In that case, be sure to follow the extensive tutorials: https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html

For now, you will want section 3, "Downloading and preparing files for analysis".
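
A rough sketch of that step and of pulling a gene-by-group table out of the result. It is an illustration, not taken from the tutorial. Assumptions: `download_fpkm_uq` and `fpkm_uq_table` are made-up helper names; the `HTSeq - FPKM-UQ` workflow name is the one from this thread and may not be offered in newer GDC data releases; and the prepared object is assumed to carry a `shortLetterCode` sample column ("TP" for primary tumor, "NT" for solid tissue normal). The expression values sit in `assay()`, whereas a coordinates table like the one posted earlier is gene annotation. A tiny toy object stands in for the real download so the example runs offline.

```r
library(SummarizedExperiment)

# Assumed download step, defined but not run here (needs TCGAbiolinks and network access).
download_fpkm_uq <- function(project = "TCGA-BRCA") {
  query <- TCGAbiolinks::GDCquery(project = project,
                                  data.category = "Transcriptome Profiling",
                                  data.type = "Gene Expression Quantification",
                                  workflow.type = "HTSeq - FPKM-UQ")  # name from the thread
  TCGAbiolinks::GDCdownload(query)
  TCGAbiolinks::GDCprepare(query)   # returns a SummarizedExperiment (an S4 object)
}

# Hypothetical helper: one row per gene, mean expression per sample group.
fpkm_uq_table <- function(se, group_col = "shortLetterCode") {
  values <- assay(se)                  # genes x samples numeric matrix
  groups <- colData(se)[[group_col]]   # assumed sample-type column
  data.frame(
    gene        = rownames(values),
    mean_tumor  = rowMeans(values[, groups %in% "TP", drop = FALSE]),
    mean_normal = rowMeans(values[, groups %in% "NT", drop = FALSE]),
    row.names   = NULL
  )
}

# Toy stand-in for the prepared object: 2 genes, 2 tumor and 2 normal samples.
toy_values <- matrix(c(10, 20, 30, 40,
                        2,  4,  6,  8),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("ENSG00000000003", "ENSG00000000005"),
                                     paste0("S", 1:4)))
toy <- SummarizedExperiment(
  assays  = list(fpkm_uq = toy_values),
  colData = DataFrame(shortLetterCode = c("TP", "TP", "NT", "NT"),
                      row.names = paste0("S", 1:4))
)

result <- fpkm_uq_table(toy)
print(result)
```

With real data, `fpkm_uq_table(download_fpkm_uq())` would be the equivalent call, provided the assumptions above hold for the current GDC release.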

3.7 years ago

If you plan to use limma / voom or DESeq2, then the best approach is to obtain the raw counts and then follow the guidance for these programs.
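
As an illustration only (not part of the original answer), a minimal DESeq2 sketch on raw counts could look like the following. It assumes DESeq2 is installed and uses the package's own simulator, `makeExampleDESeqDataSet`, as a stand-in for a real TCGA count matrix; the `normal` / `tumor` labels are invented for the example.

```r
library(DESeq2)

set.seed(1)
# Simulated raw integer counts: 500 genes x 6 samples (stand-in for TCGA counts).
sim <- makeExampleDESeqDataSet(n = 500, m = 6)
raw_counts <- counts(sim)

# Invented sample annotation: first three samples normal, last three tumor.
coldata <- data.frame(
  condition = factor(rep(c("normal", "tumor"), each = 3),
                     levels = c("normal", "tumor")),
  row.names = colnames(raw_counts)
)

# DESeq2 expects un-normalised counts and estimates size factors itself.
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "tumor", "normal"))

# Genes ordered by p-value.
head(res[order(res$pvalue), ])
```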

You cannot, in any easy fashion, take FPKM expression units and re-process them using either of these programs.
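
One way to see why, sketched with invented numbers in base R: FPKM divides each count by the sample's total and by gene length, so two samples sequenced at very different depths can end up with identical FPKM values. The depth information that count-based models rely on is no longer there. The gene lengths and counts below are hypothetical, and the toy column sums stand in for the total mapped fragments. FPKM-UQ is understood to replace the sample total with an upper-quartile count, which discards the same information.

```r
# Toy raw counts for 3 genes; sample B has 10x the sequencing depth of sample A.
counts <- matrix(c(10, 100,
                   20, 200,
                   70, 700),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), c("A", "B")))
gene_length <- c(g1 = 1000, g2 = 2000, g3 = 3500)  # hypothetical lengths in bp

# FPKM = count * 1e9 / (gene length * total fragments in the sample)
fpkm <- sweep(counts, 2, colSums(counts), "/") * 1e9 / gene_length
print(fpkm)

# The two columns are identical although B carries ten times more reads.
same_fpkm <- isTRUE(all.equal(fpkm[, "A"], fpkm[, "B"]))
print(same_fpkm)
```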

Kevin

Thank you very much
