Question: Download data from the TCGA for gene expression analyses in R
Lucy wrote:

I'm trying to download data from TCGA for gene expression analysis in R, but I'm unsure whether I should use FPKM, FPKM-UQ, or counts. When the dataset is in counts, I suppose it's raw data, isn't it? So what's the best unit for comparing multiple datasets? I'm planning to use limma or DESeq2 for the gene expression analysis, and I found that DESeq2 needs count (non-normalised?) data. Is that correct? And what's the best package and working strategy?

rna-seq
modified 4 weeks ago • written 7 weeks ago by Lucy

Please, I need help again. I need to know which column in my table corresponds to the FPKM-UQ values. Thank you for your help!

seqnames  start      end        width  strand  ensembl_gene_id  external_gene_name  original_ensembl_gene_id
chrX      100627108  100639991  12884  -       ENSG00000000003  TSPAN6              ENSG00000000003.13
modified 4 weeks ago by genomax • written 4 weeks ago by Lucy

No value in that output relates to FPKM-UQ.

written 4 weeks ago by Kevin Blighe

I need a table with only the gene names and their FPKM-UQ values. How do I identify the FPKM-UQ values in an S4 object? Thank you.

written 4 weeks ago by Lucy

Sorry, I have no information about which data you have or what you are aiming to do.

written 4 weeks ago by Kevin Blighe

Cancer data. I intend to compare FPKM-UQ expression values between normal tissue and primary tumor.
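
A tumor-versus-normal split like this can be sketched from the sample barcodes alone. The base R example below is a minimal illustration, assuming a genes-by-samples matrix whose column names are TCGA barcodes; it relies on the TCGA barcode convention that characters 14-15 hold the sample type code (01 for primary solid tumor, 11 for solid tissue normal). The matrix values, gene names, and barcodes are made up for illustration.

```r
# Toy FPKM-UQ matrix (2 genes x 4 samples); all values and barcodes are invented.
expr <- matrix(c(10, 40,
                 12, 35,
                 30, 38,
                 34, 42),
               nrow = 2,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("TCGA-XX-0001-11A", "TCGA-XX-0002-11A",
                                 "TCGA-XX-0001-01A", "TCGA-XX-0002-01A")))

# Characters 14-15 of a TCGA barcode encode the sample type:
# "01" = primary solid tumor, "11" = solid tissue normal.
sample_code <- substr(colnames(expr), 14, 15)
group <- ifelse(sample_code == "01", "tumor",
                ifelse(sample_code == "11", "normal", NA_character_))

# Log-transform (with a pseudocount of 1) and average within each group.
log_expr    <- log2(expr + 1)
mean_tumor  <- rowMeans(log_expr[, which(group == "tumor"),  drop = FALSE])
mean_normal <- rowMeans(log_expr[, which(group == "normal"), drop = FALSE])

# Descriptive log2 fold change per gene (tumor minus normal).
log2_fold_change <- mean_tumor - mean_normal
print(round(log2_fold_change, 2))
```

This gives only a descriptive summary. It is not a differential expression test, which is better done on raw counts with limma / voom or DESeq2.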

written 4 weeks ago by Lucy

I see, but what data have you retrieved so far? If you are relatively new to programming, I would suggest using TCGAbiolinks in R / Bioconductor. If you have no programming experience, then perhaps use cBioPortal.

written 4 weeks ago by Kevin Blighe

I used TCGAbiolinks. Yes, I am a beginner in programming. I used this code:

library(TCGAbiolinks)

# Query the GDC for TCGA-BRCA gene expression files quantified as FPKM-UQ
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - FPKM-UQ")
modified 4 weeks ago by Kevin Blighe • written 4 weeks ago by Lucy

Sure thing. In that case, be sure to follow the extensive tutorials: https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html

For now, you will want section 3, "Downloading and preparing files for analysis".
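
As a rough sketch of what that download-and-prepare step looks like after a query: `GDCdownload()` fetches the files and `GDCprepare()` assembles them into a SummarizedExperiment object. The table posted earlier in this thread looks like that object's gene annotation (the row data), which would explain why it contains no FPKM-UQ column; the expression values themselves sit in the assay matrix. `fetch_expression()` is a hypothetical helper name used only for illustration, and the `"HTSeq - FPKM-UQ"` workflow type is taken from the query in this thread. It may not be offered by current GDC data releases, so check which workflow types are available before running this.

```r
# Sketch only: requires the Bioconductor packages TCGAbiolinks and
# SummarizedExperiment, plus network access to the GDC.
# fetch_expression() is a hypothetical helper, not part of TCGAbiolinks.
fetch_expression <- function(project = "TCGA-BRCA",
                             workflow = "HTSeq - FPKM-UQ") {
  # Assumption: this workflow type is still served by the GDC.
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = workflow
  )
  TCGAbiolinks::GDCdownload(query)        # fetch the files listed in the query
  se <- TCGAbiolinks::GDCprepare(query)   # assemble a SummarizedExperiment

  list(
    expression = SummarizedExperiment::assay(se),    # genes x samples values
    genes      = SummarizedExperiment::rowData(se),  # gene annotation
    samples    = SummarizedExperiment::colData(se)   # per-sample metadata
  )
}
```

Calling `brca <- fetch_expression()` and then inspecting `brca$expression[1:5, 1:3]` should show genes in rows and sample barcodes in columns, while `brca$genes` holds annotation columns such as `external_gene_name`.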

written 4 weeks ago by Kevin Blighe

Kevin Blighe wrote:

If you plan to use limma / voom or DESeq2, then the best approach is to obtain the raw counts and follow the guidance for these programs.

You cannot, in any easy fashion, take FPKM expression units and re-process them using either of these programs.
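
To make the reason concrete, here is a minimal base R sketch, on made-up counts and gene lengths, of how FPKM and FPKM-UQ are derived from raw counts. The formulas follow the GDC definitions as commonly documented (count × 10^9 divided by gene length and by either the total or the upper-quartile read count); the GDC computes those denominators over protein-coding genes, whereas this toy uses all four genes, so treat it as an illustration rather than a reimplementation.

```r
# Toy raw counts: 4 genes (rows) x 2 samples (columns). All values are made up.
counts <- matrix(c(100, 200, 300, 400,
                   200, 400, 600, 800),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB")))
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 1500, gene4 = 4000)  # bp

# Counts per kilobase-scale unit: row i is divided by the length of gene i.
length_scaled <- counts * 1e9 / gene_length

# FPKM: additionally divide each sample by its total read count.
total_reads <- colSums(counts)
fpkm <- sweep(length_scaled, 2, total_reads, "/")

# FPKM-UQ: divide each sample by its upper-quartile (75th percentile) count.
upper_quartile <- apply(counts, 2, quantile, probs = 0.75)
fpkm_uq <- sweep(length_scaled, 2, upper_quartile, "/")

# sampleB has exactly twice the reads of sampleA, yet the FPKM values match:
# the sequencing depth is divided out and cannot be read back from FPKM.
print(all.equal(fpkm[, "sampleA"], fpkm[, "sampleB"]))
```

Count-based tools such as DESeq2 and limma / voom model that sequencing depth and the integer counts directly, which is why they expect raw counts as input.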

Kevin

written 7 weeks ago by Kevin Blighe

Thank you very much

written 7 weeks ago by Lucy