Question: Can I use the normalized RNA-seq data for PAAD downloaded from TCGA with the help of TCGA-Assembler?
rajesh wrote:

Dear R people,

I downloaded the normalized RNA-seq data for pancreatic adenocarcinoma (PAAD) with the help of TCGA-Assembler. The data looks like this:

GeneSymbol  EntrezID    TCGA-2J-AAB1-01A-11R-A41B-07    TCGA-2J-AAB4-01A-12R-A41B-07    TCGA-2J-AAB6-01A-11R-A41B-07    TCGA-2J-AAB8-01A-12R-A41B-07    TCGA-2J-AAB9-01A-11R-A41B-07    TCGA-2J-AABA-01A-21R-A41B-07
A1BG    1   81.9122 56.7551 82.5497 56.9307 105.7878    99.3455
A1CF    29974   25.3659 53.4512 8.1871  33.8425 21.4362 18.7882
RBFOX1  54715   0.4878  2.1044  0   0   1.0718  0
GGACT   87769   180.4976    111.0774    163.1228    185.8143    166.7095    99.2767
A2ML1   144568  85.8537 0   1815.7895   16.9213 642.015 873.6496
A2M 2   19703.8049  15837.8241  8517.4444   14413.913   24311.7792  10302.0072
A4GALT  53947   1541.4634   1154.8822   1121.0526   392.9495    1125.4019   633.1611

So my question is,

  1. Is this data normalized? I am confused because the values are very high.
  2. If it is not normalized, how do I normalize it?
  3. Which package is best for this?

Thanks in advance.

Tags: rna-seq, R
modified 6 months ago by RamRS • written 6 months ago by rajesh

I am attaching the data (the same excerpt as shown in the question above).
modified 6 months ago by genomax • written 6 months ago by rajesh

Very difficult for us to know by just looking at a tiny snapshot of your data. You can most likely answer your own question by reading the TCGA Assembler manual / quick start guide, and / or simply looking at the options / parameters for the function that you use to retrieve this gene expression data.

Ultimately, you should have exhaustively tried to answer your own question before coming here.

written 6 months ago by Kevin Blighe

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

written 6 months ago by genomax
Chirag Parsania wrote:
  1. Is it normalized data? You already mentioned in the question that:

I downloaded the normalized RNA-seq data of pancreatic adenocarcinoma with the help of TCGA-Assembler.

Anyway, to me it looks like normalised data. Typical FPKM/RPKM/TPM values look like this (ranging into the thousands). Also, your values have decimals, which suggests they are not raw read counts but have been normalised by some method.

Cufflinks, DESeq, edgeR, limma, and Ballgown are some of the well-known packages for RNA-seq normalisation.

modified 6 months ago • written 6 months ago by Chirag Parsania

Also, your values have decimals, which suggests they are not raw read counts but have been normalised by some method.

Could be estimated counts, like from, e.g., RSEM, Kallisto, or Salmon.
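The "decimals" observation can be checked directly. The following is a minimal Python sketch, assuming the values of one sample column have already been read into a list; the function name `fraction_non_integer` is made up for illustration. Note that a high fraction of non-integer values only rules out plain raw counts: it cannot tell normalised values apart from estimated counts.

```python
def fraction_non_integer(values, tol=1e-9):
    """Return the fraction of values that are not whole numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values given")
    non_integer = sum(1 for v in values if abs(v - round(v)) > tol)
    return non_integer / len(values)


# First sample column (TCGA-2J-AAB1-01A-11R-A41B-07) of the excerpt in the question.
sample = [81.9122, 25.3659, 0.4878, 180.4976, 85.8537, 19703.8049, 1541.4634]

frac = fraction_non_integer(sample)
print(f"{frac:.0%} of values are non-integer")
```

On a full data set, a fraction close to zero would point to raw integer counts instead.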

written 6 months ago by Kevin Blighe

Thanks for your response. The data does seem to be normalized, but I was worried about the large values. When I read the TCGA-Assembler manual, I found that they used a percentile method to normalize the data.
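Which "percentile method" is meant is not confirmed here. One possibility is upper-quartile scaling, where each sample is divided by its own 75th percentile and multiplied by a fixed constant. The Python sketch below illustrates that idea only; the 75th percentile, the target of 1000, the exclusion of zeros, the function names, and the toy counts are all assumptions made for illustration, not details taken from the manual.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no values given")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    low_val, high_val = sorted_values[lower], sorted_values[upper]
    return low_val + (high_val - low_val) * (pos - lower)


def upper_quartile_scale(counts, q=75.0, target=1000.0):
    """Scale one sample so that the q-th percentile of its nonzero values equals target."""
    nonzero = sorted(c for c in counts if c > 0)
    scale = target / percentile(nonzero, q)
    return [c * scale for c in counts]


# Hypothetical counts for one sample (not taken from the PAAD data).
toy_counts = [0, 10, 20, 30, 40, 400]
scaled = upper_quartile_scale(toy_counts)
print(scaled)
```

Under this kind of scaling, any gene expressed above the sample's 75th percentile ends up above 1000, so values in the thousands or tens of thousands are expected even after normalization.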

written 6 months ago by rajesh

Can you please quote the source of that? Also, what exactly are you aiming to do with the data? There are likely easier ways than using TCGA Assembler.

written 6 months ago by Kevin Blighe

I want to do a differential gene expression analysis of the PAAD data; for that, I downloaded the normalized TCGA PAAD data.

written 6 months ago by rajesh

Well, a 'percentile method for normalizing the data' is not a typical method for normalising data. Can you please plot a histogram of your data and add it as a figure (in R, use hist())? Please go HERE to upload a screenshot and obtain a link for the figure.

You do not need to worry about the large values - that is normal in RNA-seq.
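To get a feel for the distribution without a plotting library, the values can be binned on a log2 scale. This is a rough Python sketch, assuming one sample column is held in a list; the log2(value + 1) transform, the bin width of 2, and the function name `log2_histogram` are choices made for this illustration.

```python
import math
from collections import Counter


def log2_histogram(values, bin_width=2.0):
    """Count values per bin of log2(value + 1); returns {bin_start: count}."""
    counts = Counter()
    for v in values:
        if v < 0:
            raise ValueError("expression values must be non-negative")
        logged = math.log2(v + 1.0)
        bin_start = math.floor(logged / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# First sample column (TCGA-2J-AAB1-01A-11R-A41B-07) of the excerpt in the question.
values = [81.9122, 25.3659, 0.4878, 180.4976, 85.8537, 19703.8049, 1541.4634]

bin_width = 2.0
for bin_start, n in log2_histogram(values, bin_width).items():
    # One '#' per gene whose log2(value + 1) falls in this bin.
    print(f"[{bin_start:4.1f}, {bin_start + bin_width:4.1f}) {'#' * n}")
```

With all genes of a sample rather than seven, the same binning shows how far the long right tail of highly expressed genes extends.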

modified 6 months ago • written 6 months ago by Kevin Blighe

Well, thanks for your comment.

written 6 months ago by rajesh

This seems a nice explanation.

written 6 months ago by rajesh