Question

Can I use normalized RNA-seq data of PAAD, downloaded from TCGA with the help of TCGA Assembler?

1

5.3 years ago

rajesh ▴ 60

Dear R people,

I downloaded the normalized RNA-seq data for pancreatic adenocarcinoma (PAAD) with the help of TCGA Assembler. The data looks like this:

GeneSymbol  EntrezID    TCGA-2J-AAB1-01A-11R-A41B-07    TCGA-2J-AAB4-01A-12R-A41B-07    TCGA-2J-AAB6-01A-11R-A41B-07    TCGA-2J-AAB8-01A-12R-A41B-07    TCGA-2J-AAB9-01A-11R-A41B-07    TCGA-2J-AABA-01A-21R-A41B-07
A1BG    1   81.9122 56.7551 82.5497 56.9307 105.7878    99.3455
A1CF    29974   25.3659 53.4512 8.1871  33.8425 21.4362 18.7882
RBFOX1  54715   0.4878  2.1044  0   0   1.0718  0
GGACT   87769   180.4976    111.0774    163.1228    185.8143    166.7095    99.2767
A2ML1   144568  85.8537 0   1815.7895   16.9213 642.015 873.6496
A2M 2   19703.8049  15837.8241  8517.4444   14413.913   24311.7792  10302.0072
A4GALT  53947   1541.4634   1154.8822   1121.0526   392.9495    1125.4019   633.1611

So my questions are:

Is this data normalized? I am confused because the values are very high.
If it is not normalized, how can I normalize it?
Which package would be best for this?

Thanks in advance.

RNA-Seq R • 1.6k views

ADD COMMENT • link updated 5.3 years ago by Ram 43k • written 5.3 years ago by rajesh ▴ 60

0

I am attaching the figure for the data.

GeneSymbol  EntrezID    TCGA-2J-AAB1-01A-11R-A41B-07    TCGA-2J-AAB4-01A-12R-A41B-07    TCGA-2J-AAB6-01A-11R-A41B-07    TCGA-2J-AAB8-01A-12R-A41B-07    TCGA-2J-AAB9-01A-11R-A41B-07    TCGA-2J-AABA-01A-21R-A41B-07
A1BG    1   81.9122 56.7551 82.5497 56.9307 105.7878    99.3455
A1CF    29974   25.3659 53.4512 8.1871  33.8425 21.4362 18.7882
RBFOX1  54715   0.4878  2.1044  0   0   1.0718  0
GGACT   87769   180.4976    111.0774    163.1228    185.8143    166.7095    99.2767
A2ML1   144568  85.8537 0   1815.7895   16.9213 642.015 873.6496
A2M 2   19703.8049  15837.8241  8517.4444   14413.913   24311.7792  10302.0072
A4GALT  53947   1541.4634   1154.8822   1121.0526   392.9495    1125.4019   633.1611

ADD REPLY • link updated 5.3 years ago by GenoMax 141k • written 5.3 years ago by rajesh ▴ 60

0

Very difficult for us to know by just looking at a tiny snapshot of your data. You can most likely answer your own question by reading the TCGA Assembler manual / quick start guide, and / or simply looking at the options / parameters for the function that you use to retrieve this gene expression data.

Ultimately, you should have exhaustively tried to answer your own question before coming here.

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLY • link 5.3 years ago by GenoMax 141k

score 0 · Answer 1 · 2019-01-14

0

5.3 years ago

Chirag Parsania ★ 2.0k

Is it normalized data? You already mentioned in the question that:

I downloaded the normalized RNA-seq data for pancreatic adenocarcinoma (PAAD) with the help of TCGA Assembler.

Anyway, to me it looks like normalised data. Typical FPKM/RPKM/TPM values look like this (ranging into the thousands). Also, your values are decimals, suggesting they are not raw read counts but have been normalised by some method.

Cufflinks, DESeq, edgeR, limma, and ballgown are some of the well-known packages for RNA-seq normalisation.
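The decimal-values reasoning above can be checked directly in R. The following is only a minimal sketch: the matrix holds a few values copied from the table in the question (first three samples, with shortened sample names), and the object names `expr`, `has_fractions`, and `value_range` are made up for illustration. A fractional value rules out raw integer counts, but it does not by itself tell you which normalisation was applied.

```r
# A few values copied from the table in the question
# (first three samples only; sample names shortened for readability).
expr <- matrix(
  c(81.9122,    56.7551,    82.5497,
    25.3659,    53.4512,    8.1871,
    0.4878,     2.1044,     0,
    19703.8049, 15837.8241, 8517.4444),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("A1BG", "A1CF", "RBFOX1", "A2M"),
    c("TCGA-2J-AAB1", "TCGA-2J-AAB4", "TCGA-2J-AAB6")
  )
)

# Raw read counts are whole numbers, so any fractional value
# means the table cannot be raw integer counts.
has_fractions <- any(expr != round(expr))

# The overall range shows how large the values get.
value_range <- range(expr)

print(has_fractions)
print(value_range)
```

With the full table, the same two checks apply unchanged once the numeric columns are in a matrix.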

ADD COMMENT • link 5.3 years ago by Chirag Parsania ★ 2.0k

1

Also, your values are decimals, suggesting they are not raw read counts but have been normalised by some method.

They could be estimated counts from, e.g., RSEM, Kallisto, or Salmon.

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

Thanks for your response. The data does seem to be normalized, but at the time I was worried about the large values. When I read the TCGA Assembler manual, I found that they used a percentile method for normalizing the data.

ADD REPLY • link 5.3 years ago by rajesh ▴ 60

0

Can you please quote the source of that? Also, what exactly are you aiming to do with the data? There are likely easier ways than using TCGA Assembler.

ADD REPLY • link 5.3 years ago by Kevin Blighe 87k

0

I want to do a differential gene expression analysis of the PAAD data; for that, I downloaded the TCGA normalized PAAD data.

ADD REPLY • link 5.3 years ago by rajesh ▴ 60

0

Well, a 'percentile method for normalizing the data' is not a typical normalisation method. Can you please plot a histogram of your data and add it as a figure (in R, use hist())? Please go HERE to upload a screenshot and obtain a link for the figure.
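For the histogram, something along these lines should work. It is a rough sketch only: `expr` is a hypothetical stand-in built from a few values in the question (shortened sample names), the log2(value + 1) transform is my own assumption to keep the very large values from dominating the plot, and the output file name is arbitrary.

```r
# Hypothetical stand-in for the full expression matrix: a few values
# copied from the question (first three samples, shortened names).
expr <- matrix(
  c(81.9122,    56.7551,    82.5497,
    25.3659,    53.4512,    8.1871,
    0.4878,     2.1044,     0,
    19703.8049, 15837.8241, 8517.4444),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("A1BG", "A1CF", "RBFOX1", "A2M"),
    c("TCGA-2J-AAB1", "TCGA-2J-AAB4", "TCGA-2J-AAB6")
  )
)

# Assumption: log2(value + 1) compresses the long right tail;
# the +1 keeps zero values defined.
log_expr <- log2(expr + 1)

# Write the figure to a PDF in the session's temporary directory
# (arbitrary file name), then close the graphics device.
out_file <- file.path(tempdir(), "expression_hist.pdf")
pdf(out_file)
h <- hist(as.vector(log_expr),
          breaks = 10,
          main = "Distribution of expression values",
          xlab = "log2(value + 1)")
dev.off()
```

On the real data, replace the stand-in matrix with the numeric columns of the downloaded table (dropping GeneSymbol and EntrezID first).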