Question: What is the correct way of using GTEx, CCLE and HPA data?
3.0 years ago by
guto030 wrote:

I asked this today on Research Gate, but I'm not receiving any response there... Maybe you folks can help me out.

I am a little disappointed here. After studying a lot of data from these sites, I simply can't find a way to compare the data between them.

My goal is to compare tumor data (cell line or patient samples) and normal tissue data for a particular protein. These sites provide RNA-seq and, in the case of HPA, immunohistochemistry data.

From my research, I can't compare their data, because RPKM and FPKM are relative quantification measures and aren't comparable in most situations. In addition, CCLE uses RMA!

So, how do people make use of the data in these databases? Can I compare the data somehow? Do you know of a paper that used normal vs. tumor data obtained from these databases and made a valid comparison?

I'm about to lose my mind. Why isn't there a single standard unit for RNA-seq? This is very confusing.


rna-seq R gene • 2.7k views
modified 3.0 years ago by Ar830 • written 3.0 years ago by guto030
3.0 years ago by
United States
Ar830 wrote:

Why there isn't a single standard unit for RNA-seq ?
The answer is that there are many hypotheses about why one method is better than another, so researchers prefer different units depending on which hypothesis they accept. It is the same reason there is no single currency for the entire world.

Most CCLE expression data comes from microarrays, so the dataset is normalized with RMA. RMA is not a unit; it is a normalization method. For microarrays we use log2(probe intensity), whereas for RNA-Seq we mainly use RPKM, counts, or TPM.
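To make the difference between these RNA-Seq units concrete, here is a minimal sketch using the standard RPKM and TPM definitions. The gene lengths and read counts are made-up toy values, not taken from GTEx, CCLE or HPA, and the function names are just for illustration.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


# Hypothetical toy data: three genes, two samples.
gene_lengths = [1000, 2000, 4000]   # base pairs
sample_1 = [100, 200, 400]          # raw read counts
sample_2 = [100, 200, 4000]         # same counts for genes 1-2, gene 3 is 10x higher

for name, counts in [("sample_1", sample_1), ("sample_2", sample_2)]:
    print(name, "RPKM:", [round(x) for x in rpkm(counts, gene_lengths)])
    print(name, "TPM: ", [round(x) for x in tpm(counts, gene_lengths)])
```

Gene 1 has the same raw count in both samples, yet its RPKM and TPM are lower in sample_2 because gene 3 takes a larger share of the reads. Both units are relative to the rest of the sample, which is why values from different datasets cannot simply be placed side by side.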

What is the correct way of using GTEx, CCLE and HPA data?

I think one way to compare these datasets is to compare the fold changes and FDR of a gene between two different genotypes across the GTEx, CCLE and HPA datasets, and see whether the gene is differentially expressed and statistically significant. However, I think GTEx does not have tumor samples, so I would recommend using TCGA datasets.
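One way to read this suggestion is: compute a fold change inside each dataset separately, then compare the fold changes rather than the raw values. The sketch below assumes that reading. All expression values and p-values are invented placeholders (real p-values would come from a differential expression tool), and the Benjamini-Hochberg function is a plain implementation of the usual FDR adjustment.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical values for one gene in two datasets with different units.
dataset_x = {"tumor": [30.0, 34.0, 32.0], "normal": [7.0, 8.0, 6.0]}
dataset_y = {"tumor": [15.0, 17.0, 16.0], "normal": [3.0, 4.0, 2.0]}

lfc_x = log2_fold_change(dataset_x["tumor"], dataset_x["normal"])
lfc_y = log2_fold_change(dataset_y["tumor"], dataset_y["normal"])
print("log2FC dataset X:", round(lfc_x, 2))
print("log2FC dataset Y:", round(lfc_y, 2))

# Placeholder p-values for four genes tested in one dataset.
print("FDR:", [round(q, 3) for q in bh_adjust([0.01, 0.04, 0.03, 0.5])])
```

The absolute values in the two toy datasets differ by about a factor of two, but the within-dataset fold changes point the same way, and that is the quantity being compared.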

modified 3.0 years ago • written 3.0 years ago by Ar830

Thanks for the response. However, I don't think I got the last point. To my knowledge I can't compare RNA-seq data from HPA with RNA-seq data from CCLE or TCGA, because they come from different labs and therefore have different biases. Am I understanding this correctly? How do I get around that?

written 2.9 years ago by guto030

I think (and I could be wrong) that only microarray samples from CCLE are available. You cannot compare microarray data from CCLE with RNA-Seq data from TCGA, because they have different statistical distributions and their units are not the same. If RNA-Seq data from CCLE is available, you can compare it directly with TCGA datasets; however, you need to check PCA plots to see whether the samples cluster by genotype or by batch (where a batch in this case is a consortium). If they cluster by batch, you need to do batch correction.
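As a rough illustration of the PCA check, the sketch below computes the first principal component of a tiny made-up expression matrix with plain power iteration and prints each sample's score. The sample names, values and the size of the batch shift are all hypothetical; a real analysis would run a library PCA on thousands of genes and look at a plot.

```python
import math


def first_pc_scores(samples, iterations=200):
    """Project samples (lists of per-gene values) onto the first principal component."""
    n, g = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(g)]
    centered = [[s[j] - means[j] for j in range(g)] for s in samples]
    # Power iteration on X^T X without forming it: v <- X^T (X v), then normalize.
    v = [float(j + 1) for j in range(g)]
    for _ in range(iterations):
        proj = [sum(row[j] * v[j] for j in range(g)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(g)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(g)) for row in centered]


# Hypothetical log2 expression for 4 genes. Consortium B is shifted by +3 on
# every gene (a batch effect); tumor differs from normal only in gene 1 (+1).
labels = ["A_normal", "A_tumor", "B_normal", "B_tumor"]
samples = [
    [5.0, 6.0, 7.0, 8.0],
    [6.0, 6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0, 11.0],
    [9.0, 9.0, 10.0, 11.0],
]

scores = first_pc_scores(samples)
for label, score in zip(labels, scores):
    print(label, round(score, 2))
```

In this toy matrix the two consortium A samples land on one side of PC1 and the two consortium B samples on the other, regardless of tumor or normal status. That pattern is the signal that the batch, not the biology, dominates and that batch correction is needed before comparing.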

modified 2.9 years ago • written 2.9 years ago by Ar830

Now I get it! Thanks. Do you have any references on that so I can study this matter? As I'm new to computational biology, my knowledge of clustering is still shallow. I had never heard of PCA plots before... Thank you again!

written 2.9 years ago by guto030
  1. For PCA:
  2. Microarray vs RNA-Seq: RNA-seq vs. Microarray
  3. Statistical Distribution of RNA-Seq and Microarray: Why Does Rna-Seq Read Count Fit Poisson Distribution?
  4. RNA-Seq workflow: (talks about PCA Plots and Batch Correction)

I would highly recommend searching for these topics on Biostars. Many of these questions have been asked and answered repeatedly. IMHO, it's one of the best resources for computational biology.

written 2.9 years ago by Ar830

Thank you again! I'm gonna search more on the topics and read these references! Thank you very much!

written 2.9 years ago by guto030