Question

Is it a good idea to use two different quantification methods from TCGA at the same time?

11.2 years ago

jack ▴ 990

I want to get expression data from TCGA for my cancer of interest. Around half of the data are RNASeqV2 and the rest are RNASeq.

This is from TCGA:

RNASeq Version 2 is similar to RNASeq in that it uses sequencing data to determine gene expression levels. RNASeq Version 2 uses a different set of algorithms to determine the expression levels, and the results are presented in a slightly different set of files.

There are two analysis pipelines used to create Level 3 expression data from RNA Sequence data. The first approach used at TCGA relies on the RPKM method, while the second method uses MapSplice to do the alignment and RSEM to perform the quantitation.
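For reference, a minimal sketch of the RPKM calculation (reads per kilobase of transcript per million mapped reads). The function name and the toy numbers are made up for illustration and are not taken from the TCGA pipeline. RSEM instead estimates expression with a statistical model, so its values are not on the same scale.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Toy example (made-up numbers): 500 reads on a 2,000 bp gene
# in a library with 20 million mapped reads.
value = rpkm(500, 2000, 20_000_000)
print(value)
```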

I want to use these data to build a regulatory network. Should I use only RNASeq, only RNASeqV2, or can I mix them all in my model? What problems or disadvantages would come from using both? (Some samples come from RNASeqV2 and others from RNASeq.)

tcga RNA-Seq next-gen • 3.1k views

updated 3.8 years ago by Ram 45k • written 11.2 years ago by jack ▴ 990

Ram · Accepted Answer · 2014-04-22

I would use the dataset that maximizes the sample size (which I would guess to be V2).
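As a rough sketch of that choice, you could count the samples per pipeline and keep only the larger group. The `manifest` dictionary and the sample IDs below are hypothetical; in practice you would build this mapping from whatever file listing you downloaded.

```python
from collections import Counter


def pick_largest_pipeline(sample_to_pipeline):
    """Return the pipeline with the most samples and the sorted IDs of those samples."""
    counts = Counter(sample_to_pipeline.values())
    pipeline, _ = counts.most_common(1)[0]
    keep = sorted(s for s, p in sample_to_pipeline.items() if p == pipeline)
    return pipeline, keep


# Hypothetical manifest: sample ID -> pipeline that produced its Level 3 data.
manifest = {
    "S1": "RNASeqV2",
    "S2": "RNASeq",
    "S3": "RNASeqV2",
    "S4": "RNASeqV2",
    "S5": "RNASeq",
}

pipeline, samples = pick_largest_pipeline(manifest)
print(pipeline, samples)
```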

Isoform-level expression estimates will vary if a different tool is used for mRNA quantification. Gene-level quantification should be more similar (and is what I would recommend using anyway), but it is best to avoid potential sources of bias if you can.
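If some samples happen to have been quantified by both pipelines (an assumption; check whether that is true for your cancer type), one way to gauge how similar the gene-level values are is a rank correlation per sample. This is a minimal pure-Python sketch with made-up values, not a full batch-effect analysis.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for the same five genes in one sample.
rpkm_values = [1.2, 30.5, 0.0, 8.1, 150.0]      # first pipeline
rsem_values = [40.0, 900.0, 2.0, 300.0, 5200.0]  # second pipeline

rho = spearman(rpkm_values, rsem_values)
print(rho)
```

Note that the ranks can agree perfectly while the absolute values are on very different scales, so a high rank correlation alone does not make the two sets safe to pool in a model that uses the raw values.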

I would expect that all older samples have been re-run with the latest pipeline. You can check the publication data site to see which data are listed; for example, I only see V2 quantification for the latest publication: https://tcga-data.nci.nih.gov/docs/publications/