HiSeq vs GA for RNASeqV2 on TCGA
8.2 years ago
XD ▴ 10

I want to pull RNA-Seq data from TCGA, but the datasets are available for both the HiSeq and GA platforms. Can someone please advise which dataset would be better to use and why? Thanks!

TCGA RNA-Seq HiSeq GA Illumina • 4.5k views

You are going to have to be much more specific. There are something like 20 different cancer types in TCGA, and the data were produced over 5 years at different centers. The platforms evolved along with the project, so the data span a wide variety of sequencing platforms (even early HiSeq data may differ considerably from late HiSeq data in read length, etc.).


Thanks, Chris. I am looking for colorectal adenocarcinoma, specifically.


These two options are essentially the same in terms of sequencing chemistry, but the HiSeq is the newer instrument, so in principle I would go with that data.

https://wiki.nci.nih.gov/display/TCGA/RNASeq+Version+2

8.2 years ago

Looking at the COAD data set in the data matrix, it appears that there's no overlap in the v2 RNA-Seq data: a sample has either GA data or HiSeq data, never both. So what you use will depend on the questions you're asking. If you need all the samples, use both. If you're worried about sequencer-driven batch effects, the GA produced the vast majority of the data, so you'd probably want to exclude the HiSeq samples.

FWIW, I'd just use all of it - as long as the read lengths are similar, there's unlikely to be much in the way of difference - the chemistry is largely the same.
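
The no-overlap observation is easy to check for yourself once you have the sample lists for each platform. Below is a minimal sketch, assuming you have already collected the TCGA barcodes for the GA and HiSeq files into two Python lists; the function names and the example barcodes are hypothetical, not taken from the actual COAD data. It truncates each barcode to its first 15 characters (the sample-level prefix) and counts how many samples are unique to each platform.

```python
def sample_id(barcode: str) -> str:
    """Truncate a TCGA barcode to its sample-level prefix, e.g. 'TCGA-AA-0001-01'."""
    return barcode[:15]


def platform_summary(ga_barcodes, hiseq_barcodes):
    """Count samples seen on only one platform and list those seen on both."""
    ga = {sample_id(b) for b in ga_barcodes}
    hiseq = {sample_id(b) for b in hiseq_barcodes}
    return {
        "ga_only": len(ga - hiseq),
        "hiseq_only": len(hiseq - ga),
        "both": sorted(ga & hiseq),
    }


# Hypothetical barcodes, for illustration only.
ga_files = ["TCGA-AA-0001-01A-01R-0000-07", "TCGA-AA-0002-01A-01R-0000-07"]
hiseq_files = ["TCGA-AA-0003-01A-01R-0000-07"]
print(platform_summary(ga_files, hiseq_files))
```

An empty `both` list means the platform label is a clean per-sample batch variable; if it is not empty, those samples can be used to compare the two platforms directly.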


If you use both, consider using batch-effect removal or correction approaches such as RUVSeq.
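
To illustrate what a platform correction does, here is a minimal sketch of the simplest possible approach: log-transform the values, then subtract each gene's mean within each platform. This is not RUVSeq (which is an R/Bioconductor package and uses a different, factor-based method); it is a plain per-batch mean-centering, and it assumes the platform is not confounded with the biology you care about. The function name and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def center_by_batch(expr, batch):
    """Per-gene mean-centering within each batch.

    expr:  {sample: [expression value per gene]}
    batch: {sample: batch label, e.g. 'GA' or 'HiSeq'}
    Returns log2(x + 1) values with each gene's per-batch mean subtracted.
    """
    logged = {s: [math.log2(v + 1.0) for v in vals] for s, vals in expr.items()}

    # Group sample names by batch label.
    groups = defaultdict(list)
    for s in logged:
        groups[batch[s]].append(s)

    corrected = {}
    for samples in groups.values():
        n_genes = len(logged[samples[0]])
        means = [
            sum(logged[s][g] for s in samples) / len(samples)
            for g in range(n_genes)
        ]
        for s in samples:
            corrected[s] = [logged[s][g] - means[g] for g in range(n_genes)]
    return corrected


# Toy values, for illustration only (two genes, two samples per platform).
expr = {"s1": [1, 3], "s2": [3, 7], "s3": [7, 15], "s4": [15, 31]}
batch = {"s1": "GA", "s2": "GA", "s3": "HiSeq", "s4": "HiSeq"}
print(center_by_batch(expr, batch))
```

If the tumor/normal ratio or another variable of interest differs between the GA and HiSeq samples, centering like this will remove part of that signal too, which is one reason to prefer a method that models the unwanted variation explicitly.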
