Question

combine two RNA-seq files together


6.0 years ago

yueli7 ▴ 250

Hello,

I have two RNA-seq datasets. One was downloaded from cancer samples: https://tcga.xenahubs.net/download/TCGA.OV.sampleMap/HiSeqV2_percentile.gz
It contains gene RSEM values ranked as percentiles between 0% and 100%.

The other is from normal tissue: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz (gene TPMs).

My question is: how can I combine the two files and find the genes that are differentially expressed between the cancer and normal samples?

Thanks in advance for any help!

Best,

Yue

RNA-Seq • 2.1k views



Hello, everyone,

I found the normal and cancer data in one dataset in GEO.

Thanks for any help!

Yue

6.0 years ago by yueli7 ▴ 250

score 4 · Answer 1 · 2019-07-09

You cannot simply download two completely unrelated datasets and then perform a differential analysis between them. There are almost certainly technical confounders (batch effects) that will dominate your results, i.e. create false ones. One can only compare samples from the same lab, same protocol, same study. Any other comparison will almost certainly contain a large number of false positives and false negatives. Please read about RNA-seq analysis first, e.g. https://peerj.com/preprints/27283/ and https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
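To make the batch-effect point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, the log-scale expression values, the noise levels and the t-statistic cutoff are all hypothetical and not taken from TCGA or GTEx. Both "labs" measure the same biology, and the only difference is a gene-specific technical offset in lab B.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def count_hits(n_genes, n_samples, batch_sd, seed=0, t_cutoff=4.0):
    """Count genes called 'different' although there is no biological difference.

    batch_sd is the (hypothetical) spread of the gene-specific technical
    offset that lab B adds on top of the shared baseline expression.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        baseline = rng.gauss(8.0, 2.0)      # same true expression in both labs
        shift = rng.gauss(0.0, batch_sd)    # technical offset, lab B only
        lab_a = [rng.gauss(baseline, 0.5) for _ in range(n_samples)]
        lab_b = [rng.gauss(baseline + shift, 0.5) for _ in range(n_samples)]
        if abs(welch_t(lab_a, lab_b)) > t_cutoff:
            hits += 1
    return hits


no_batch = count_hits(2000, 10, batch_sd=0.0)
with_batch = count_hits(2000, 10, batch_sd=1.0)
print("hits without batch effect:", no_batch)
print("hits with batch effect:   ", with_batch)
```

In this toy setup every hit is a false positive by construction, and the run with a batch offset should report far more of them. Because condition and batch are perfectly confounded (all cancer samples from one source, all normal samples from another), no statistical method can separate the two effects afterwards.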

You need raw (non-normalized) counts to perform a meaningful statistical analysis, and you need biological/experimental replicates. One cannot simply take any counts downloaded from a database. Read the linked articles carefully and then reconsider your strategy.
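One reason raw counts matter can be sketched with a toy example. The counts, gene lengths and the `tpm` helper below are made up for illustration. Two samples with a 100-fold difference in sequencing depth end up with identical TPM values, so the information about how many reads support each estimate is gone once the data are normalized.

```python
import math


def tpm(counts, lengths_kb):
    """Convert raw counts to TPM given gene lengths in kilobases."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]   # reads per kb
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical three-gene samples: same proportions, very different depth.
lengths_kb = [1.0, 2.0, 3.5]
shallow = [10, 20, 70]
deep = [1000, 2000, 7000]

tpm_shallow = tpm(shallow, lengths_kb)
tpm_deep = tpm(deep, lengths_kb)

same = all(math.isclose(a, b) for a, b in zip(tpm_shallow, tpm_deep))
print("TPM identical despite 100x depth difference:", same)
```

Count-based tools use the raw counts to model this uncertainty, which is why they expect raw counts as input. A percentile rank, as in the TCGA file from the question, discards even more information than TPM does.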

Look (and I really mean no offence at all), but this post and the ones about ChIP-seq analysis that you posted recently imply that you are a beginner in the field. You need more background before you start analyzing data. Without proper background knowledge and some experience, your analysis will almost certainly be flawed and therefore meaningless. Bioinformatics is quite a difficult field because there are very few standards and a lot of pitfalls. If you can, please take a course with an experienced supervisor. In any case, read as much as you can in online tutorials and blogs. Try to understand how things work and, most importantly, use established tools and workflows. Don't create custom analysis strategies before you have a very good understanding of what you are doing. Again, I mean absolutely no offence. I am just trying to save you from beginner's mistakes that might cost you a lot of time while producing basically no output.