Question: combine two RNA-seq files together
12 days ago, yueli7 (China) wrote:

Hello,

I have two RNA-seq datasets. One is from cancer samples: https://tcga.xenahubs.net/download/TCGA.OV.sampleMap/HiSeqV2_percentile.gz
(genes ranked by RSEM value, as percentiles between 0% and 100%).

The other is from normal tissue: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz (gene TPMs).

My question is: how can I combine the two files and find the genes that are differentially expressed between cancer and normal samples?

Thanks in advance for any help!

Best,

Yue

rna-seq • 176 views

Hello, everyone,

I found the normal and cancer data in one dataset in GEO.

Thanks for any help!

Yue

Reply written 12 days ago by yueli7
12 days ago, ATpoint (Germany) wrote:

You cannot simply download two completely unrelated datasets and then perform differential analysis. There are almost certainly technical confounders (batch effects) that will dominate your results and create false findings. Samples should only be compared when they come from the same lab, the same protocol, and the same study. Anything else will almost certainly contain a large number of false positives and false negatives. Please read about RNA-seq analysis first, e.g. https://peerj.com/preprints/27283/ and https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
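
To make the confounding problem concrete, here is a minimal Python sketch (not part of any established workflow) that checks whether condition and data source are perfectly tied together. The function names and the sample labels are hypothetical, chosen only to mirror the setup in the question, where every cancer sample comes from TCGA and every normal sample comes from GTEx.

```python
from collections import defaultdict


def batch_condition_table(conditions, batches):
    """Count samples for every (batch, condition) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for cond, batch in zip(conditions, batches):
        table[batch][cond] += 1
    return {b: dict(c) for b, c in table.items()}


def is_fully_confounded(conditions, batches):
    """True when every batch holds samples of only one condition.

    In that case the batch determines the condition, so a batch effect
    cannot be told apart from a biological difference.
    """
    table = batch_condition_table(conditions, batches)
    return all(len(conds) == 1 for conds in table.values())


# Hypothetical design mirroring the question: tumours from one source,
# normals from another.
conditions = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]
batches = ["TCGA", "TCGA", "TCGA", "GTEx", "GTEx", "GTEx"]
print(batch_condition_table(conditions, batches))
print(is_fully_confounded(conditions, batches))     # True

# Hypothetical design where all samples come from a single study.
same_study = ["study1"] * 6
print(is_fully_confounded(conditions, same_study))  # False
```

When the check returns True, no statistical model can separate the technical difference from the biological one, which is why a single study containing both conditions is the safer starting point.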

You need raw (non-normalized) counts and biological/experimental replicates to perform meaningful statistical analysis. One cannot simply take any counts downloaded from a database. Read the linked articles carefully and then reconsider your strategy.
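
As a rough illustration of why normalized values cannot stand in for raw counts, the Python sketch below computes TPM for two made-up samples that differ only in sequencing depth. The gene lengths, the counts, and the helper names `tpm` and `looks_like_raw_counts` are all assumptions for this example, not values from the TCGA or GTEx files.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize, then scale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def looks_like_raw_counts(values):
    """Heuristic only: raw read counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for v in values)


# Made-up example: three genes, and the same sample sequenced 10x deeper.
lengths_kb = [1.0, 2.0, 0.5]
shallow = [10, 200, 50]
deep = [c * 10 for c in shallow]

tpm_shallow = tpm(shallow, lengths_kb)
tpm_deep = tpm(deep, lengths_kb)

# Both samples give the same TPM values, so the depth information is gone.
same = all(math.isclose(a, b) for a, b in zip(tpm_shallow, tpm_deep))
print(same)                                # True
print(looks_like_raw_counts(shallow))      # True
print(looks_like_raw_counts(tpm_shallow))  # False
```

Count-based differential expression tools use the sequencing depth that TPM removes, and a percentile-ranked file discards even more, since only the ordering of genes within a sample survives.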


Look (and I really mean no offence at all), but this post and the ones about ChIP-seq analysis that you posted recently imply that you are a beginner in the field. You need more background before you start analyzing data. Without proper background knowledge and some experience, your analysis will almost certainly be flawed and therefore meaningless. Bioinformatics is quite a difficult field because there are very few standards and a lot of pitfalls. If you can, please take a course with an experienced supervisor. In any case, read as much as you can in online tutorials and blogs. Try to understand how things work and, most importantly, use established tools and workflows. Don't create custom analysis strategies before you have a very good understanding of what you are doing. Again, I mean absolutely no offence; I am just trying to save you from beginner's mistakes that might cost you a lot of time while producing basically no output.


Hello, ATpoint,

Thank you for your response!

I would rather compare within one dataset.

But my boss wants me to compare normal and cancer samples.

Actually, there are not many normal samples in TCGA.

Do I have to process the data from FASTQ?

Thank you again!

Yue

Reply written 12 days ago by yueli7