Question: Combining two RNA-seq files
yueli7 (China) wrote, 7 months ago:

Hello,

I have two RNA-seq datasets. One is from cancer samples, downloaded from https://tcga.xenahubs.net/download/TCGA.OV.sampleMap/HiSeqV2_percentile.gz;
its values are the genes' RSEM values ranked as percentiles between 0% and 100%.

The other is from normal tissue: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz (gene TPMs).
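
The two files store different transformations of expression, which is part of why they cannot simply be pasted together. As a rough illustration, the sketch below computes TPM and a within-sample percentile rank from the same made-up count matrix. The toy counts, gene lengths and the helper names `tpm` and `percentile_rank` are hypothetical, and the exact percentile formula and tie handling used for the Xena file are assumptions here. Neither transform keeps the raw counts or the sequencing depth.

```python
import numpy as np

# Hypothetical toy data: raw read counts for 4 genes (rows) in 2 samples (columns).
# Sample 2 has exactly twice the reads of sample 1. Gene lengths are assumed values.
counts = np.array([[100, 200],
                   [ 50, 100],
                   [300, 600],
                   [ 10,  20]], dtype=float)
lengths_bp = np.array([1000, 2000, 1500, 500], dtype=float)

def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length in kb, then scale each sample to sum to 1e6."""
    rpk = counts / (lengths_bp[:, None] / 1000.0)  # reads per kilobase
    return rpk / rpk.sum(axis=0) * 1e6

def percentile_rank(values):
    """Within-sample percentile rank (0-100) per gene.
    Assumption: simple rank / (n - 1); tied values would get arbitrary distinct ranks."""
    ranks = values.argsort(axis=0).argsort(axis=0)  # 0-based rank within each column
    return ranks / (values.shape[0] - 1) * 100.0

t = tpm(counts, lengths_bp)
p = percentile_rank(counts)
print(t.sum(axis=0))      # each sample sums to 1e6, whatever its depth
print(p[:, 0], p[:, 1])   # identical columns: the 2x depth difference is gone
```

Both columns come out identical under either transform, so any depth information a count-based test would need is lost, and TPMs and percentiles are on different scales.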

My question is: how can I combine the two files and find the genes that are differentially expressed between the cancer and normal samples?

Thanks in advance for any help!

Best,

Yue

Tag: rna-seq

Hello, everyone,

I found a dataset in GEO that contains both normal and cancer samples.

Thanks for any help!

Yue

ATpoint (Germany) wrote, 7 months ago:

You cannot simply download two completely unrelated datasets and then perform differential analysis. There are almost certainly technical confounders (batch effects) that will dominate your results, i.e. create false results. One can only compare samples from the same lab, the same protocol and the same study; anything else will almost certainly contain a large number of false positives/negatives. Please read about RNA-seq analysis first, e.g. https://peerj.com/preprints/27283/ and https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
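
Why this cannot be fixed afterwards: if every tumour sample comes from one source and every normal sample from another, the batch and condition columns of a linear-model design matrix are collinear, so no model can attribute a difference to one rather than the other. The numpy sketch below uses a hypothetical six-sample sheet and a hypothetical helper `design_matrix`; it only checks identifiability via the matrix rank and is not a full differential-expression test.

```python
import numpy as np

# Hypothetical sample sheet with 6 samples.
condition = np.array([1, 1, 1, 0, 0, 0])          # 1 = tumour, 0 = normal
# Scenario 1: all tumours from one source (batch 0), all normals from another (batch 1).
batch_confounded = np.array([0, 0, 0, 1, 1, 1])
# Scenario 2: both conditions were measured in both batches.
batch_balanced = np.array([0, 1, 0, 1, 0, 1])

def design_matrix(condition, batch):
    """Columns: intercept, condition, batch (an additive ~ batch + condition model)."""
    return np.column_stack([np.ones(len(condition)), condition, batch])

ranks = {}
for name, batch in [("confounded", batch_confounded), ("balanced", batch_balanced)]:
    X = design_matrix(condition, batch)
    ranks[name] = np.linalg.matrix_rank(X)
    print(f"{name}: rank {ranks[name]} of {X.shape[1]} columns")
# prints "confounded: rank 2 of 3 columns": batch = 1 - condition, effects not separable
# prints "balanced: rank 3 of 3 columns": both effects can be estimated
```

A rank below the number of columns means the condition effect and the batch effect cannot be estimated separately, no matter how the data are normalized.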

To perform a meaningful statistical analysis you need raw (non-normalized) counts as well as biological/experimental replicates. One cannot simply take any counts downloaded from a database. Read the linked articles carefully and then reconsider your strategy.
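
To see why raw counts matter: count-based tools such as DESeq2 (used in the linked Bioconductor workflow) estimate their own per-sample scaling from the raw counts and then model the counts directly. The following is a minimal Python sketch of the median-of-ratios idea behind DESeq2's default size factors, not DESeq2's actual implementation; the helper name `size_factors` and the toy matrix are hypothetical. Its input is assumed to be raw counts, since the downstream count model does not apply to TPMs or percentile ranks.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors (the idea behind DESeq2's default normalization).

    counts: genes x samples array of raw, non-normalized counts.
    Genes with a zero in any sample are skipped, because their geometric mean is zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene geometric mean (log scale)
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs. the pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))      # median ratio per sample

# Hypothetical raw counts: sample 2 was sequenced twice as deep as sample 1.
raw = np.array([[100, 200],
                [ 50, 100],
                [300, 600],
                [  0,  15]])
sf = size_factors(raw)
normalized = raw / sf   # divide each sample (column) by its size factor
print(sf)               # approximately [0.707, 1.414]
```

After dividing by the size factors, the first three genes have equal normalized values in both samples, which is the depth correction a count-based model relies on.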


Look (and I really mean no offence at all), but this post and the ones about ChIP-seq analysis that you posted recently imply that you are a beginner in the field. You need more background before you start analyzing data. Without proper background knowledge and some experience your analysis will almost certainly be flawed and therefore meaningless. Bioinformatics is quite a difficult field because there are very few standards and a lot of pitfalls. If you can, please take a course with an experienced supervisor. In any case, read as much as you can in online tutorials and blogs. Try to understand how things work and, most importantly, use established tools and workflows. Don't create custom analysis strategies before you have a very good understanding of what you are doing. Again, I mean absolutely no offence; I am just trying to save you from beginner's mistakes that might cost you a lot of time while producing basically no output.


Hello, ATpoint,

Thank you for your response!

I agree that it would be better to compare samples within one dataset.

But my boss wants me to compare the normal and cancer samples.

Actually, there are not many normal samples in TCGA.

Do I have to process the data starting from the FASTQ files?

Thank you again!

Yue
