Question

RNA-seq healthy sample

0

4.4 years ago

Ali • 0

Hi

I'm doing differential gene expression analysis on two types of leukemia and I need normal (healthy) RNA-seq data. The data were downloaded from the GDC hub on the Xena website. I have done the differential expression analysis, but I realize I should use healthy samples as a reference to give a clear idea of how gene expression differs between normal and each type of leukemia.

Kind regards

RNA-Seq gene • 1.3k views

ADD COMMENT • link updated 4.4 years ago by ATpoint 82k • written 4.4 years ago by Ali • 0

0

You can't (or at least shouldn't) compare samples you didn't sequence yourself under as close to identical conditions as possible.

There are likely to be significant batch effects between some random data from the internet and your own, that would obscure any real differences.

ADD REPLY • link 4.4 years ago by Joe 21k

0

Thanks for the reply. I am comparing adult with paediatric leukemia, aiming to get a specific set of genes for each, using TCGA and TARGET.

The findings were good; I only need RNA-seq data from healthy samples.

regards

ADD REPLY • link 4.4 years ago by Ali • 0

score 1 · Answer 1 · 2019-12-04

1

4.4 years ago

ATpoint 82k

If they do not provide healthy controls then there is probably not much you can do about it. Using any unrelated dataset in the same statistical analysis is meaningless, as you cannot distinguish biological effects from technical confounders. Also, you would need to carefully choose which healthy cells you actually consider an appropriate control. Would it be a healthy progenitor cell, a monocyte, a granulocyte? That depends on the type of leukemia you are investigating and is not trivial. Typically one compares disease subtypes with each other and then clusters them based on their relative differences in transcription. If you then have the hypothesis that certain leukemias derive from a certain cell type, you would need to perform additional experiments to confirm that. There is no guarantee that each type of leukemia has the same cell-of-origin (not even discussing now that leukemia is not at all a precise disease category, as there are lymphoid and myeloid leukemias with all kinds of subtypes). If you compare any given sample with three different cell types from normal donors you'll get different results each time, so I think you first need to define, based on your data, what you want to compare with.
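To make the confounding argument concrete, here is a minimal sketch in Python (not part of any actual analysis in this thread; the sample layouts and the names `design_matrix` and `matrix_rank` are hypothetical). It builds a design matrix with an intercept, a condition column and a dataset (batch) column. When every healthy sample comes from one dataset and every leukemia sample from another, the matrix loses rank, which is the linear-algebra way of saying that the condition effect and the batch effect cannot be estimated separately.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples):
    """Columns: intercept, condition (1 = leukemia), batch (1 = dataset B)."""
    return [[1, int(cond == "leukemia"), int(batch == "B")]
            for cond, batch in samples]


# Hypothetical layout 1: all leukemia from dataset A, all healthy from dataset B.
confounded = [("leukemia", "A")] * 3 + [("healthy", "B")] * 3
# Hypothetical layout 2: both conditions are present in both datasets.
balanced = [("leukemia", "A"), ("healthy", "A"),
            ("leukemia", "B"), ("healthy", "B")]

print(matrix_rank(design_matrix(confounded)))  # 2 (fewer than the 3 columns)
print(matrix_rank(design_matrix(balanced)))    # 3 (full rank)
```

With the confounded layout the rank is 2 for 3 columns, so no model can separate disease from dataset; only a layout in which both conditions appear in both datasets gives full rank.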

ADD COMMENT • link 4.4 years ago by ATpoint 82k

0

TCGA provides RNA-seq counts for different types of cancer, which have already been used in differential gene expression analyses. I managed to do the analysis and the result was fine for this comparison; I only need normal healthy samples to increase the specificity.

ADD REPLY • link 4.4 years ago by Ali • 0

0

You aren't reading what we've written: you cannot simply pair up distinct datasets. It is meaningless.

ADD REPLY • link 4.4 years ago by Joe 21k

0

That's the answer I wanted to give, but I really hoped there was some super-smart way, with Bayesian latent variable analysis or Deep Learning or AI, published in Nature several months ago =( Well, looks like nobody is aware of one... like, even PEER would not do this magic? https://www.ncbi.nlm.nih.gov/pubmed/22343431

ADD REPLY • link 4.4 years ago by German.M.Demidov ★ 2.9k

2

If we knew a priori what differences were due to biology, and not due to batch, we wouldn't have to run the experiment.

ADD REPLY • link 4.4 years ago by swbarnes2 14k

1

Maybe, but to me all you're doing is heaping layers of obscurity and error upon layers of obscurity and error.

You can't polish a turd as they say.

ADD REPLY • link 4.4 years ago by Joe 21k

0

There are two datasets for lung cancer. If you want to do differential gene expression analysis, you need to use normal samples (as control). If you just run these two files, you will get genes that differentiate between the two subtypes. But what about normal? You will need it to compare gene expression across three conditions: Normal, LUSC and LUAD.

And the normal data should be similar to the cancer data in terms of size or number of cases.

https://gdc.xenahubs.net/download/TCGA-LUSC.htseq_counts.tsv.gz

https://gdc.xenahubs.net/download/TCGA-LUAD.htseq_counts.tsv.gz
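One thing that can be checked before looking for outside data is whether a project already contains normal samples processed alongside the tumors. In TCGA barcodes, the two-digit sample-type code in the fourth field marks tumor samples (01 to 09) and normal samples (10 to 19). A minimal sketch of that check, assuming the column names of the count files are TCGA-style barcodes; the example barcodes below are made up and `sample_group` is a hypothetical helper:

```python
from collections import Counter


def sample_group(barcode):
    """Classify a TCGA-style barcode by its sample-type code (fourth field)."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA-style sample barcode: {barcode!r}")
    code = int(fields[3][:2])  # e.g. "01A" -> 1, "11A" -> 11
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"  # e.g. control analytes


# Made-up barcodes standing in for the header row of a counts table.
columns = ["TCGA-AB-1234-01A", "TCGA-AB-1234-11A", "TCGA-CD-5678-01A"]
print(Counter(sample_group(c) for c in columns))
```

Whether a given project actually contains enough normal samples for a comparison still has to be checked in its own metadata.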

ADD REPLY • link 4.4 years ago by Ali • 0

0

Of course you'll get differential gene results. It doesn't mean they're real or informative.

If you want to press on and waste your time analysing random, incompatible data from the internet, be my guest - but when it turns out to be a waste of time, and it will, don't say we didn't tell you.

ADD REPLY • link 4.4 years ago by Joe 21k

0

My reason for doing this is to get specific genes for each subtype and then go to the lab to validate the results.

ADD REPLY • link 4.4 years ago by Ali • 0

0

I'd suggest doing a cost-effectiveness analysis: how much money you'd spend on RNA-seq of paired normal samples in order to get a limited list of strong candidates vs. how much money you'd spend on validation of hundreds of genes.

ADD REPLY • link 4.4 years ago by German.M.Demidov ★ 2.9k

0

After using intensive bioinformatics tools to identify a list of genes for a specific subtype of cancer, the lab will have a high probability of getting robust results.

ADD REPLY • link 4.4 years ago by Ali • 0

0

@Joe's point was that it will be impossible to ensure a high probability of robust results, whatever bioinformatics tricks you apply, and after thinking about it briefly I subscribe to this point of view.

ADD REPLY • link 4.4 years ago by German.M.Demidov ★ 2.9k