Question

A question about RNASeq experiment design

7.5 years ago

nazaninhoseinkhan ▴ 520

Dear all,

I want to design an RNA-seq project on cancer. The main goal of this project is to find a list of differentially expressed genes between cancer and normal tissues.

In most deposited data sets, the data consist of cancerous tissues along with adjacent normal tissues from the same patient.

Now my question is: is it possible to compare the cancer tissue of a number of patients with normal tissue from completely healthy individuals? Is this comparison biologically meaningful?

Thank you in advance for any help.

Nazanin

rna-seq experiment design • 1.9k views

updated 7.5 years ago by Michele Busby ★ 2.2k • written 7.5 years ago by nazaninhoseinkhan ▴ 520

Dear Nazanin, Hi.

I guess in this case you need multiple biological replicates from healthy individuals to reduce the effect of individual variation in gene expression (and maybe your next question would be about "to pool or not to pool?").

The candidate genes you are aiming for may influence your design, too. If you are searching for new up-regulated or down-regulated gene(s), minimizing individual variation is more important.

You can also check the pipelines of some papers or databases to see whether their “primary tumor” and “solid tissue normal” samples came from the same individuals.
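To illustrate why individual variation matters for this design choice, here is a minimal sketch in Python. The expression values are invented toy numbers for a single gene, and `paired_t` and `welch_t` are hypothetical helper names, not functions from any RNA-seq package. It compares a paired statistic (tumor and adjacent normal from the same patient) with an unpaired Welch statistic (as if the normals came from unrelated individuals) on the same numbers.

```python
import math
import statistics

# Toy log2 expression values for one gene (invented for illustration).
# Position i in both lists is the same patient in the paired scenario.
normal = [5.0, 7.0, 9.0, 6.0, 8.0]
tumor = [6.1, 7.9, 10.2, 6.8, 9.0]


def paired_t(a, b):
    """t statistic on within-patient differences (b minus a)."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def welch_t(a, b):
    """Welch t statistic treating the two groups as unrelated samples."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


t_paired = paired_t(normal, tumor)
t_unpaired = welch_t(normal, tumor)
print(f"paired t = {t_paired:.2f}, unpaired (Welch) t = {t_unpaired:.2f}")
```

In this toy case the shift between tumor and normal is the same size in both analyses, but the unpaired statistic is much smaller because the spread between individuals ends up in the denominator. With unrelated normals, that spread has to be overcome with more replicates.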

~ Best

7.5 years ago by Farbod ★ 3.4k

Dear Farbod, Hi,

Thank you for your help

Regards

Nazanin

7.5 years ago by nazaninhoseinkhan ▴ 520

score 1 · Answer 1 · 2016-11-12

I don't know if a straight-up differential expression analysis (e.g. some version of a t-test) would work well here, but you may be able to do a more sophisticated analysis in which you compare your tumor data to your normal data by clustering. There are many papers where people look for signatures of cancer this way. They usually require a lot of samples.
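As a rough illustration of the clustering idea, here is a minimal sketch in plain Python. The sample names and expression profiles are invented, `pearson` and `cluster_samples` are hypothetical helpers, and the method shown (average-linkage agglomerative clustering on a correlation distance) is just one common choice, not necessarily what any particular paper used.

```python
import math
from itertools import combinations

# Toy log2 expression profiles over four genes (invented for illustration).
profiles = {
    "tumor_1": [9.0, 8.5, 2.0, 3.0],
    "tumor_2": [8.8, 9.1, 2.5, 2.8],
    "tumor_3": [9.3, 8.7, 1.9, 3.2],
    "normal_1": [3.0, 2.5, 8.0, 7.5],
    "normal_2": [2.8, 3.1, 7.7, 8.1],
    "normal_3": [3.2, 2.9, 8.3, 7.9],
}


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_samples(data, n_clusters=2):
    """Average-linkage agglomerative clustering on 1 - correlation."""
    clusters = [[name] for name in data]

    def linkage(c1, c2):
        dists = [1 - pearson(data[a], data[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters; i < j, so deleting j is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


clusters = cluster_samples(profiles)
print(clusters)
```

With real data the question is whether samples group by tumor versus normal status or by something else, such as the lab, protocol, or batch they came from.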

It is important, if you are going to be directly analyzing the data, to make sure that the libraries for the cancer and the comparison normal data are all prepared in the same way. You would need to do a lot of extra normalization if you want to mix data from a poly-A TruSeq protocol with data from, e.g., a protocol that uses RiboZero.

Also, it is important to know that differences in sample handling can introduce big artifacts into the data. Fresh frozen tissue will usually be in better shape than FFPE samples, but even then things like how long it took to process the tissue will affect the data. Without good handling the RNA will break up into alphabet soup. Then, if you use a poly-A protocol, you will have a huge 3' bias because the 5' end is no longer joined to the poly-A tail. This will show up in the data as length bias when you compare the samples and needs to be normalized out before analysis. This is an issue because the normal tissue often comes from deceased donors, and obviously it is difficult to just go in and take the tissue.
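One simple way to look for the 3' bias described above is to compare read coverage near the 3' end of a transcript with coverage near the 5' end. The sketch below assumes you already have per-position coverage along a transcript, ordered 5' to 3'. The function name `three_prime_bias`, the 20% window, and the coverage numbers are all illustrative assumptions, not an established metric from a specific tool.

```python
def three_prime_bias(coverage, fraction=0.2):
    """Ratio of mean coverage in the 3'-most window to the 5'-most window.

    `coverage` is a list of read depths along one transcript, 5' to 3'.
    A ratio near 1 suggests even coverage; a large ratio suggests 3' bias.
    """
    window = max(1, int(len(coverage) * fraction))
    five_prime = sum(coverage[:window]) / window
    three_prime = sum(coverage[-window:]) / window
    if five_prime == 0:
        return float("inf")  # no 5' coverage at all
    return three_prime / five_prime


# Toy coverage profiles (invented for illustration).
intact = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
degraded = [1, 1, 2, 3, 4, 6, 8, 10, 12, 14]

print(three_prime_bias(intact))
print(three_prime_bias(degraded))
```

Computing something like this per sample, averaged over many transcripts, lets you check whether the tumor and normal groups differ systematically in RNA integrity before you compare expression.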

There are computational ways to smooth out these differences and get meaningful results. There are some in the GTEx papers. But it is better to consider these things at the design phase so you can minimize them if possible.

Finally, big numbers are your friend.

I assume that you have already looked through existing RNA-seq datasets to see if the data you need to answer your question already exists. You may also want to look at Oncomine. It also includes a lot of microarray studies and the data is pretty easy to interrogate. The cancer you are looking at might be in there. Existing datasets are also good for telling you how many replicates you are going to need.
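As a rough sketch of how an existing dataset can inform the number of replicates, the snippet below applies the textbook normal-approximation formula for comparing two group means to a single gene. The function name `replicates_per_group` and the pilot values are invented for illustration. The formula ignores multiple testing across genes and the count nature of RNA-seq data, so treat the result as a lower bound rather than a real power analysis.

```python
import math
import statistics
from statistics import NormalDist


def replicates_per_group(sd, delta, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided two-sample comparison.

    sd    : between-individual standard deviation (log2 scale)
    delta : smallest log2 difference you want to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Toy log2 expression of one gene in normal samples from an existing
# dataset (invented numbers), used to estimate between-individual spread.
pilot_normals = [7.2, 8.1, 6.5, 7.9, 8.4, 6.9]
pilot_sd = statistics.stdev(pilot_normals)

n_needed = replicates_per_group(sd=pilot_sd, delta=1.0)
print(f"estimated sd = {pilot_sd:.2f}, replicates per group = {n_needed}")
```

The required number grows with the square of the spread between individuals, which is one reason unrelated normals typically call for more samples than matched tumor and normal pairs.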