Question: Small RNA-seq experimental design: pooling samples or not? Which one is better?
17 months ago by
CandiceChuDVM1.7k
United States/College Station/Texas A&M University
CandiceChuDVM1.7k wrote:

Hi all,

I have 3 types of disease and one control group. Each disease has 10 samples. Ideally we would like to have an individual library preparation (lib. prep.) for each sample; however, our budget can't cover it. That leaves us with one option: pooling samples.

For sample pooling, I was wondering whether I should pool samples before isolation (e.g. combine the biofluid first, then do the isolation) or combine the isolated RNA into one pool for lib. prep. Any thoughts?

ADD COMMENTlink modified 3 months ago by BIOTECH.DEEPTI9110 • written 17 months ago by CandiceChuDVM1.7k

You may like to add "pooling" to the title and keywords of the question.

ADD REPLYlink written 17 months ago by Santosh Anand3.9k

Hey folks, I also need some help in this regard. I have RNA-seq data from pooled cell-line samples: Control, Treatment 1 and Treatment 2, each made by pooling the RNA of triplicates, which gives 3 samples (1 for control, 1 for treatment 1, 1 for treatment 2). What strategy should I follow to get the most out of it? I know that this design is not considered appropriate for RNA-seq nowadays, but due to insufficient funds at the last moment it was the only option left to me. Please suggest a solution. I should also mention that I have used Cuffdiff to find the differentially expressed genes based on FPKM values.

ADD REPLYlink modified 3 months ago • written 3 months ago by BIOTECH.DEEPTI9110

It is never a good idea to ask a new question in an existing thread. Please start a new thread for this question.

ADD REPLYlink written 3 months ago by genomax55k
17 months ago by
h.mon19k
Brazil
h.mon19k wrote:

My best advice is to pick the most interesting / important disease, and do library preps of the samples without pooling, as many biological replicates as you can afford. In my opinion, it is better to have a smaller, well-designed experiment, with good statistical power, than to try to answer all questions and end up with no power to have confidence in the results.

ADD COMMENTlink written 17 months ago by h.mon19k

I understand your point. No doubt we should have as many biological replicates as we can (I can have up to 10 per disease). Since you mentioned "without pooling", I was wondering how sample pooling would affect the statistical power if the number of lib. preps is the same. Any references?

ADD REPLYlink modified 17 months ago • written 17 months ago by CandiceChuDVM1.7k
17 months ago by
Santosh Anand3.9k
Santosh Anand3.9k wrote:

We have some experience with pooling for DNA sequencing, and it worked quite well for variant calling and allele frequency estimation (using a pool size of 12): https://www.nature.com/articles/srep33735

This review paper discusses many aspects of pooling in DNA-sequencing https://www.nature.com/nrg/journal/v15/n11/full/nrg3803.html

I don't have any experience with RNA-seq pooling though. Since the quantities of different RNAs produced in cells vary widely and depend on many factors such as pathology, time, tissue etc., pooling might not be the best strategy. So I would recommend doing a pilot study in which you compare the RNA quantification results from some of the samples of a single pool and see how much intra-pool variability there is. If they vary widely, pooling will be meaningless.

For the last Q, the important thing to keep in mind is that you need to balance the quantity of RNA coming from different samples. So the 2nd approach (combine isolated RNA) looks better to me.

ADD COMMENTlink written 17 months ago by Santosh Anand3.9k
17 months ago by
Michele Busby1.9k
United States
Michele Busby1.9k wrote:

So with 40 samples you may be able to bring down library prep costs with a high throughput protocol. Usually this means you barcode your individual samples with some sort of in line barcode early on and then you just do one library prep and sequence it. We did this with the RNA Tag Seq protocol for prokaryotic RNA and it is a lot cheaper.

This is one for miRNA but I believe there are others: https://www.ncbi.nlm.nih.gov/pubmed/25030917

I don't know if anyone is offering this commercially as a service or kit. You may want to check around.

If that seems too daunting (and it is of course pricey to use a protocol you don't know well):

More reps with shallow sequencing is usually better than fewer reps with deeper sequencing. If you could reduce the total cost by doing minimal sequencing on the 40 reps I would do that. Of course, the library prep kits are expensive.

If you must pool, I would think pooling the RNA would allow better quantification for balancing the pools than pooling biomaterial, i.e. ensuring that the samples are pooled evenly and no single sample dominates the pool.
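
As a rough illustration of balancing a pool by RNA mass, the sketch below computes the volume to draw from each isolated RNA sample so that every sample contributes the same mass. The sample names, the concentrations, the 100 ng target and the function name `pooling_volumes` are all made up for the example and do not come from any particular protocol.

```python
def pooling_volumes(conc_ng_per_ul, mass_per_sample_ng):
    """Return the volume (in uL) to draw from each sample so that
    every sample contributes the same RNA mass to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        # volume = mass / concentration
        volumes[sample] = mass_per_sample_ng / conc
    return volumes


# Hypothetical measured RNA concentrations (ng/uL) for three samples.
concentrations = {"S1": 50.0, "S2": 25.0, "S3": 10.0}

volumes = pooling_volumes(concentrations, mass_per_sample_ng=100.0)
for sample, vol in volumes.items():
    print(f"{sample}: {vol:.1f} uL")
```

The most dilute sample sets the practical limit: if the required volume exceeds what is available, the target mass per sample has to come down for the whole pool.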

There are some older papers that say pooling is theoretically fine. People used to pool a lot with microarrays because the arrays themselves were so expensive. But the problem with pooling is that outlier samples may dominate your findings while being invisible in the sequencing data. So if diversity among individuals is expected (or there is unexpected diversity) in your disease state, you may get garbage out of the experiment.
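
A toy illustration of the outlier problem, with made-up expression values for a single miRNA. It assumes, as an idealization, that an evenly balanced pool measures roughly the average of its members and it ignores technical noise.

```python
from statistics import mean, median

# Hypothetical normalized expression of one miRNA, five individuals per group.
control = [100, 110, 90, 105, 95]
disease = [100, 110, 90, 105, 5000]  # a single outlier individual

# Idealized pools: each pool reads out the mean of its members.
pooled_control = mean(control)
pooled_disease = mean(disease)
pooled_fold_change = pooled_disease / pooled_control

# With individual libraries the outlier can be seen, and a robust summary
# such as the median is barely affected by it.
median_fold_change = median(disease) / median(control)

print(f"fold change from pools:   {pooled_fold_change:.2f}")
print(f"fold change from medians: {median_fold_change:.2f}")
```

In this toy case the pools suggest a roughly 10-fold increase driven entirely by one individual, while the per-sample medians show almost no change. With only the pooled libraries there is no way to tell these two situations apart.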

If there isn't diversity and you get truth you may nevertheless have a reviewer who expects diversity and doesn't believe your (true) paper. Then you're stuck.

In any case, I would have at least 3 pools per disease, so you can at least identify outlier pools if not outlier samples.

If you are only interested in big fold changes, you may be able to get that with some individual reps. For regular RNA-seq, 3 reps usually gets you most of your 4x fold changes and some of your 3x. I don't know about miRNA.

ADD COMMENTlink written 17 months ago by Michele Busby1.9k

You could run your experiment with 3-5 reps and then independently validate interesting genes with your remaining replicates.

That's a nice experiment.

ADD REPLYlink written 17 months ago by Michele Busby1.9k
17 months ago by
CandiceChuDVM1.7k
United States/College Station/Texas A&M University
CandiceChuDVM1.7k wrote:

Thanks for all the replies. I think we have reached the conclusion that pooling isolated RNA is preferred over pooling biofluids.

For the question "to pool or not to pool", the aforementioned papers only provide limited information. I've found another paper, "Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq", that focuses on RNA-seq. However, if you look closely at the experimental design, you will find that their pooling is actually technical replicates of library preps, so it was not very helpful.

I have been playing around with the count table provided by the author, using edgeR. Since I don't have actual pooled library preps, I simply averaged the read counts in different columns as if I were "pooling" them. It turns out that with the same number of samples (n=8), pooling them (2 samples in each pool, 4 pools in total) or not leads to a huge difference in the number of DEGs. The pooled design finds more DEGs at a lower library prep cost, but the number of qPCR-verified DEGs is lower as well.
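
A minimal sketch of this kind of in silico "pooling", assuming the count table is held as one list of per-gene counts per sample. The sample names, the counts and the helper `pool_columns` are hypothetical. Averaging raw counts ignores differences in library size and cannot reproduce the noise of a real pooled library prep, so it only approximates what a pooled library would give.

```python
def pool_columns(counts, pools):
    """Average the counts of the samples in each pool, gene by gene.

    counts: dict mapping sample name -> list of per-gene read counts
    pools:  dict mapping pool name -> list of sample names in that pool
    """
    pooled = {}
    for pool_name, members in pools.items():
        columns = [counts[sample] for sample in members]
        # zip(*columns) yields, for each gene, its counts across the pool members.
        pooled[pool_name] = [sum(gene) / len(gene) for gene in zip(*columns)]
    return pooled


# Hypothetical count table: 4 samples x 3 genes.
counts = {
    "A1": [10, 200, 0],
    "A2": [30, 100, 4],
    "A3": [20, 50, 2],
    "A4": [40, 150, 6],
}
# Two samples per pool, as in the comparison described above.
pools = {"poolA_1": ["A1", "A2"], "poolA_2": ["A3", "A4"]}

pooled = pool_columns(counts, pools)
for name, values in pooled.items():
    print(name, values)
```

The pooled columns can then be written out and passed to the same differential expression workflow as the unpooled table, so that the two designs are compared on an equal footing.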

So the take-home message for me, personally, is that using unpooled samples is better for the same number of library preps.

ADD COMMENTlink written 17 months ago by CandiceChuDVM1.7k