Question: Unequally pooling libraries for RNAseq?
I have a question about pooling libraries for RNAseq. I'd really appreciate it if you could comment on this.

I have 12 samples that I would like to run RNAseq on. The 12 samples were collected from 2 different fungal cultures under 2 different conditions, with 3 replicates per fungus under each condition.

samples 1-3 fungus 1 condition 1

samples 4-6 fungus 2 condition 1

samples 7-9 fungus 1 condition 2

samples 10-12 fungus 2 condition 2

Samples 1-6 were extracted from pure fungal cultures under condition 1. Samples 7-12, however, were extracted from fungal cultures under condition 2, and those cultures were contaminated with plant tissue, bacteria, and protozoa. I am specifically interested in the gene expression of the fungal tissue, and my goal is to compare gene expression between fungus 1 and 2, and between condition 1 and 2.

I was wondering: because of the contamination in samples 7-12, is it possible to pool the libraries unequally when combining them (that is, increasing the amount of library from samples 7-12 to increase the sequencing depth of the fungal RNA in those samples)? Would that be acceptable? Would it give me any trouble when analyzing the data?

Don't you think that the contamination itself could have caused changes in gene expression levels? Will your downstream comparisons make sense biologically?

Good point! I am looking at the gene expression after fungus-plant interaction. But I'm specifically interested in the fungal side.

From the experimental design, I can't think of any issues with unequal pooling; most differential expression algorithms account for differences in sequencing depth.
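
To illustrate what "accounting for sequencing depth" means in its simplest form, here is a minimal sketch of counts-per-million (CPM) scaling. The gene names and counts are made up, and real differential expression packages generally use more robust normalization than plain CPM, so treat this as an illustration of the idea and not as what any particular tool does.

```python
def counts_per_million(counts):
    """Scale raw gene counts by library size so libraries of different depth are comparable."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no reads")
    return {gene: count * 1_000_000 / library_size for gene, count in counts.items()}


# Hypothetical toy libraries: identical relative expression,
# but the second library was sequenced 3x deeper.
shallow = {"geneA": 100, "geneB": 300, "geneC": 600}
deep = {"geneA": 300, "geneB": 900, "geneC": 1800}

cpm_shallow = counts_per_million(shallow)
cpm_deep = counts_per_million(deep)

for gene in shallow:
    print(gene, cpm_shallow[gene], cpm_deep[gene])
```

After scaling, the two toy libraries give the same values despite the 3x difference in depth. For the contaminated samples, the library size that matters is presumably the number of reads assigned to the fungus, not the total number of reads sequenced.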

My concern would be the best way to remove the contaminating material. If you have a VERY well annotated fungal genome, you can apply stringent mapping criteria (fewer mismatches or indels permitted) to exclude contaminating reads. Ideally you would have genomes for the contaminating plant, etc., map to those genomes first, then take the unmapped reads and map them to the fungal genome. But I'm guessing that's not a real option.
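
The subtractive idea above boils down to a set difference on read IDs. A minimal sketch, assuming you already have the IDs of reads that mapped to each contaminant genome (the function name and read IDs here are hypothetical, and in practice the aligner would write out the unmapped reads for you):

```python
def reads_for_fungal_mapping(all_read_ids, contaminant_mapped_ids):
    """Return read IDs that did not map to any contaminant genome, keeping input order."""
    contaminant = set(contaminant_mapped_ids)
    return [read_id for read_id in all_read_ids if read_id not in contaminant]


# Hypothetical read IDs for illustration only
all_reads = ["r1", "r2", "r3", "r4", "r5"]
mapped_to_plant = {"r2"}
mapped_to_bacteria = {"r4"}

# Reads that survive the contaminant filter and go on to the fungal genome
candidates = reads_for_fungal_mapping(all_reads, mapped_to_plant | mapped_to_bacteria)
print(candidates)
```

One caveat with this approach: reads from highly conserved genes may map to a contaminant genome as well as the fungal one, so strict subtraction can discard some genuine fungal reads.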

As a sanity check, you can always downsample your mapped BAM files to equal sequencing depth and repeat your differential expression analysis. You should get very similar results from the full libraries and from the downsampled libraries.
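
Here is a rough sketch of the downsampling logic. Note that it works on per-gene count tables instead of BAM files, which is a simplification on my part; on real data you would subsample the alignments themselves with a tool such as samtools. The sample names and counts are invented.

```python
import random


def downsample_counts(counts, target_depth, seed=0):
    """Randomly draw target_depth reads without replacement from a gene count table."""
    total = sum(counts.values())
    if target_depth > total:
        raise ValueError("target depth exceeds library size")
    rng = random.Random(seed)
    # Expand to one entry per read; fine for a toy example, too slow for real libraries
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    kept = rng.sample(reads, target_depth)
    downsampled = {gene: 0 for gene in counts}
    for gene in kept:
        downsampled[gene] += 1
    return downsampled


# Hypothetical count tables: sample7 was pooled at a higher amount and is deeper
libraries = {
    "sample1": {"geneA": 100, "geneB": 300, "geneC": 600},
    "sample7": {"geneA": 350, "geneB": 900, "geneC": 1750},
}

# Bring every library down to the depth of the shallowest one
target = min(sum(counts.values()) for counts in libraries.values())
downsampled = {name: downsample_counts(counts, target) for name, counts in libraries.items()}

for name, counts in downsampled.items():
    print(name, sum(counts.values()), counts)
```

You would then rerun the differential expression on the downsampled tables and compare the gene lists with those from the full data.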

You can use BBSplit from the BBMap toolset to align to two genomes. This is a common technique, for example, in patient-derived xenograft (PDX) work, where a human tumour is grown in a mouse. It works well for that application in my experience.

I'd imagine that depends on having well-defined genomes. In this case it sounds like jingjin just has a mixture of various species together. Good advice for when I have a mixed sample in the future, though!

Thank you for your suggestion! I will definitely check it out when I get my data back.

