Question: Appropriate DESeq2 normalisation method for TRAP analysis
13 months ago by
volvicpellegrino0 wrote:

I work with RNA obtained by translating ribosome affinity purification (TRAP). RNA is immunopurified from genetically labelled ribosomes, which are expressed only in the cell type of interest. An 'input' sample is taken by extracting RNA directly from the homogenised tissue before immunopurification, as a background RNA comparison.

The first stage of analysis is to identify genes being translated in the cell type of interest. For this, most papers appear to use DESeq2 to compare their purified RNA with their input sample.

I have been concerned that the RNA composition of the purified and input samples differs so much that DESeq2's normalisation factor calculation may confound the analysis. Depending on the cell type investigated, there may be between 4,000 and 9,000 DEGs (out of 14,000 genes with an average of >10 counts) between purified and input samples.

Currently the normalisation factors for purified samples are ~0.9 whereas for the input samples they are ~1.3. There were 4000 DEGs between purified and input.

Should I be concerned or is this not an issue? The libraries were prepared with ERCC spike-ins, in case this is recommended as an alternative option.

trap deseq2 • 324 views
modified 13 months ago by Asaf • written 13 months ago by volvicpellegrino
13 months ago by
Asaf8.2k
Israel
Asaf8.2k wrote:

The DESeq2 normalization assumes that most genes are not differentially expressed between samples, so the median ratio of a gene's count to its cross-sample reference should be the same in every sample. In your case, however, this is true only for the genes with the highest purified/total ratio: you can't have more purified RNA than total RNA, so having genes "upregulated" in purified doesn't make any biological sense. The genes with the highest purified/total ratio are simply the ones that have the best translation rate. This means that DESeq2 might be convenient, but using it this way is not biologically coherent. Another point is that the total RNA is itself dependent on the translation rate, so it's not really clear to me what your conclusion from such a comparison would be; it will probably depend on the experimental settings.
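To make the normalization point concrete, here is a minimal NumPy sketch of a median-of-ratios size factor calculation. It is an illustration of the idea only, not DESeq2's own code; the function name, the `control_rows` argument and the toy counts are all made up for this example. The toy data shows how a shift in composition for most genes moves the size factors, while restricting the estimate to a set of genes assumed to be stable (spike-ins, for instance) does not.

```python
import numpy as np

def median_of_ratios_size_factors(counts, control_rows=None):
    """Median-of-ratios size factors for a genes x samples count matrix.

    control_rows (hypothetical argument): optional row indices, e.g. spike-ins,
    to which the estimate is restricted.
    """
    counts = np.asarray(counts, dtype=float)
    if control_rows is not None:
        counts = counts[control_rows]
    # Use only genes with a non-zero count in every sample.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene reference: log of the geometric mean across samples.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor per sample: median over genes of count / reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Toy data (invented): column 0 = input, column 1 = purified.
# Rows 0-1 stand in for stable controls; rows 2-4 are 8x higher in purified.
counts = np.array([
    [100, 100],
    [ 50,  50],
    [ 10,  80],
    [ 20, 160],
    [ 30, 240],
])

sf_all = median_of_ratios_size_factors(counts)
sf_ctrl = median_of_ratios_size_factors(counts, control_rows=[0, 1])
print("all genes:    ", sf_all)
print("controls only:", sf_ctrl)
```

With all genes, the median is set by the shifted majority, so the purified column gets a much larger size factor and the shifted genes would look unchanged after normalization. With the control rows only, both columns get the same factor.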

What you _can_ do is compare the purified RNA between two conditions and the total RNA between those conditions and then compare the two results, maybe use a differential equation to explain how these two differences align with each other. I'm not a physicist so I can't give any insight here.

written 13 months ago by Asaf

Thanks for your reply.

I understand that genes in the purified sample are not 'upregulated', but they are enriched compared to input RNA. Using differential expression analysis is what others have performed to identify which genes are translated in the target cell type (for example: Epigenetic regulation of brain region-specific microglia clearance activity). My concern is whether the normalisation is appropriate to check for gene enrichment.

The main objective of my work is to compare the purified samples across conditions, where I can assume that the RNA composition is similar. As you say, I will also compare the input RNA between conditions to then show how condition-dependent DEGs differ between the purified and input samples.

modified 13 months ago • written 13 months ago by volvicpellegrino

They are not "enriched"; they just have a high translation rate. Since the purification step substantially changes the RNA composition (I would expect so, at least), you might see biases, for instance in genes with long UTRs, so directly comparing purified and input won't give you the results you want. Honestly, I think that if you have enough coverage you can assume a normal distribution, compute a translation efficiency rate, and compare those rates between conditions.

written 13 months ago by Asaf

I don't understand why it is not correct to say 'enriched'. I understand that the rate of translation of some genes may be low and therefore the difference in coverage of those genes compared to the input sample will not be large. Also that many translated genes will also be expressed in neighbouring cell types, reducing the difference in coverage compared to the input RNA. But in the case of highly translated and cell type specific genes, their transcripts are surely enriched in the purified RNA sample compared to the input RNA?

In your example of UTR bias, are you saying that genes with longer 5' UTRs will have greater coverage and will therefore bias the interpretation of which genes are being translated in the target cell type towards those genes?

Could you give an example of what you mean by your last sentence? What would I assume is normally distributed?

Thank you for your help

written 13 months ago by volvicpellegrino

What I meant is that there is an upper limit on the number of RNA molecules you will have in the purified sample, and it will always be <= input. This is why I think "enriched" is misleading here. You have high and low rates of translation, and you might compare these rates between two cell types, but stating that "the number of purified reads compared to input reads is enriched" is misleading: it can be high or low (within the range 0-input), but not enriched.

With the UTRs, I was just trying to say that the purified/input ratio can't be compared between genes in the same sample, due to biases we might not be aware of.

My last suggestion was to forget about the negative binomial statistical model and just compute a translation efficiency ratio (probably in the range 0-1) and compare these ratios between samples. Since you're giving up on the statistical model you'll need more replicates to make statistically significant conclusions.
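A rough sketch of what that could look like, assuming paired purified/input libraries per replicate. The counts-per-million scaling, the pseudocount and the function names are choices made for this illustration, not something prescribed above. Note also that a ratio of library-scaled counts is a relative measure, so it will not land in the 0-1 range unless you have absolute quantification; the sketch works on the log2 scale instead and compares conditions with a per-gene Welch t statistic, in line with the normal-distribution assumption.

```python
import numpy as np

def log2_te(purified, input_counts, pseudocount=0.5):
    """Per-gene log2(purified / input) on counts-per-million.

    Both arguments are genes x replicates arrays with matching columns
    (column j of each comes from the same animal or sample).
    The pseudocount is an arbitrary choice to avoid log of zero.
    """
    purified = np.asarray(purified, dtype=float)
    input_counts = np.asarray(input_counts, dtype=float)
    cpm_p = purified / purified.sum(axis=0) * 1e6
    cpm_i = input_counts / input_counts.sum(axis=0) * 1e6
    return np.log2(cpm_p + pseudocount) - np.log2(cpm_i + pseudocount)

def welch_t(a, b):
    """Per-gene Welch t statistic between two groups (genes x replicates).

    Needs at least two replicates per group; a gene with zero variance
    in both groups gives a division by zero.
    """
    mean_a, mean_b = a.mean(axis=1), b.mean(axis=1)
    var_a, var_b = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / a.shape[1] + var_b / b.shape[1])

# Usage with hypothetical count matrices for two conditions:
# te_ctrl = log2_te(purified_ctrl, input_ctrl)
# te_treat = log2_te(purified_treat, input_treat)
# t_stats = welch_t(te_treat, te_ctrl)
```

Turning the t statistics into p-values and correcting for multiple testing is left out here; with only a few replicates per group the power will be low, which is the cost of dropping the count model.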

written 13 months ago by Asaf