I have RNA-seq samples that show a 3' bias in gene body coverage across 10,000 random genes.
I first checked the RIN values to see whether degradation could explain this, but all my samples have a RIN above 8, so I do not think degradation is the main cause.
I also read that different library protocols can produce different gene body coverage profiles. In this case we did mRNA-seq (as opposed to total RNA-seq).
However, one biological subgroup of the samples (plus one sample from another group) shows a noticeably more pronounced bias.
Has anyone observed this before? Is there a way to correct the counts for this bias? I am not sure whether it could cause false positives, since all samples of one biological group show the most pronounced bias.
Were your samples sequenced in different batches? Is the effect specific to one batch? Do you work with a commercial sequencing provider?
I'm also curious about this. I would hope the provider is capturing this bias during QC and conveying it to the client.
3' end bias is pretty common in mRNA-seq protocols; see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310221/figure/F6/
Yes, it's not surprising, but I was more worried by "However, a biological subgroup of the samples (plus a sample from another group) shows a significantly more pronounced bias." This could completely jeopardize your downstream analysis, because you will never know whether an effect you see is caused by the technical (batch-like) bias or by the biological effect.
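One rough way to see how tightly the bias tracks the biological grouping is to give each sample a single bias score (for example, mid-gene coverage as a fraction of the maximum) and correlate it with a 0/1 group indicator. This is a minimal sketch assuming such a per-sample score already exists; the function name and the example values are hypothetical, not taken from this dataset. A correlation close to -1 or +1 means the bias and the group are nearly inseparable, so adjusting for the bias would also remove much of the biological signal.

```python
from math import sqrt

def point_biserial(groups, scores):
    """Pearson correlation between a 0/1 group indicator and a per-sample score.

    Hypothetical helper: `scores` is any per-sample bias measure you choose.
    """
    n = len(scores)
    if n != len(groups) or n < 2:
        raise ValueError("need matching lists with at least two samples")
    mean_g = sum(groups) / n
    mean_s = sum(scores) / n
    cov = sum((g - mean_g) * (s - mean_s) for g, s in zip(groups, scores))
    var_g = sum((g - mean_g) ** 2 for g in groups)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_g == 0 or var_s == 0:
        raise ValueError("group labels and scores must both vary")
    return cov / sqrt(var_g * var_s)

# Hypothetical example: 1 = affected group, score = relative mid-gene coverage
# (lower score = stronger 3' bias). One "other group" sample is also biased.
groups = [1, 1, 1, 0, 0, 0, 0]
scores = [0.50, 0.52, 0.48, 0.75, 0.74, 0.51, 0.76]
r = point_biserial(groups, scores)  # about -0.76: strongly tied to group
```

With only a handful of samples this is a descriptive check, not a formal test.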
Thanks all for your comments; yes, this is the main thing I am worried about. All the libraries were prepared and sequenced in one batch.
I think I will take the differentially expressed genes, compute a per-gene coverage distribution for each sample, and check how many of those genes have different gene body coverage between groups.
I've also seen people count reads only in the 3' end of genes; I might try that too.
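The per-gene check could look something like this sketch. It assumes per-base coverage for each gene has already been extracted (e.g. from the BAM files) and put in 5'-to-3' transcript orientation. The 3'/5' ratio summary, the 20% end windows and the 1.5-fold cutoff are illustrative choices (hypothetical, not a standard method), and the function names are made up for this example.

```python
from statistics import median

def three_prime_ratio(coverage, end_fraction=0.2):
    """Mean depth in the 3'-most end_fraction of a gene divided by mean depth
    in the 5'-most end_fraction. `coverage` is per-base depth in 5'->3'
    orientation (reverse it first for minus-strand genes)."""
    n = len(coverage)
    k = max(1, int(n * end_fraction))
    five = sum(coverage[:k]) / k
    three = sum(coverage[-k:]) / k
    return three / five if five > 0 else float("inf")

def flag_genes(ratios, group_a, group_b, fold=1.5):
    """ratios: {gene: {sample: 3'/5' ratio}}. Return genes whose median ratio
    differs between the two groups by at least `fold` in either direction."""
    flagged = []
    for gene, per_sample in ratios.items():
        a = median(per_sample[s] for s in group_a)
        b = median(per_sample[s] for s in group_b)
        if a == 0 or b == 0:
            continue  # ratio undefined, skip
        change = a / b
        if change >= fold or change <= 1 / fold:
            flagged.append(gene)
    return flagged
```

For example, `flag_genes(ratios, ["s1", "s2"], ["s3", "s4"])` returns the DE genes whose coverage shape differs between groups, which are the ones to treat with suspicion.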
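Counting only in the 3' end mostly means building a trimmed annotation. A minimal sketch, assuming one transcript model per gene given as a list of exons in 1-based inclusive genomic coordinates; the function name and the 500 bp default window are hypothetical choices. The resulting intervals could then be written to whatever annotation format your read counter accepts.

```python
def three_prime_window(exons, strand, length=500):
    """Return the exonic intervals covering the 3'-most `length` transcript bases.

    exons: list of (start, end) genomic coordinates, 1-based inclusive.
    strand: '+' or '-'. If the transcript is shorter than `length`,
    all exons are returned.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk exons starting from the 3' end of the transcript:
    # highest coordinates first on '+', lowest first on '-'.
    ordered = sorted(exons, reverse=(strand == "+"))
    remaining = length
    window = []
    for start, end in ordered:
        if remaining <= 0:
            break
        take = min(end - start + 1, remaining)
        if strand == "+":
            window.append((end - take + 1, end))      # keep the exon's right part
        else:
            window.append((start, start + take - 1))  # keep the exon's left part
        remaining -= take
    return sorted(window)

# Two-exon gene on '+': the last 80 transcript bases span both exons.
print(three_prime_window([(100, 199), (300, 349)], "+", 80))
```

Because every gene then contributes the same effective length, a 3'-only count is less sensitive to how far coverage extends toward the 5' end, though it does not remove differences in the bias itself.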
Just curious, but what sort of biological effect do you think could be causing this kind of bias across a sample's entire transcript pool?
I don't think this bias is caused by a biological effect; my guess is a difference in how the cells were handled (culture conditions or tissue collection).
Thanks, I'll have a look at this paper.
How do we deal with this during analysis? Should we ignore it and proceed?
It's not surprising that you are seeing a 3' bias in read mapping since mRNA sequencing typically involves poly-A capture. Transcripts with degraded 3' ends (which may show a 5' bias in read mapping) will not be captured.
It's not uncommon to see read mapping bias along transcripts even for samples that appear to be good quality. This usually seems to be an issue with the library prep, but many factors can contribute to it. How severe is the bias you're observing?
Thanks for your comment, spvensko. In the worst samples, coverage at the middle of the gene body is about 0.5 of the highest coverage value, and it keeps decreasing toward the 5' end. The other samples have about 0.75 of the highest coverage value at the middle of the gene body, also decreasing toward the 5' end.
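The ~0.5 vs ~0.75 figures are essentially the middle point of a max-normalised gene body profile. As a sketch of how such a number can be computed per sample (similar in spirit to the percentile profiles plotted by tools such as RSeQC's geneBody_coverage.py, but not its actual code): split each gene into equal percentile bins, sum the bin means across genes, scale so the maximum is 1, and read off the middle bin. It assumes per-base coverage in 5'-to-3' orientation; the function names are hypothetical.

```python
def gene_body_profile(coverages, n_bins=100):
    """Summed percentile coverage profile across genes, scaled so max = 1.

    coverages: list of per-base depth lists, each in 5'->3' orientation.
    Genes shorter than n_bins bases are skipped. Raw depths are summed,
    so highly expressed genes weigh more."""
    totals = [0.0] * n_bins
    for cov in coverages:
        n = len(cov)
        if n < n_bins:
            continue  # too short to split into n_bins non-empty bins
        for b in range(n_bins):
            lo = b * n // n_bins
            hi = (b + 1) * n // n_bins
            totals[b] += sum(cov[lo:hi]) / (hi - lo)  # mean depth in this bin
    peak = max(totals)
    if peak == 0:
        raise ValueError("no usable coverage")
    return [t / peak for t in totals]

def mid_body_fraction(profile):
    """Relative coverage at the middle bin of a normalised profile."""
    return profile[len(profile) // 2]
```

Reporting `mid_body_fraction` for every sample gives one number per library, which makes it easier to show how the bias is distributed across groups.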