Hi everyone, I'm very new to bioinformatics. I have a question about how to handle duplicate reads in RNA-seq data. Should I remove duplicates (both natural and artificial)?
I tried dupRadar in R to assess duplication. For one sample, it gives the following:
library(dupRadar)
# dm: duplication matrix computed earlier with dupRadar's analyzeDuprates()
fit <- duprateExpFit(DupMat = dm)
cat("duprate at low read counts: ", fit$intercept, "\n",
    "progression of the duplication rate: ", fit$slope, fill = TRUE)
Output: duprate at low read counts: 1.159792, progression of the duplication rate: 2.319875
I don't know exactly what this means. Please let me know whether I should remove duplicates before further analysis.
Thanks a lot in advance.
Thank you so much for your answer. But I'm still a bit confused by the discussions in the links; it seems to depend on the situation. Could the dupRadar tool solve the problem? Please let me know.
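Even if there is a problem with PCR duplicates, it cannot be solved cleanly unless you used unique molecular identifiers (UMIs, which add cost and complexity). UMIs label individual RNA molecules before they are converted to cDNA and amplified, so reads that start at the same position but carry different UMIs can be recognized as coming from different molecules. dupRadar itself does not fix anything: it is a QC tool that shows how the duplication rate relates to expression level. Removing duplicates without UMIs would throw away good counts (highly expressed genes naturally produce many reads at the same positions) and may skew your downstream analysis.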
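To illustrate why position-only deduplication discards good counts while UMI-aware deduplication does not, here is a minimal Python sketch. The `Read` record, the two functions and the toy reads are all hypothetical and made up for illustration; real BAM handling (paired mates, soft clipping, multi-mappers) is ignored, and real tools such as UMI-tools also merge UMIs that differ only by sequencing errors, whereas this sketch uses exact UMI matching.

```python
from collections import namedtuple

# Hypothetical minimal read record: alignment start, strand and UMI tag.
Read = namedtuple("Read", ["chrom", "start", "strand", "umi"])


def dedup_by_position(reads):
    """Keep one read per (chrom, start, strand), like position-based tools."""
    seen = {}
    for r in reads:
        seen.setdefault((r.chrom, r.start, r.strand), r)
    return list(seen.values())


def dedup_by_umi(reads):
    """Keep one read per (chrom, start, strand, umi).

    Same position but a different UMI means a different original molecule,
    so those reads are kept. Exact UMI matching only (simplifying assumption).
    """
    seen = {}
    for r in reads:
        seen.setdefault((r.chrom, r.start, r.strand, r.umi), r)
    return list(seen.values())


# Toy data (made up): a highly expressed gene where three distinct molecules
# start at the same position, one of which was PCR-amplified twice.
reads = [
    Read("chr1", 1000, "+", "AACG"),
    Read("chr1", 1000, "+", "AACG"),  # PCR copy of the read above
    Read("chr1", 1000, "+", "GTTA"),  # different molecule, same position
    Read("chr1", 1000, "+", "CCAT"),  # different molecule, same position
    Read("chr1", 2500, "-", "TGCA"),
]

print(len(dedup_by_position(reads)))  # 2
print(len(dedup_by_umi(reads)))       # 4
```

Position-only deduplication keeps 2 of the 5 reads and loses two genuine molecules; the UMI-aware version removes only the true PCR copy and keeps 4.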
Thanks a lot! That makes it clear. Thank you again for your help!