I apologise if this is a basic question, but could someone explain the role of ERCC spike-ins in RNA-Seq?
I've been given a few sets of RNA-Seq data to align to a reference genome for differential gene expression analysis. I plan to do this by mapping reads to the reference rather than by de novo assembly.
When I BLASTed the over-represented sequences reported by FastQC, I noticed that one sample had an over-represented sequence that came from the ERCC spike-in. I've tried to understand what role the spike-in plays in differential gene expression analysis, but I'm struggling a bit.
My questions are:
1) Is it normal for the ERCC spike-in to show up as an over-represented sequence in only one sample?
2) Do I need to remove the spike-in reads before mapping and differential expression analysis?
3) If I do need to remove them, what's the best way of going about it?
Thank you very much in advance,