Recently I received some small RNA-seq data to analyze. Since I have never worked with small RNA-seq data before, I'm a bit lost interpreting the FastQC quality-control results before and after adaptor trimming.
First, the per-base sequence content is a mess. I'm not sure whether this is normal for small RNA-seq, but it fails that test badly.
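To look at the per-base composition outside of FastQC, a minimal Python sketch like the following can tabulate the percentage of A/C/G/T at each read position. It assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequences), and the function name is hypothetical. Because trimmed reads have different lengths, the later positions are computed from fewer reads than the early ones.

```python
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable


def per_base_content(lines: Iterable[str]) -> Dict[int, Dict[str, float]]:
    """Percentage of A/C/G/T at each 1-based read position (N and other symbols ignored)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for i, line in enumerate(lines):
        if i % 4 != 1:  # only the second line of each 4-line record is the sequence
            continue
        for pos, base in enumerate(line.strip().upper(), start=1):
            if base in "ACGT":
                counts[pos][base] += 1
    result: Dict[int, Dict[str, float]] = {}
    for pos in sorted(counts):
        total = sum(counts[pos].values())  # always >= 1 because entries are only created on a hit
        result[pos] = {b: 100.0 * counts[pos][b] / total for b in "ACGT"}
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python per_base.py trimmed.fastq
    with open(sys.argv[1]) as handle:
        for pos, pct in per_base_content(handle).items():
            print(pos, " ".join(f"{b}={pct[b]:.1f}" for b in "ACGT"))
```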
Second, the GC content is very strange. Before trimming, the distribution peaks at the same position as the theoretical distribution. After trimming, however, it shows two peaks, one around 58% (the theoretical peak) and another around 78%. I have never seen this before.
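One way to double-check the bimodal GC curve independently of FastQC is to bin the per-read GC percentage directly. This is a minimal sketch, assuming an uncompressed four-line-per-record FASTQ; the function names and the 2% bin width are illustrative choices, not anything FastQC itself uses.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_percent(seq: str) -> float:
    """Percentage of G/C bases in one read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


def gc_histogram(lines: Iterable[str], bin_width: int = 2) -> Counter:
    """Count reads per GC% bin; e.g. bin 58 holds reads with 58 <= GC% < 60."""
    hist: Counter = Counter()
    for seq in read_sequences(lines):
        hist[int(gc_percent(seq) // bin_width) * bin_width] += 1
    return hist


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python gc_hist.py trimmed.fastq
    with open(sys.argv[1]) as handle:
        for gc_bin, count in sorted(gc_histogram(handle).items()):
            print(f"{gc_bin}\t{count}")
```

Since short reads can only take a few distinct GC values, a histogram of very short reads tends to look spikier than one of 76 bp reads.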
Third, the sequence length distribution: before adaptor/quality trimming it is normal, with all sequences around 76 bp. After trimming, however, my sequences range from 20 to 76 bp, with peaks around 20 and 33 bp. In regular RNA-seq data I don't usually see such a large change in the length distribution.
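To inspect the trimmed length distribution directly, and to see what fraction of reads falls inside a given length window, a short Python sketch along these lines may help. It assumes an uncompressed four-line-per-record FASTQ; the function names and the window bounds passed to `fraction_in_range` are hypothetical example parameters.

```python
import sys
from collections import Counter
from typing import Iterable


def length_distribution(lines: Iterable[str]) -> Counter:
    """Map read length -> number of reads for a 4-line FASTQ stream."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_in_range(counts: Counter, low: int, high: int) -> float:
    """Fraction of reads whose length lies in the closed interval [low, high]."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    inside = sum(n for length, n in counts.items() if low <= length <= high)
    return inside / total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python lengths.py trimmed.fastq
    with open(sys.argv[1]) as handle:
        dist = length_distribution(handle)
    for length in sorted(dist):
        print(f"{length}\t{dist[length]}")
    print("fraction 18-25 bp:", fraction_in_range(dist, 18, 25))
```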
Finally, I have a lot of duplication and many over-represented sequences, even after adaptor trimming (around 95% of the sequences contained adaptors).
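To see which exact sequences dominate the library after trimming, one can simply count identical reads over the whole file. This is a rough cross-check, not a reimplementation of FastQC's over-represented sequences module; it assumes an uncompressed four-line-per-record FASTQ, keeps every distinct sequence in memory, and the function name is hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def top_sequences(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int, float]]:
    """Return the n most frequent exact read sequences as (sequence, count, % of all reads)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each record
            counts[line.strip()] += 1
    total = sum(counts.values())
    # most_common() is empty when there are no reads, so no division by zero occurs
    return [(seq, c, 100.0 * c / total) for seq, c in counts.most_common(n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python top_seqs.py trimmed.fastq
    with open(sys.argv[1]) as handle:
        for seq, count, pct in top_sequences(handle, 20):
            print(f"{count}\t{pct:.2f}%\t{seq}")
```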
From my research, I read that FastQC metrics are not very informative for small RNA-seq, so I shouldn't worry too much about them. But my question is: up to what point should I not worry? I would appreciate some insight from people who work with small RNA-seq datasets. Thanks in advance.