The following FastQC report is common to most replicates of an mRNA-seq experiment:
a) Does this mean that, although the reads are not contaminated with the most common adapters (such as TruSeq2 or Nextera), they could still be contaminated with other, less common adapters? Note: I'm not sure which adapters were used in library preparation.
a-1) Should I make a file containing all types of adapters and use it to trim adapters from the reads, or could this cause problems if there is no adapter contamination?
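To make question a-1 concrete, here is a minimal sketch of what a "catch-all" adapter file and a naive exact-match scan and trim look like. The three 12-mers are assumed to match FastQC's default adapter list (verify them against the list shipped with your FastQC installation), and the function names are hypothetical helpers for illustration, not part of any trimming tool's API.

```python
# Adapter 12-mers assumed to match FastQC's default adapter list; verify them
# against the adapter list shipped with your FastQC installation.
ADAPTERS = {
    "Illumina_Universal": "AGATCGGAAGAG",
    "Illumina_Small_RNA_3p": "TGGAATTCTCGG",
    "Nextera_Transposase": "CTGTCTCTTATA",
}


def adapters_to_fasta(adapters):
    """Render a name->sequence dict as FASTA text (one record per adapter)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in adapters.items())


def adapter_hits(reads, adapters):
    """Count reads that contain each adapter as an exact substring."""
    hits = {name: 0 for name in adapters}
    for read in reads:
        for name, seq in adapters.items():
            if seq in read:
                hits[name] += 1
    return hits


def trim_at_adapter(read, adapters):
    """Cut the read at the leftmost exact adapter match; unchanged if none."""
    positions = [read.find(seq) for seq in adapters.values()]
    found = [p for p in positions if p != -1]
    return read[:min(found)] if found else read


if __name__ == "__main__":
    reads = ["ACGTAGATCGGAAGAGCAC", "CTGTCTCTTATACACATCT", "ACGTACGTACGT"]
    print(adapters_to_fasta(ADAPTERS), end="")
    print(adapter_hits(reads, ADAPTERS))
    print([trim_at_adapter(r, ADAPTERS) for r in reads])
```

With exact full-length matching, an adapter that is absent from the library only affects reads that happen to contain its sequence by chance. Real trimmers such as cutadapt or Trimmomatic's ILLUMINACLIP also accept mismatches and short partial matches at read ends, so their settings (not just the adapter file) determine how much spurious trimming can occur.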
b) The Sequence Duplication Levels module also shows a warning, and some sequences are duplicated 10 to 50 times. However, if I choose to remove duplicates, I will lose ~45% of the library. Should I remove duplicates, or is this duplication level normal for highly expressed genes? Note: the total number of reads is ~15 million.
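The "~45% lost" figure and the duplication plot are two views of the same counts. A minimal sketch of that relationship follows, assuming reads are plain sequences held in memory and that "duplicate" means an identical sequence. The bin edges are hypothetical and only loosely modelled on FastQC's plot, and `duplication_summary` is an illustrative helper, not a FastQC function.

```python
from collections import Counter

# Hypothetical duplication-level bins (inclusive); None means "and above".
BINS = [(1, 1), (2, 9), (10, 49), (50, 99), (100, 499), (500, None)]


def bin_label(lo, hi):
    """Human-readable label for a bin, e.g. '1', '10-49', '500+'."""
    if hi is None:
        return f"{lo}+"
    return str(lo) if lo == hi else f"{lo}-{hi}"


def duplication_summary(reads):
    """Return (fraction of reads removed by dedup, {bin label: fraction of reads}).

    Deduplication keeps one copy of each distinct sequence, so the removed
    fraction is (total reads - distinct sequences) / total reads.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    removed = (total - len(counts)) / total
    per_bin = {bin_label(lo, hi): 0 for lo, hi in BINS}
    for copies in counts.values():
        for lo, hi in BINS:
            if copies >= lo and (hi is None or copies <= hi):
                # Every copy of this sequence counts toward its bin.
                per_bin[bin_label(lo, hi)] += copies
                break
    return removed, {label: n / total for label, n in per_bin.items()}


if __name__ == "__main__":
    reads = ["A"] * 10 + ["B"] * 2 + ["C"]
    removed, bins = duplication_summary(reads)
    print(f"removed by dedup: {removed:.1%}")
    print(bins)
```

A highly expressed transcript contributes many reads with identical sequences, so it lands in the high-copy bins and inflates the removed fraction even when every read is a genuine molecule. For ~15 million reads, tools that mark duplicates from alignment coordinates (for example Picard MarkDuplicates) are more practical than holding all sequences in memory.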