I am new to RNA-seq data analysis and am just starting on quality control (QC).
I ran FastQC with the default settings and got the report, and I would like some comments and suggestions for further QC steps. Please bear with me if the questions are stupid.
The per-base quality is pretty good. The problem is that the data have very high sequence duplication levels. I read through the documentation and found that possible causes include PCR amplification, adapter contamination, etc. Any suggestions?
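As a rough cross-check of the duplication numbers, a small script can count exact duplicate sequences directly. This is a minimal sketch, assuming an uncompressed FASTQ with strict 4-line records. It is not FastQC's own algorithm, so its percentages may not match the report exactly. Exact-sequence counting also cannot distinguish PCR duplicates from independent reads of very highly expressed transcripts. The file name in the usage line is hypothetical.

```python
from collections import Counter
from itertools import islice


def fastq_sequences(path):
    """Yield the sequence (2nd) line of every 4-line FASTQ record."""
    with open(path) as fh:
        # Lines 1, 5, 9, ... (0-based) hold the sequences.
        for seq_line in islice(fh, 1, None, 4):
            yield seq_line.rstrip("\n")


def duplication_summary(seqs):
    """Return (total reads, distinct sequences, percent of reads left after deduplication)."""
    counts = Counter(seqs)
    total = sum(counts.values())
    distinct = len(counts)
    remaining = 100.0 * distinct / total if total else 0.0
    return total, distinct, remaining


# Usage (hypothetical file name):
# print(duplication_summary(fastq_sequences("sample_R1.fastq")))
```

A low "percent left after deduplication" corresponds to a high duplication level.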
There are also many overrepresented sequences, and problems in the Kmer Content section of the report. Any comments?
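To see which sequences dominate the library, a sketch like the following lists every sequence that makes up at least a given fraction of all reads. The default of 0.1% (`min_fraction=0.001`) is an assumption chosen to resemble FastQC's reporting threshold; check the FastQC documentation for the exact rule. The sketch assumes uncompressed 4-line FASTQ records and a hypothetical file name.

```python
from collections import Counter
from itertools import islice


def fastq_sequences(path):
    """Yield the sequence (2nd) line of every 4-line FASTQ record."""
    with open(path) as fh:
        for seq_line in islice(fh, 1, None, 4):
            yield seq_line.rstrip("\n")


def overrepresented(seqs, min_fraction=0.001):
    """Return (sequence, count, percent) for sequences making up at least
    min_fraction of all reads, most common first."""
    counts = Counter(seqs)
    total = sum(counts.values())
    hits = []
    for seq, n in counts.most_common():
        if n / total < min_fraction:
            break  # most_common() is sorted, so no later sequence can qualify
        hits.append((seq, n, 100.0 * n / total))
    return hits


# Usage (hypothetical file name):
# for seq, n, pct in overrepresented(fastq_sequences("sample_R1.fastq"))[:20]:
#     print(f"{pct:6.2f}%  {n:>9}  {seq}")
```

The listed sequences can then be searched (for example with BLAST) to see whether they are adapters, rRNA, or highly expressed transcripts.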
Another question is about adapter content. I was told that the adapter used for trimming is CTGTCTCTTATACACATCT, but a certain number of reads contain universal adapter sequence such as AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA.
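To compare how often each adapter appears, one can count reads that contain a short prefix of each adapter. As far as I know, FastQC's Adapter Content module searches for short prefixes in the same way, e.g. AGATCGGAAGAGC ("Illumina Universal Adapter") and CTGTCTCTTATA ("Nextera Transposase Sequence"). This is a minimal sketch that uses exact substring matching only. It will miss adapter fragments shorter than the prefix at the very end of a read and ignores sequencing errors. The dictionary labels and the file name are hypothetical.

```python
from itertools import islice

# Hypothetical labels; the values are prefixes of the two adapters in question.
ADAPTERS = {
    "provided": "CTGTCTCTTATA",    # first 12 bases of CTGTCTCTTATACACATCT
    "universal": "AGATCGGAAGAGC",  # first 13 bases of AGATCGGAAGAGCGTCG...
}


def fastq_sequences(path):
    """Yield the sequence (2nd) line of every 4-line FASTQ record."""
    with open(path) as fh:
        for seq_line in islice(fh, 1, None, 4):
            yield seq_line.rstrip("\n")


def adapter_read_counts(seqs, adapters):
    """Return (total reads, {adapter name: reads containing that prefix})."""
    counts = {name: 0 for name in adapters}
    total = 0
    for seq in seqs:
        total += 1
        for name, prefix in adapters.items():
            if prefix in seq:  # exact match only, no mismatches allowed
                counts[name] += 1
    return total, counts


# Usage (hypothetical file name):
# total, counts = adapter_read_counts(fastq_sequences("sample_R1.fastq"), ADAPTERS)
# for name, n in counts.items():
#     print(name, n, f"{100.0 * n / total:.2f}%")
```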
Thanks in advance.