I have RNA-Seq data (Illumina 1.9). I ran QC with FastQC after removing over-represented sequences and adapters. FastQC reported failures for the Kmer content, GC content and sequence duplication modules. Several blog posts suggested this is a normal occurrence for RNA-Seq, so I went ahead and aligned the reads to the reference genome with STAR, counted reads with HTSeq, and ran DESeq2 for differential expression. The RNA for sequencing was isolated from the heavy polysomal fraction, so the samples essentially contained ribosome-bound messenger transcripts. The company that did the sequencing used poly(A) selection. After analyzing the RNA-Seq data I ran RT-qPCR to validate the DESeq2 findings, and the results were very similar to the RNA-Seq data.
The percentage of unique reads remaining after deduplication, as estimated by FastQC, is as low as 8% for some of my samples. My validation suggests that the libraries were fine. I have read conflicting opinions online and I am now confused: some suggest removing duplicates before proceeding, whereas others say that is a no-no.
Is it normal for RNA-Seq data to have such a low percentage of unique reads, as reported by FastQC?
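To be explicit about the number I am referring to, here is a minimal Python sketch of how I understand the "percentage remaining after deduplication" figure: the number of distinct read sequences divided by the total number of reads. This is a simplified illustration and not FastQC's exact algorithm (as far as I know FastQC estimates the value from a subset of reads rather than the whole file). The function names and the toy reads are my own and are only there for illustration.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of every record holds the bases
                yield line.strip()


def percent_remaining_after_dedup(sequences):
    """Percentage of reads left if every distinct sequence is kept exactly once."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences given")
    return 100.0 * len(counts) / total


# Toy data (hypothetical): one sequence seen 24 times plus one singleton.
toy_reads = ["ACGT"] * 24 + ["TTGA"]
print(percent_remaining_after_dedup(toy_reads))  # 2 distinct out of 25 reads
```

On a real file this would be called as `percent_remaining_after_dedup(read_fastq_sequences("sample.fastq"))`, where `sample.fastq` is a placeholder path. Note that this compares exact sequences only, so it cannot tell PCR duplicates apart from independent fragments of a highly expressed transcript.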