I was wondering whether the results of my FastQC run are to be expected under the circumstances.
We are running some 10x Genomics (single-cell) data sets and get a strange GC content plot when running FastQC.
In the case of the attached image, the basic statistics report a GC content of around 41%. But even the theoretical distribution in the plot is centered around 51%, while the observed one is difficult to interpret.
So my questions are:
- Can I still assume the data is good enough? It doesn't look like there is contamination in the FASTQ files.
- Could the reason for this behavior be the specific structure of the 10x Genomics sequences, including the barcodes and the UMI?
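
To check the second point myself, I could compute the GC content separately for each read file, since I would expect the barcode/UMI read and the cDNA read to behave differently. Below is a minimal sketch of how I would do that. It assumes plain 4-line FASTQ records (no wrapped sequences), the file names in the usage comment are hypothetical, and the counting is my own, so it will not necessarily match FastQC's numbers exactly.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def gc_summary(lines):
    """Return (overall GC %, Counter of rounded per-read GC %).

    Assumes each record spans exactly 4 lines; N bases are ignored.
    """
    histogram = Counter()
    gc_bases = 0
    called_bases = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        gc = seq.count("G") + seq.count("C")
        called = gc + seq.count("A") + seq.count("T")
        if called == 0:  # skip empty or all-N reads
            continue
        gc_bases += gc
        called_bases += called
        histogram[round(100 * gc / called)] += 1
    overall = 100 * gc_bases / called_bases if called_bases else 0.0
    return overall, histogram


# Hypothetical file names; substitute the real R1/R2 files.
# for path in ("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
#     with open_fastq(path) as handle:
#         overall, histogram = gc_summary(handle)
#     print(path, round(overall, 1), histogram.most_common(3))
```

The overall value is weighted by bases, like the single number in the basic statistics, while the histogram is per read, like the plot. Comparing the two for each file should show whether the 41% versus 51% gap comes from one read type only.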