Per base sequence content failed in FastQC
5 weeks ago
yliueagle ▴ 260

In the FastQC report of my ChIP-seq data (paired-end, H3K27me3, Sanger / Illumina 1.9 encoding, sequence length 42), I got a failure in "Per base sequence content" (see figure below), but other modules such as "Adapter Content" were fine. What does this indicate, and do you have any suggestions for pre-processing the data? Many thanks!

I have ChIP-seq reads on

5 weeks ago
GenoMax 109k

Please see this blog post from the authors of FastQC about the non-random pattern you see at the beginning of the reads. That pattern is likely due to the tagmentation method used to make the libraries.

You should probably move on with the rest of your analysis. Aligners should be able to deal with bases that don't align by soft-clipping them.
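If you want to confirm that the aligner is actually clipping the biased bases, a rough check is to count soft-clipped bases at the 5' end of each alignment. The sketch below assumes you feed it the CIGAR (column 6) and FLAG (column 2) fields of SAM records yourself, for example from `samtools view` output; the function names are my own. Whether any clipping happens at all depends on the aligner and mode: Bowtie2 only soft-clips in `--local` mode, while BWA-MEM clips by default.

```python
import re

# One CIGAR operation: a length followed by an operator (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string.

    "Left" and "right" are in reference orientation; "*" (no CIGAR) gives (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit outside soft clips and consume no read bases.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return left, right


def five_prime_soft_clip(cigar, flag):
    """Soft-clipped bases at the read's own 5' end.

    SAM stores reverse-strand reads (flag bit 0x10) reverse-complemented,
    so their original 5' end is the right-hand end of the CIGAR.
    """
    left, right = soft_clips(cigar)
    return right if flag & 0x10 else left
```

Tallying `five_prime_soft_clip` over all alignments gives a quick picture of whether the first few bases are being clipped; if most reads show zero, the aligner is absorbing those bases as mismatches instead.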


Thanks for the comment. Yes, the variation at the beginning of the reads is explained in the blog post. What concerns me more is the middle part of the read, where the difference between the G and C percentages (and between A and T) is around 10%, which leads to the "failure" in the FastQC report.
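To see exactly which positions drive the result, you can recompute the per-position base composition and the |A-T| and |G-C| gaps yourself. As I understand FastQC's documentation, this module warns when either gap exceeds 10% at any position and fails when it exceeds 20%; under those default limits a ~10% gap alone would be a warning, so it may be worth checking which position crosses 20%. The sketch below is a hypothetical re-implementation, not FastQC's code: it assumes plain (uncompressed) 4-line FASTQ records, ignores N, and does not bin positions the way FastQC may for longer reads.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_base_content(seqs):
    """Percentage of A, C, G and T at each read position (N is ignored)."""
    counts = []  # one Counter per read position
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    content = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        content.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return content


def fastqc_style_status(content, warn=10.0, fail=20.0):
    """Classify by the largest |A-T| or |G-C| gap at any position.

    Returns (status, worst_gap, 1-based position of the worst gap).
    The 10/20 defaults assume FastQC's default limits are in use.
    """
    if not content:
        return "pass", 0.0, 0
    gaps = [max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in content]
    worst = max(gaps)
    position = gaps.index(worst) + 1
    if worst > fail:
        return "fail", worst, position
    if worst > warn:
        return "warn", worst, position
    return "pass", worst, position
```

For a real file, something like `per_base_content(read_fastq_sequences(open("sample_R1.fastq")))` (hypothetical file name) followed by `fastqc_style_status(...)` reports the worst position; printing the gaps for positions 10 to 42 separately would show whether the middle of the read alone exceeds the failure threshold.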


Failures in FastQC reports are not immediately indicative of bad data. Yes, there is the discrepancy you note, but perhaps it is because you are enriching for AT-rich sequences in your ChIP-seq. You will not know that until you analyze the data. If the data does not make sense after analysis, you can retrace your steps to see where things may have gone wrong.
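One cheap first look at the AT-rich hypothesis is to compare the mean per-read GC fraction of the ChIP library with that of an input/control library, assuming you sequenced one. This is only a sketch with made-up function names; FastQC's own "Per sequence GC content" module already plots the distribution, and a lower GC in the ChIP sample would be consistent with AT-rich enrichment but would not prove it (peak-level analysis is more informative).

```python
def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_fraction(seq):
    """GC fraction of one read, counting only A/C/G/T (N is ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def mean_gc(seqs):
    """Mean per-read GC fraction over an iterable of sequences."""
    total, n = 0.0, 0
    for seq in seqs:
        total += gc_fraction(seq)
        n += 1
    return total / n if n else 0.0
```

Running `mean_gc(read_fastq_sequences(fh))` once on the ChIP FASTQ and once on the input FASTQ gives two numbers to compare; streaming the file keeps memory use flat even for large libraries.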


Thanks for the comment.