Question

Unusual FastQC Per Base Sequence Content

2.8 years ago

RBright21 ▴ 10

Hi Everyone

Is anyone able to assist with the cause of something unusual I have noted on a FastQC output? I have scoured the internet but I have not been able to find a definitive answer on the cause.

I have 150 bp Illumina MiSeq reads enriched for a DNA virus (the expected GC content is 56%). There is a strange V shape in the per base sequence content at the end of the reads, and it does not change when the reads are trimmed with Trimmomatic. For this particular output file the adaptors have been trimmed and I have carried out some minor read trimming (SW 4:15 and minimum length 40). At first I suspected the adaptors were to blame, but the pattern remains even after their removal. It is obviously some sort of bias, but shouldn't trimming have minimised the risk of this? Has anyone seen anything like this before, and could you point me in the right direction please?

Thanks

[Image: FastQC output]

FastQC Illumina • 1.5k views

ADD COMMENT • link updated 2.8 years ago by jared.andrews07 ★ 16k • written 2.8 years ago by RBright21 ▴ 10

I've seen this happen often before, but only for the last base.

Can you post the same plot without the binning (that is, at per-base resolution)?

ADD REPLY • link 2.8 years ago by lieven.sterck 15k

Thanks for your reply. Sorry to have to ask, but could you please point me to instructions on how to generate this at per-base resolution? I am using macOS and have access to both the GUI and command-line versions of FastQC. Thanks again

ADD REPLY • link 2.8 years ago by RBright21 ▴ 10

Sure: add the option --nogroup to your command line.
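For reference, here is a minimal sketch of what such a call could look like when built from Python. The file name `trimmed_reads.fastq.gz`, the output directory `fastqc_out`, and the helper `build_fastqc_command` are placeholders made up for illustration; only `--nogroup` comes from the reply above. It assumes `fastqc` is on your PATH and that the output directory already exists.

```python
import shlex


def build_fastqc_command(fastq_path, outdir="fastqc_out"):
    """Return a FastQC argument list with base grouping disabled.

    build_fastqc_command is a hypothetical helper, not part of FastQC.
    """
    # --nogroup switches off the binning of read positions, so the plots
    # show one data point per base.
    # --outdir tells FastQC where to write the report; the directory
    # must already exist.
    return ["fastqc", "--nogroup", "--outdir", outdir, fastq_path]


if __name__ == "__main__":
    # Placeholder file name; replace with your own trimmed FASTQ file.
    cmd = build_fastqc_command("trimmed_reads.fastq.gz")
    # Print the equivalent shell command so it can be copied into a terminal.
    print(shlex.join(cmd))
    # To run it directly instead: subprocess.run(cmd, check=True)
```

The printed line can be pasted straight into the macOS terminal, which is probably the simplest route if you only need the one plot.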

ADD REPLY • link 2.8 years ago by lieven.sterck 15k

Thanks so much

Here you go. It looks like it is actually only the last base, as you suspected. Is this still likely to be an adaptor artefact?

[Image: nobin]

ADD REPLY • link 2.8 years ago by RBright21 ▴ 10

score 2 · Answer 1 · 2021-07-06

Normally that last-base bias is due to no minimum match length being set during adapter trimming, so if the adapter sequence starts with "A", a final "A" on a read always gets trimmed as though it were the start of the adapter. As for why it was present even before you trimmed: is it possible that your sequencing provider trimmed the reads before giving you the data?

Regardless, it shouldn't be any problem.
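To illustrate the last-base effect described above, here is a small simulation sketch. It is a simplified stand-in for a real adapter trimmer, not how Trimmomatic or any other tool is actually implemented. The adapter prefix `AGATCGGAAGAGC` is assumed purely for illustration (the thread does not say which adapter was used), and the function names and the `min_overlap` parameter are hypothetical. The reads are random, so they contain no real adapter; any trimming that happens is a chance match.

```python
import random
from collections import Counter


def trim_adapter_3prime(read, adapter, min_overlap=1):
    """Remove the longest read suffix that matches an adapter prefix.

    Matches shorter than min_overlap bases are ignored.
    """
    max_len = min(len(read), len(adapter))
    # Try the longest possible overlap first, then shorter ones.
    for k in range(max_len, min_overlap - 1, -1):
        if k > 0 and read.endswith(adapter[:k]):
            return read[:-k]
    return read


def last_base_fractions(reads):
    """Return the fraction of A, C, G and T at the final position."""
    counts = Counter(r[-1] for r in reads if r)
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in "ACGT"}


if __name__ == "__main__":
    rng = random.Random(0)
    adapter = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration only
    # Random 150 bp reads with uniform base composition and no real adapter.
    reads = ["".join(rng.choice("ACGT") for _ in range(150))
             for _ in range(20000)]
    for min_overlap in (1, 5):
        trimmed = [trim_adapter_3prime(r, adapter, min_overlap) for r in reads]
        print("min_overlap =", min_overlap, last_base_fractions(trimmed))
```

With a minimum overlap of 1, every read ending in "A" loses that base, so the fraction of "A" at the final position should fall well below 25%. With a stricter overlap it should stay close to 25%.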