Question

How should I handle raw reads that fail "per base sequence content" in FastQC?

1

5.1 years ago

yujinlong000703 ▴ 10

Hey guys: I am doing RNA-seq analysis, and the quality of my reads does not look good.
Below is a typical FastQC report for my data.
I have read many tutorials about FastQC, and from my understanding bases 1-10 are adaptor sequences. But the adapter content section shows no warning.

I am wondering whether my understanding is right.
Should I use Trimmomatic to cut the adaptor sequences?

Failed per base sequence content

RNA-Seq fastQC • 11k views

ADD COMMENT • link updated 5.1 years ago by Ido Tamir 5.2k • written 5.1 years ago by yujinlong000703 ▴ 10

1

All you need to know is in this blog post by FastQC authors.

The take-home for this particular observation is: don't do anything specific. The data should be fine.

ADD REPLY • link 5.1 years ago by GenoMax 141k

0

Thank you for your help. I think my problem is solved now!

ADD REPLY • link 5.1 years ago by yujinlong000703 ▴ 10

1

5.1 years ago

vin.darb ▴ 300

Adapter sequences are found at the 3' end of the read, so according to the chart there does not seem to be any adapter contamination. The 10 overrepresented bp at the 5' end come from primers. Personally, I remove them if the initial quality of my reads is not very good, but if the reads are of good quality this is not a problem for alignment.
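If you do decide to remove the first bases, the operation itself is simple. Below is a minimal Python sketch of it, similar in spirit to Trimmomatic's HEADCROP step. The function name `head_crop`, the tuple layout of a record, and the choice to drop reads that would end up empty are assumptions made for illustration, not part of any tool.

```python
def head_crop(records, n=10):
    """Remove the first n bases (and their qualities) from each FASTQ record.

    `records` is an iterable of (header, sequence, separator, quality) tuples.
    Reads that are n bases or shorter are dropped (a choice made in this sketch).
    """
    for header, seq, plus, qual in records:
        if len(seq) > n:
            yield header, seq[n:], plus, qual[n:]


# Toy records, invented for the example.
toy_records = [
    ("@r1", "ACGTACGTACGTA", "+", "IIIIIIIIIIIII"),
    ("@r2", "ACGT", "+", "IIII"),
]
cropped = list(head_crop(toy_records, n=10))
```

Here `@r1` keeps only its last three bases and `@r2` is dropped because it is shorter than the crop length.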

ADD COMMENT • link 5.1 years ago by vin.darb ▴ 300

score 2 · Accepted Answer · 2019-03-28

2

5.1 years ago

caggtaagtat ★ 1.9k

Hi,

This is totally normal for RNA-sequencing data, even after removal of the adapter sequences. The random hexamer primers that are used to generate the cDNA library from your RNA transcripts have been shown not to bind completely at random. This non-random binding leads to the bias in "per base sequence content" at bases 1-15.

You don't have to trim these sequences if FastQC does not report any recognized adapters. Just be aware that the primers used do not lead to completely random amplification.
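To look at this bias directly, you can tabulate the base composition at each of the first read positions yourself. The following is a rough Python sketch of that idea, not FastQC's actual implementation. The function names and the default of 15 positions are assumptions, and the FASTQ reader assumes an uncompressed file with exactly four lines per record.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def per_base_content(seqs, n_positions=15):
    """Return, for each of the first n_positions, the fraction of A, C, G and T."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())  # includes N or other symbols, if present
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


# Toy reads, invented for the example.
toy_reads = ["ACGT", "ACGA", "TCGA", "ACGA"]
content = per_base_content(toy_reads, n_positions=4)
```

On real data you would pass `read_fastq_sequences("reads.fastq")` (a hypothetical file name) instead of the toy list. With random hexamer priming, the fractions at the first positions are expected to deviate from those further along the read.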

ADD COMMENT • link 5.1 years ago by caggtaagtat ★ 1.9k

0

Thank you for your patient reply. I really learnt a lot!

ADD REPLY • link 5.1 years ago by yujinlong000703 ▴ 10

score 2 · Accepted Answer · 2019-03-28

2

5.1 years ago

Ido Tamir 5.2k

The adapter starts at the 3' end of your reads, not the 5' end (unless it's an adapter dimer, i.e. no insert).

This is the result of random priming in RNA-Seq. I think "Biases in Illumina transcriptome sequencing caused by random hexamer priming" is the first paper on this. The pattern represents real sequences. After alignment you can check the error rate in the reads; it is only marginally higher at the 5' end than in the rest of the read. The FastQC report of the aligned sequences should show the same pattern.
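
One way to check the error rate by read position after alignment is to read the mismatch offsets out of the SAM `MD` tag. The Python sketch below shows the idea under strong assumptions: the alignment has no insertions or soft clips (so `MD` offsets map directly onto read offsets), and the function names and input layout are invented for the example.

```python
import re
from collections import Counter

# MD tag tokens: a run of matches, a deletion (^ plus bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def mismatch_positions(md, read_len, is_reverse=False):
    """Return 0-based mismatch offsets counted from the read's 5' end.

    Assumes the CIGAR has no insertions or soft clips.
    """
    pos = 0
    found = []
    for matches, deletion, base in MD_TOKEN.findall(md):
        if matches:
            pos += int(matches)
        elif deletion:
            continue  # deleted reference bases consume no read bases
        else:
            found.append(pos)
            pos += 1
    if is_reverse:
        # Reverse-strand reads are stored reverse-complemented, so flip offsets.
        found = [read_len - 1 - p for p in found]
    return sorted(found)


def mismatch_rate_by_position(alignments, read_len):
    """alignments: iterable of (md_tag, is_reverse) pairs for reads of length read_len."""
    hits = Counter()
    n_reads = 0
    for md, is_reverse in alignments:
        n_reads += 1
        hits.update(mismatch_positions(md, read_len, is_reverse))
    return [hits[i] / n_reads if n_reads else 0.0 for i in range(read_len)]
```

For example, `mismatch_positions("10A5^AC6T3", 26)` gives offsets 10 and 22 for a forward-strand read. If the 5' bias reflects real sequence rather than errors, the rates at the first positions should be close to those in the rest of the read.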