How should I handle raw reads that fail "per base sequence content" in FastQC?
5.1 years ago

Hey guys: I am doing RNA-seq analysis and the quality of my reads does not look great.
Below is a typical FastQC report for my data.
I have read many tutorials about FastQC and, from my understanding, bases 1-10 seem to be adapter sequences. But the adapter content section shows no warning.


  • Is my understanding right?
  • Should I use Trimmomatic to cut the adapter sequences?

Failed per base sequence content

RNA-Seq fastQC • 11k views

All you need to know is in this blog post by FastQC authors.

The take-home message for this particular observation is: don't do anything specific. The data should be fine.


Thank you for your help, I think my problem is solved now!

5.1 years ago
caggtaagtat ★ 1.9k

Hi,

This is totally normal for RNA-sequencing data, even after removal of the adapter sequences. The random hexamer primers used to generate the cDNA library from your RNA transcripts have been shown not to bind completely at random. This non-random binding leads to the bias in "per base sequence content" over bases 1-15.

You don't have to trim these sequences if FastQC does not report any recognized adapters. Just be aware that the primers used do not lead to completely random amplification.
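To see the bias described above outside of a FastQC plot, the per-position base fractions can be tallied directly. The following is a minimal sketch, not FastQC's own implementation; the function name `per_base_content` and the toy reads are made up for illustration.

```python
from collections import Counter


def per_base_content(reads, n_positions=15):
    """Fraction of A/C/G/T at each of the first n_positions read positions.

    Fractions are relative to all bases seen at that position (including N),
    so they may sum to less than 1 when ambiguous bases are present.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base.upper()] += 1

    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return fractions


# Toy reads invented for illustration; real input would be the sequence
# lines of a FASTQ file.
toy_reads = ["ACGTAC", "ACGTTT", "AGGTCA", "ACTTGA"]
content = per_base_content(toy_reads, n_positions=3)
```

With unbiased priming the four fractions at each position would be expected to track the overall base composition of the library; a position-dependent skew confined to the first bases is the pattern discussed in this thread.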


Thank you for your patient reply. I really learnt a lot!

5.1 years ago
Ido Tamir 5.2k

The adapter starts at the 3' end of your reads, not the 5' end (unless it's an adapter dimer, i.e. no insert).

This is the result of random priming in RNA-Seq. I think "Biases in Illumina transcriptome sequencing caused by random hexamer priming" is the first paper on this. The pattern represents real sequence. After alignment you can check the error rate in the reads; it is only marginally higher at the 5' end than in the rest of the read. The FastQC report of the aligned sequences should also show the same pattern.
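The post-alignment check mentioned above can be sketched as a per-position mismatch rate. This is only an illustration under strong assumptions: each read is paired with the reference segment it aligned to, alignments are ungapped, and both strings are already in read orientation. The function name and the toy pairs are hypothetical; in practice the pairs would be extracted from a BAM file with an alignment library.

```python
def mismatch_rate_by_position(pairs):
    """Mismatch rate at each read position.

    pairs: iterable of (read, reference_segment) strings, ungapped and
    of equal length within each pair (an assumption of this sketch).
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        for i, (read_base, ref_base) in enumerate(zip(read, ref)):
            # Grow the tallies the first time a position is seen
            if i == len(totals):
                totals.append(0)
                mismatches.append(0)
            totals[i] += 1
            if read_base != ref_base:
                mismatches[i] += 1
    return [m / t for m, t in zip(mismatches, totals)]


# Toy data invented for illustration, not real alignments
toy_pairs = [
    ("ACGT", "ACGT"),
    ("TCGT", "ACGT"),
    ("ACGA", "ACGT"),
    ("ACGT", "ACGT"),
]
rates = mismatch_rate_by_position(toy_pairs)
```

If the first bases were adapter or otherwise spurious sequence, the rate at the 5' positions would be expected to be far higher than elsewhere, rather than only marginally so.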


I think you are right! Thank you very much.

5.1 years ago
vin.darb ▴ 300

Adapter sequences are found at the 3' end of the read, so according to the chart there does not seem to be any adapter contamination. The 10 overrepresented bp at the 5' end come from the primers. Personally, I remove them if the initial quality of my reads is not very good, but if the reads are of good quality this is not a problem for alignment.
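For anyone who does want to remove the first 10 bases, the operation itself is simple: drop the same number of characters from the sequence and the quality string of every record. Below is a minimal sketch in plain Python, assuming a well-formed four-line-per-record FASTQ; `read_fastq`, `head_crop` and the toy record are hypothetical names and data. Dedicated trimmers offer the same operation (Trimmomatic has a `HEADCROP` step for it) and are the better choice for real data.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ lines.

    Assumes exactly four lines per record; a truncated trailing record
    is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same iterator four times groups lines in fours
    for record in zip(stripped, stripped, stripped, stripped):
        yield record


def head_crop(record, n=10):
    """Remove the first n bases and their quality scores from one record."""
    header, seq, plus, qual = record
    return (header, seq[n:], plus, qual[n:])


# Toy record invented for illustration
toy_fastq = [
    "@read1\n",
    "ACGTACGTACGTAAAA\n",
    "+\n",
    "IIIIIIIIIIFFFFFF\n",
]
cropped = [head_crop(r, n=10) for r in read_fastq(toy_fastq)]
```

Sequence and quality must always be cropped together, otherwise the record becomes invalid for downstream tools.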
