Question: How should I handle raw reads that fail "Per base sequence content" in FastQC?
4 months ago by
yujinlong000703 wrote:

Hey guys, I am doing RNA-seq analysis and the quality of my reads does not seem to be good.
Below is a typical FastQC report for my data.
I have read many tutorials about FastQC. From my understanding, bases 1-10 are adapter sequences, but there is no warning in the adapter content section.


  • Is my understanding right?
  • Should I use Trimmomatic to cut the adapter sequences?

Failed per base sequence content

fastqc rna-seq • 349 views
modified 4 months ago by Ido Tamir • written 4 months ago by yujinlong000703

All you need to know is in this blog post by the FastQC authors.

The take-home for this particular observation is: don't do anything specific. The data should be fine.

written 4 months ago by genomax

Thank you for your help, I think my problem is solved now!

written 4 months ago by yujinlong000703
4 months ago by
caggtaagtat wrote:

Hi,

This is completely normal for RNA-seq data, even after removal of the adapter sequences. The random hexamer primers used to generate the cDNA library from your RNA transcripts have been shown not to bind completely at random. This non-random binding causes the bias in "Per base sequence content" over bases 1-15.

You don't have to trim these bases if FastQC does not report any recognized adapters. Just be aware that the primers used do not lead to completely random amplification.
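
If you want to look at the 5' bias yourself, here is a minimal Python sketch that tallies base composition per read position, which is roughly what the "Per base sequence content" plot shows. It assumes a plain, uncompressed FASTQ with exactly four lines per record; the function name `per_base_content` and the toy reads are made up for illustration, and this is not FastQC's own code.

```python
from collections import Counter
import io


def per_base_content(fastq_handle, max_pos=15):
    """Return one dict per read position (up to max_pos) with the
    fraction of A, C, G and T observed at that position.

    Assumes strict 4-line FASTQ records (no wrapped sequences).
    """
    counts = [Counter() for _ in range(max_pos)]
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:max_pos]):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())  # includes N calls, if any
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


# Toy input: two hypothetical reads that share the same first two bases.
toy = io.StringIO("@r1\nGTACA\n+\nIIIII\n@r2\nGTCCA\n+\nIIIII\n")
for pos, frac in enumerate(per_base_content(toy, max_pos=5), start=1):
    print(pos, frac)
```

On a real file you would pass an open file handle instead of the toy `StringIO`. With random-hexamer-primed libraries the first 10 to 15 positions typically look skewed and then flatten out.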

written 4 months ago by caggtaagtat

Thank you for your patient reply. I really learnt a lot!

written 4 months ago by yujinlong000703
4 months ago by
Ido Tamir wrote:

The adapter starts at the 3' end of your reads, not the 5' end (unless it's an adapter dimer, i.e. no insert).
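
To make the 3' versus 5' point concrete, here is a rough Python sketch that reports where an adapter begins inside a read. The sequence `AGATCGGAAGAGC` is an assumption (the start of the common Illumina TruSeq adapter); swap in whatever your library kit uses. The helper `adapter_start` is hypothetical and does exact matching only, so it is much cruder than a real trimmer.

```python
# Assumption: start of the common Illumina TruSeq adapter; replace with
# the adapter that matches your library prep.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_start(read, adapter=ADAPTER_PREFIX, min_overlap=5):
    """Return the 0-based index where the adapter begins in `read`,
    or None if it is not found. Exact matches only (no mismatches)."""
    idx = read.find(adapter)
    if idx != -1:
        return idx
    # The read may end part-way through the adapter, so also look for
    # a partial adapter prefix at the very 3' end of the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


print(adapter_start("ACGTACGTAGATCGG"))       # insert, then partial adapter at the 3' end
print(adapter_start("AGATCGGAAGAGCACACGTC"))  # adapter at position 0: dimer-like read, no insert
print(adapter_start("ACGTACGTACGT"))          # no adapter found
```

For a normal read the returned index equals the insert length, so it is greater than 0. An index of 0 is the adapter-dimer case mentioned above.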

This is the result of random priming in RNA-seq. I think "Biases in Illumina transcriptome sequencing caused by random hexamer priming" is the first paper on this. These are real sequences. After alignment you can check the error rate in the reads; it's only marginally higher at the 5' end than in the rest of the read. The FastQC report of the aligned sequences should show the same pattern.

written 4 months ago by Ido Tamir

I think you are right! Thank you very much.

written 4 months ago by yujinlong000703
4 months ago by
darbinator wrote:

Adapter sequences sit at the 3' end of the read, so according to the chart there does not seem to be any adapter contamination. The 10 overrepresented bp at the 5' end come from the primers. Personally, I remove them if the initial quality of my reads is not very good, but with good-quality reads this is not a problem for alignment.
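
If you do decide to remove the first 10 bases, a trimmer is the usual route (Trimmomatic has a HEADCROP step for this). Just to show what the operation does, here is a minimal Python sketch, assuming uncompressed 4-line FASTQ records. `head_crop` is a made-up helper, not part of any tool, and its choice to drop reads that would end up empty is also an assumption.

```python
import io


def head_crop(fastq_in, fastq_out, n=10):
    """Copy FASTQ records from fastq_in to fastq_out, removing the
    first n bases and the matching n quality characters of each read.

    Reads of length <= n are dropped (an assumption of this sketch).
    """
    while True:
        header = fastq_in.readline()
        if not header:  # end of file
            break
        seq = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        qual = fastq_in.readline().rstrip("\n")
        if len(seq) <= n:
            continue
        fastq_out.write(header)
        fastq_out.write(seq[n:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(qual[n:] + "\n")


# Toy input: one 18 bp read and one 4 bp read (the latter gets dropped).
src = io.StringIO(
    "@r1\nGTACAGGTTCACGTACGT\n+\nIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
)
out = io.StringIO()
head_crop(src, out, n=10)
print(out.getvalue(), end="")
```

The sequence and quality lines have to be cropped by the same amount, otherwise downstream tools will reject the file.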

written 4 months ago by darbinator