Question: How should I handle raw reads that fail "Per base sequence content" in FastQC?
yujinlong000703 wrote, 15 months ago:

Hey guys: I am doing RNA-seq analysis, and the quality of my reads does not look good.
Below is a typical FastQC report for my data.
I have read many tutorials about FastQC, and from my understanding bases 1-10 are adapter sequences. But the adapter content section shows no warning.


  • Is my understanding right?
  • Should I use Trimmomatic to cut the adapter sequences?

[Figure: FastQC report with a failed "Per base sequence content" module]


All you need to know is in this blog post by the FastQC authors.

The take-home message for this particular observation is: don't do anything specific. The data should be fine.

written 15 months ago by genomax

Thank you for your help. I think my problem is solved now!

written 15 months ago by yujinlong000703
caggtaagtat wrote, 15 months ago:

Hi,

This is totally normal for RNA-sequencing data, even after removal of the adapter sequences. The random hexamer primers used to generate the cDNA library from your RNA transcripts have been shown not to bind completely at random. This non-random binding causes the bias in "Per base sequence content" at bases 1-15.

You don't have to trim these bases if FastQC does not report any recognized adapters. Just be aware that the primers used do not produce completely random amplification.
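If you want to look at the numbers behind the plot yourself, here is a minimal Python sketch that tabulates the fraction of A/C/G/T at the first few read positions. It assumes plain, uncompressed FASTQ with exactly four lines per record; the function name `per_base_content` and the `max_len` parameter are made up for this illustration and are not part of FastQC.

```python
from collections import Counter
import io


def per_base_content(fastq_handle, max_len=20):
    """Return one dict per read position (up to max_len) with the
    fraction of A, C, G and T observed at that position.

    Assumes 4-line FASTQ records (no wrapped sequence lines).
    """
    counts = [Counter() for _ in range(max_len)]
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:max_len]):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())  # includes N calls, if any
        if total == 0:
            break  # no read reaches this position
        fractions.append({b: c[b] / total for b in "ACGT"})
    return fractions


if __name__ == "__main__":
    # Toy input only; replace with open("reads.fastq") for real data.
    toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAGGT\n+\nIIII\n")
    for pos, frac in enumerate(per_base_content(toy), start=1):
        print(pos, frac)
```

With random-hexamer-primed libraries you would expect the fractions to wobble over roughly the first 10-15 positions and then flatten out, which is the pattern described above.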


Thank you for your patient reply. I really learnt a lot!

written 15 months ago by yujinlong000703
Ido Tamir wrote, 15 months ago:

The adapter starts at the 3' end of your reads, not the 5' end (unless it's an adapter dimer, i.e. no insert).

This is the result of random priming in RNA-Seq. I think "Biases in Illumina transcriptome sequencing caused by random hexamer priming" is the first paper on this. These are real sequences. After alignment you can check the error rate in the reads; it is only marginally higher at the 5' end than in the rest of the read. And the FastQC report of the aligned sequences should show the same pattern.
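The error-rate check can be sketched roughly like this in Python. This is only an illustration under simplifying assumptions: you have already pulled out, for each aligned read, the read sequence and the reference segment it aligns to, both in read orientation and without gaps (no indels or clipping). The helper name `mismatch_rate_by_position` is hypothetical, and getting the pairs out of a BAM file is left to whatever tool you prefer.

```python
def mismatch_rate_by_position(pairs):
    """Compute the mismatch rate at each read position.

    pairs: iterable of (read, reference_segment) strings of equal
    length, ungapped, both written 5'->3' in read orientation.
    Returns a list of floats, one per position.
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segment must have equal length")
        for pos, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if pos == len(totals):  # first time this position is seen
                totals.append(0)
                mismatches.append(0)
            totals[pos] += 1
            if r != g:
                mismatches[pos] += 1
    return [m / t for m, t in zip(mismatches, totals)]


if __name__ == "__main__":
    # Toy data: the second read has one mismatch at its first base.
    toy_pairs = [("ACGT", "ACGT"), ("TCGT", "ACGT")]
    print(mismatch_rate_by_position(toy_pairs))
```

If the skewed composition at the 5' end were adapter or junk, you would expect the mismatch rate at those positions to be far above the rest of the read, rather than only marginally higher.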


I think you are right! Thank you very much.

written 15 months ago by yujinlong000703
darbinator wrote, 15 months ago:

Adapter sequences are found at the 3' end of the read, so according to the chart there does not seem to be any adapter contamination. The 10 overrepresented bp at the 5' end come from the primers. Personally, I remove them if the initial quality of my reads is not very good, but if the reads are of good quality this is not a problem for alignment.
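For anyone who does want to remove those first bases, here is a minimal Python sketch of cropping a fixed number of bases from the 5' end of each read. It assumes uncompressed, 4-line FASTQ records; `head_crop` and its parameter `n` are names invented for this example. In practice a dedicated trimmer would be the usual choice (Trimmomatic has a HEADCROP step for this); the sketch just shows what the operation does to each record.

```python
import io


def head_crop(fastq_in, fastq_out, n=10):
    """Remove the first n bases (and matching quality values) from
    every read. Reads with n or fewer bases are dropped so that no
    empty records are written.

    Assumes 4-line FASTQ records.
    """
    record = []
    for line in fastq_in:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            record = []
            if len(seq) > n:
                fastq_out.write(f"{header}\n{seq[n:]}\n{plus}\n{qual[n:]}\n")


if __name__ == "__main__":
    # Toy record; use open(...) handles for real files.
    src = io.StringIO("@r1\nACGTACGTACGTAC\n+\nIIIIIIIIIIIIII\n")
    out = io.StringIO()
    head_crop(src, out, n=10)
    print(out.getvalue(), end="")
```

Note that the sequence and quality strings have to be cropped together, otherwise the record is no longer valid FASTQ.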
