Question: High A in "Per base sequence content" of fastQC report
DVA550 (United States) wrote, 2.6 years ago:

In my RNA sequencing FastQC report, I consistently notice an abnormally high A (green) peak in the "Per base sequence content" section. See images below. I worry it is caused by the k-mers represented in different regions of the reads.

Has anyone seen this before? I would appreciate it if someone could help me diagnose this problem. Thank you.

[Four images, not preserved]

modified 2.6 years ago • written 2.6 years ago by DVA550

Do you have a lot of poly-A stretches in your reads?

written 2.6 years ago by genomax91k

Thank you for your reply. I do not expect that, but I can check. Based on the report, "A" seems to show up at positions 10-30 bp. If it were caused by a poly-A tail, wouldn't it show up at the end of the reads?

written 2.6 years ago by DVA550
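
One quick way to check for poly-A stretches, as asked above, is to count the reads that contain a long run of A's. The following is only a minimal sketch: `count_polya_reads` is a hypothetical helper name, the default threshold of 15 bases is an arbitrary assumption, and it expects uncompressed 4-line FASTQ on standard input.

```shell
# Hypothetical helper (not part of FastQC or BBMap): count FASTQ reads that
# contain a run of at least MIN_RUN consecutive A's.
# Assumes plain 4-line FASTQ records on stdin; MIN_RUN defaults to 15.
count_polya_reads() {
    min_run="${1:-15}"
    awk -v n="$min_run" '
        # Build a literal string of n A characters, e.g. "AAAAAAAAAAAAAAA".
        BEGIN { pat = sprintf("%" n "s", ""); gsub(/ /, "A", pat) }
        # Line 2 of every 4-line record is the sequence.
        NR % 4 == 2 { total++; if (index($0, pat) > 0) hits++ }
        # Print: reads_with_polyA total_reads
        END { printf "%d %d\n", hits + 0, total + 0 }
    '
}
```

For example, `count_polya_reads 15 < reads_R1.fq` prints the number of matching reads followed by the total number of reads; `reads_R1.fq` is a placeholder file name. A large fraction of hits would support the poly-A explanation.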

Have you scanned/trimmed this data to see whether contaminating sequences are present and get trimmed? I suggest using bbduk.sh from the BBMap suite. Something like:

bbduk.sh in1=reads_R1.fq in2=reads_R2.fq out1=clean_R1.fq out2=clean_R2.fq ref=adapters.fa ktrim=r k=23 mink=11 tbe tpo

The file with adapter sequences (adapters.fa) is included in the resources directory of the BBMap distribution.

written 2.6 years ago by genomax91k

Not yet. I read the library preparation protocol, and you might be right that the reads contain poly-A. Should I trim the whole first 30 bp? Thank you so much for all the information.

written 2.6 years ago by DVA550

Try trimming the data as I suggested above first. If you still have poly-A stretches left over afterwards then they can be trimmed with another run of bbduk.sh. It is the first 30+ bases that may be the good sequence here so you want to keep those for sure.

written 2.6 years ago by genomax91k
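
To illustrate what trimming leftover poly-A stretches while keeping the leading bases would do, here is a small awk sketch. It is not bbduk.sh and does not reproduce its behavior: `trim_polya_right` is a hypothetical helper, the threshold is an assumption, and it only handles uncompressed 4-line FASTQ one file at a time (paired reads are not kept in sync).

```shell
# Hypothetical helper (an illustration only, not bbduk.sh): right-trim each
# read at the first run of at least MIN_RUN consecutive A's, keeping only the
# bases and qualities before that run. Assumes plain 4-line FASTQ on stdin.
trim_polya_right() {
    min_run="${1:-15}"
    awk -v n="$min_run" '
        # Build a literal string of n A characters to search for.
        BEGIN { pat = sprintf("%" n "s", ""); gsub(/ /, "A", pat) }
        NR % 4 == 1 { header = $0; next }
        NR % 4 == 2 { bases = $0; next }
        NR % 4 == 3 { plus = $0; next }
        {
            qual = $0
            pos = index(bases, pat)   # 1-based start of the first run, 0 if none
            if (pos > 0) {
                bases = substr(bases, 1, pos - 1)
                qual  = substr(qual, 1, pos - 1)
            }
            # Reads that start with the run become empty; filter them downstream.
            print header; print bases; print plus; print qual
        }
    '
}
```

For example, `trim_polya_right 15 < clean_R1.fq > polya_trimmed_R1.fq` would keep the bases before the first long A run in each read; the output file name is a placeholder. For real data, the bbduk.sh documentation is the place to check for the recommended poly-A trimming options.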

This GC plot looks almost bimodal - are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen/other metagenomics)? Otherwise, maybe try some read classification (centrifuge, kraken with a transcriptome database, k-Slam, ...) and see what you've got in there?

written 2.6 years ago by cschu1812.5k

No, I do not expect this. It is a human sample. I will look into the databases. Thank you.

modified 2.6 years ago • written 2.6 years ago by DVA550

single cell RNA Seq

This is important information and should have been added to the original post. Was this a particular kind of kit/technology? If so, you should follow any instructions specific to post-processing data from it.

modified 2.6 years ago • written 2.6 years ago by genomax91k

Thank you for the reply. I am looking into it.

written 2.6 years ago by DVA550

I'm so sorry. It is not single cell. Nonetheless I am going to try your method. Thank you.

written 2.6 years ago by DVA550