Question: High A in "Per base sequence content" of fastQC report
20 months ago by
DVA530
United States
DVA530 wrote:

In my RNA sequencing FastQC report, I consistently notice an abnormally high A (green) peak in the "Per base sequence content" section. See images below. I worry it is caused by particular k-mers being represented in different regions of the reads.

Has anyone seen this before? I would appreciate it if someone could help me diagnose this problem. Thank you.

[Four FastQC report images not shown]

fastqc • 1.6k views
modified 20 months ago • written 20 months ago by DVA530

Do you have a lot of poly-A stretches in your reads?

written 20 months ago by genomax74k

Thank you for your reply. I do not expect that, but I can check. Based on the report, the "A" excess seems to show up at positions 10-30 bp. If it is caused by a poly-A tail, wouldn't that show up at the end of the reads?

written 20 months ago by DVA530
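
One way to answer both questions above (are there many poly-A stretches, and where in the read do they start) is to scan the FASTQ directly. The following is only a minimal sketch, not part of FastQC or BBMap: the function names, the 10-base run threshold, and the made-up example reads are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def read_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()

def polya_run_starts(sequences, min_run=10):
    """Count reads containing a run of >= min_run A's.

    Returns (reads_with_run, total_reads, Counter of 1-based start positions
    of the first such run in each read). min_run=10 is an arbitrary choice.
    """
    pattern = re.compile("A{%d,}" % min_run)
    starts = Counter()
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        match = pattern.search(seq)
        if match:
            hits += 1
            starts[match.start() + 1] += 1
    return hits, total, starts

# Tiny in-memory example with made-up reads. For a real file, something like:
#   with open("reads_R1.fq") as handle:
#       polya_run_starts(read_fastq_sequences(handle))
example = [
    "@r1", "ACGTACGTAC" + "A" * 12 + "GTCA", "+", "I" * 26,
    "@r2", "ACGTTGCATGCATGCATCGATCGATC", "+", "I" * 26,
    "@r3", "TTGCA" + "A" * 15 + "CGTACG", "+", "I" * 26,
]
hits, total, starts = polya_run_starts(read_fastq_sequences(example))
print(f"{hits}/{total} reads contain a poly-A run")
for pos, count in sorted(starts.items()):
    print(f"first run starts at position {pos}: {count} read(s)")
```

If most run starts cluster at positions 10-30 rather than near the 3' end, that would be consistent with short inserts followed by poly-A, rather than with a tail only at the end of full-length reads.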

Have you scanned/trimmed this data to see whether contaminating sequences are present and get trimmed? I suggest using bbduk.sh from the BBMap suite. Something like:

bbduk.sh in1=reads_R1.fq in2=reads_R2.fq out1=clean_R1.fq out2=clean_R2.fq ref=adapters.fa ktrim=r k=23 mink=11 tbe tpo

The file with adapter sequences (adapters.fa) is included in the resources directory of the BBMap distribution.

written 20 months ago by genomax74k

Not yet. I went back and read the library preparation protocol, and you might be right that the reads contain poly-A. Should I trim the whole first 30 bp? Thank you so much for all the information.

written 20 months ago by DVA530

Try trimming the data as I suggested above first. If you still have poly-A stretches left over afterwards then they can be trimmed with another run of bbduk.sh. It is the first 30+ bases that may be the good sequence here so you want to keep those for sure.

written 20 months ago by genomax74k
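
To illustrate the idea of keeping the leading bases and cutting at the poly-A stretch, here is a rough Python sketch of a right-trim. It is not how bbduk.sh works internally and is not a replacement for it; the function name `trim_polya_right` and the thresholds `min_run=10` and `min_keep=20` are assumptions made up for this example.

```python
import re

def trim_polya_right(seq, qual, min_run=10, min_keep=20):
    """Cut a read at the first run of >= min_run A's.

    The run and everything after it are dropped, so the bases before the
    poly-A stretch are kept. Returns (seq, qual), or None if fewer than
    min_keep bases remain. Both thresholds are illustrative assumptions.
    """
    match = re.search("A{%d,}" % min_run, seq.upper())
    if match:
        seq = seq[:match.start()]
        qual = qual[:match.start()]
    if len(seq) < min_keep:
        return None
    return seq, qual

# Made-up read: 32 informative bases, then a poly-A stretch and trailing bases.
read = "ACGT" * 8 + "A" * 14 + "GGCT"
kept = trim_polya_right(read, "I" * len(read))

# Made-up read that is almost entirely poly-A; too little is left after trimming.
short = "ACGTA" + "A" * 12
dropped = trim_polya_right(short, "I" * len(short))

# Read without any poly-A run is returned unchanged.
clean = "ACGT" * 6
untouched = trim_polya_right(clean, "I" * len(clean))
```

For paired-end data, a read that is dropped this way would also need its mate handled consistently, which this sketch does not attempt.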

This GC plot looks almost bimodal - are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen/other metagenomics)? Otherwise, maybe try some read classification (centrifuge, kraken with a transcriptome database, k-Slam, ...) and see what you've got in there?

written 20 months ago by cschu1811.9k
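
Before running a read classifier, the bimodal shape can be looked at outside FastQC by binning per-read GC content. This is only a minimal sketch under stated assumptions: the function names, the 5% bin width, and the example reads are hypothetical, and it is not FastQC's own calculation.

```python
from collections import Counter

def gc_percent(seq):
    """Return GC percentage over called bases (A, C, G, T), or None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called

def gc_histogram(sequences, bin_width=5):
    """Count reads per GC bin; keys are the lower edge of each bin in percent."""
    hist = Counter()
    for seq in sequences:
        gc = gc_percent(seq)
        if gc is None:
            continue  # skip reads made up entirely of N
        # Put 100% GC into the top bin instead of opening a bin of its own.
        bin_start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        hist[bin_start] += 1
    return hist

# Made-up reads: two at 20% GC, one at 80% GC, one all-N read that is skipped.
reads = ["ATATATATGC", "GCGCGCGCAT", "ATATATTAGC", "NNNN"]
hist = gc_histogram(reads)
for bin_start in sorted(hist):
    print(f"{bin_start:3d}-{bin_start + 5:3d}% GC: {hist[bin_start]} read(s)")
```

Pulling out the reads from each peak separately and searching a handful of them against a database is one quick way to see whether the two modes come from different sources.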

No, I do not expect this. It is a human sample. I will look into the databases. Thank you.

modified 20 months ago • written 20 months ago by DVA530

single cell RNA Seq

This is important information and should have been added to the original post. Was this a particular kind of kit/technology? If so, you should follow any instructions specific to post-processing data from it.

modified 20 months ago • written 20 months ago by genomax74k

Thank you for the reply. I am looking into it.

written 20 months ago by DVA530

I'm so sorry. It is not single cell. Nonetheless, I am going to try your method. Thank you.

written 20 months ago by DVA530