High A in "Per base sequence content" of fastQC report
3.1 years ago
DVA ▴ 550

In my RNA sequencing FastQC report, I consistently see an abnormally high A (green) peak in the "Per base sequence content" section. See images below. I worry it is caused by the k-mers represented in different regions of the reads.

Has anyone seen this before? I would appreciate it if someone could help me diagnose this problem. Thank you.

[four images]

fastqc • 2.4k views

Do you have a lot of poly-A stretches in your reads?


Thank you for your reply. I do not expect that, but I can check. Based on the report, the excess "A" shows up at positions 10-30 bp. If it were caused by a poly-A tail, wouldn't it show up at the end of the reads?
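
A quick way to check for poly-A stretches, and where in the read they start, is sketched below. This is only a rough illustration: it assumes a plain, uncompressed FASTQ with four lines per record, the function names (count_polya_reads, polya_start_positions) are made up for this example, and the run length of 10 is an arbitrary threshold rather than a recommendation. The toy file at the end is there only to show that the functions run.

```shell
# Print "<reads with a poly-A run> <total reads>" for a FASTQ file.
# $1 = FASTQ file, $2 = minimum run length (default 10, arbitrary).
count_polya_reads() {
    awk -v n="${2:-10}" '
        BEGIN { pat = sprintf("%" n "s", ""); gsub(/ /, "A", pat) }  # n A characters
        NR % 4 == 2 { total++; if (index($0, pat)) hits++ }          # sequence lines only
        END { printf "%d %d\n", hits + 0, total + 0 }
    ' "$1"
}

# Tabulate the 1-based start position of the first poly-A run in each read.
polya_start_positions() {
    awk -v n="${2:-10}" '
        BEGIN { pat = sprintf("%" n "s", ""); gsub(/ /, "A", pat) }
        NR % 4 == 2 { p = index($0, pat); if (p) print p }
    ' "$1" | sort -n | uniq -c
}

# Toy two-read FASTQ: r1 has no poly-A run, r2 has 12 A's starting at base 5.
tmp=$(mktemp)
printf '@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n@r2\nACGTAAAAAAAAAAAACGT\n+\nIIIIIIIIIIIIIIIIIII\n' > "$tmp"

result=$(count_polya_reads "$tmp" 10)
echo "$result"                      # prints: 1 2
polya_start_positions "$tmp" 10     # one read with a run starting at position 5
```

On real data, replace "$tmp" with the FASTQ file (for example reads_R1.fq). If most runs start around the positions where FastQC shows the A peak, the peak is probably real poly-A sequence and not an artifact of the plot.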


Have you scanned/trimmed this data to see if you have contaminating sequences present that get trimmed? I suggest using bbduk.sh from BBMap suite. Something like:

bbduk.sh in1=reads_R1.fq in2=reads_R2.fq out1=clean_R1.fq out2=clean_R2.fq ref=adapters.fa ktrim=r k=23 mink=11 tbe tpo

The file with adapter sequences (adapters.fa) is included in the resources directory of the BBMap distribution.


Not yet. I read the library preparation protocol, and you might be right that the reads contain poly-A. Should I trim the whole first 30 bp? Thank you so much for all the information.


Try trimming the data as I suggested above first. If you still have poly-A stretches left over afterwards, they can be trimmed with another run of bbduk.sh. The first 30+ bases may be the good sequence here, so you definitely want to keep those.
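
A second bbduk.sh pass for leftover poly-A might look roughly like the sketch below. It assumes the adapter-trimmed files are named clean_R1.fq and clean_R2.fq as in the command above. The output names and the thresholds (10 and 20) are placeholders, not recommendations. As far as I know, trimpolya only removes poly-A/poly-T runs at the ends of reads, so check the bbduk.sh help text for your BBMap version before relying on it. The guard just prints the command when bbduk.sh or the input files are not available.

```shell
# Second pass: trim poly-A/poly-T runs of at least 10 bases from read ends,
# then drop reads shorter than 20 bases. Names and thresholds are placeholders.
cmd="bbduk.sh in1=clean_R1.fq in2=clean_R2.fq out1=polyA_trimmed_R1.fq out2=polyA_trimmed_R2.fq trimpolya=10 minlen=20"

if command -v bbduk.sh >/dev/null 2>&1 && [ -f clean_R1.fq ] && [ -f clean_R2.fq ]; then
    $cmd                          # run the trimming for real
else
    echo "Would run: $cmd"        # bbduk.sh or inputs missing: show the command only
fi
```

Re-running FastQC on the output files should then show whether the A peak is gone.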


This GC plot looks almost bimodal - are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen/other metagenomics)? Otherwise, maybe try some read classification (centrifuge, kraken with a transcriptome database, k-Slam, ...) and see what you've got in there?


No, I do not expect this. It is a human sample. I will look into the databases. Thank you.


single cell RNA Seq

This is important information and should have been added to the original post. Was this a particular kind of kit/technology? If so, you should follow any instructions specific to post-processing data from that kit.


Thank you for the reply. I am looking into it.


I'm so sorry. It is not single cell. Nonetheless, I am going to try your method. Thank you.
