High A in "Per base sequence content" of fastQC report
DVA, 3.1 years ago

In my RNA sequencing FastQC report, I consistently see an abnormally high A (green) peak in the "Per base sequence content" section. See images below. I worry that it is caused by k-mers overrepresented in particular regions of the reads.

Has anyone seen this before? I would appreciate it if someone could help me diagnose this problem. Thank you.
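For reference, the "Per base sequence content" plot shows, for each read position, the percentage of reads with A, C, G or T at that position. A minimal Python sketch of that calculation follows, useful for checking the A spike yourself on a subset of reads. It assumes plain, uncompressed 4-line FASTQ; `read_fastq_sequences` and `per_base_content` are hypothetical helper names, not part of FastQC, and FastQC may group positions into bins for longer reads, which this sketch does not do.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed 4-line FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.strip().upper()


def per_base_content(reads):
    """Return [(position, {base: percent}), ...] with 1-based positions.

    Non-ACGT calls (e.g. N) are excluded from the denominator, which is a
    simplification and may not match FastQC exactly.
    """
    reads = list(reads)
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    table = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts[b] for b in "ACGT")
        pct = {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        table.append((pos + 1, pct))
    return table


if __name__ == "__main__":
    # Toy reads; on real data use read_fastq_sequences("reads_R1.fq") instead.
    for pos, pct in per_base_content(["AAAC", "AACG", "ATGT"]):
        print(pos, {b: round(v, 1) for b, v in pct.items()})
```

On a real file you would typically run this on the first few hundred thousand reads rather than the whole file, since it keeps all sequences in memory.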

Do you have a lot of poly-A stretches in your reads?

Thank you for your reply. I do not expect that, but I can check. Based on the report, the excess A shows up at positions 10-30 bp. If it were caused by a poly-A tail, wouldn't it appear at the end of the reads?
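One way to answer the poly-A question directly is to record where runs of A start in each read: if they cluster around positions 10-30 rather than at the 3' end, that matches what the plot shows. A minimal sketch, assuming the sequences are already loaded as strings (e.g. with a simple FASTQ reader); `polya_run_starts` and the `min_len=10` threshold are illustrative choices, not established defaults.

```python
import re
from collections import Counter


def polya_run_starts(reads, min_len=10, base="A"):
    """Count homopolymer runs of `base` with length >= min_len.

    Returns (number of reads containing at least one run,
             Counter mapping 1-based start position -> number of runs).
    """
    pattern = re.compile("%s{%d,}" % (re.escape(base), min_len))
    n_reads_with_run = 0
    starts = Counter()
    for seq in reads:
        # finditer returns non-overlapping, maximal (greedy) runs.
        hits = [m.start() + 1 for m in pattern.finditer(seq)]
        if hits:
            n_reads_with_run += 1
        starts.update(hits)
    return n_reads_with_run, starts


if __name__ == "__main__":
    toy = ["CC" + "A" * 11 + "GG", "A" * 10 + "T", "ACGTACGT"]
    n, starts = polya_run_starts(toy)
    print(n, sorted(starts.items()))  # 2 reads; runs start at positions 1 and 3
```

Passing `base="T"` checks poly-T stretches instead, which is how a poly-A tail would typically look in reads sequenced from the opposite strand.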

Have you scanned or trimmed these data to check whether contaminating sequences (e.g. adapters) are present? I suggest using bbduk.sh from the BBMap suite, with something like:

bbduk.sh in1=reads_R1.fq in2=reads_R2.fq out1=clean_R1.fq out2=clean_R2.fq ref=adapters.fa ktrim=r k=23 mink=11 tpe tbo


A file of adapter sequences (adapters.fa) is included in the resources directory of the BBMap distribution.
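Roughly, `ktrim=r` means: look for reference (adapter) k-mers in the read and, at the first match, cut off the match and everything to its right. The sketch below is a simplified illustration of that idea, not BBDuk's implementation. It uses exact k-mer matches on both strands (BBDuk also checks reverse complements by default) and ignores `mink` (shorter k-mers at the read tip), mismatch tolerance and the paired-end options `tpe`/`tbo`. The helper names and the small k in the example are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def reference_kmers(refs, k):
    """Set of all k-mers in the reference sequences and their reverse complements."""
    kmers = set()
    for ref in refs:
        for strand in (ref, revcomp(ref)):
            for i in range(len(strand) - k + 1):
                kmers.add(strand[i:i + k])
    return kmers


def ktrim_right(seq, kmers, k):
    """Trim from the leftmost matching k-mer to the 3' end (exact matches only)."""
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            return seq[:i]
    return seq


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # start of the common Illumina adapter
    kmers = reference_kmers([adapter], k=5)  # k=5 only to keep the toy example small
    print(ktrim_right("C" * 10 + adapter, kmers, k=5))
```

With the real setting of k=23, a read needs a 23-base exact match to be trimmed; that is why `mink` exists, to catch shorter adapter fragments at the very end of a read.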

Not yet. I read the library preparation protocol, and you might be right that poly-A is involved. Should I trim the entire first 30 bp? Thank you so much for all the information.

Try trimming the data as I suggested above first. If poly-A stretches are still left afterwards, they can be trimmed with another run of bbduk.sh. The first 30+ bases may well be good sequence here, so you definitely want to keep those.
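Conceptually, trimming a leftover poly-A tail means removing a trailing run of A's above some minimum length. The sketch below only illustrates that idea; it is not how bbduk.sh does it, and `trim_polya_tail` and the `min_len=10` cutoff are hypothetical choices. It only removes A's at the 3' end, so A-rich stretches earlier in the read are left untouched.

```python
import re


def trim_polya_tail(seq, min_len=10):
    """Remove a run of at least `min_len` A's at the 3' end of `seq`.

    Exact homopolymer only: a single non-A call inside the tail stops the match.
    """
    match = re.search("A{%d,}$" % min_len, seq)
    return seq[:match.start()] if match else seq


if __name__ == "__main__":
    print(trim_polya_tail("ACGTTGCC" + "A" * 15))  # ACGTTGCC
```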

This GC plot looks almost bimodal. Are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen or other metagenomic data)? If not, try some read classification (centrifuge, kraken with a transcriptome database, k-SLAM, ...) and see what you've got in there.
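To look at the GC distribution outside FastQC, or to compare subsets of reads (for example, reads a classifier assigns to different organisms), a per-read GC histogram is enough. A minimal sketch; `gc_histogram` is a hypothetical helper, GC is computed over A/C/G/T calls only, and reads with no such calls are skipped.

```python
from collections import Counter


def gc_histogram(reads):
    """Map whole-percent GC content (0-100) -> number of reads."""
    hist = Counter()
    for seq in reads:
        acgt = sum(seq.count(b) for b in "ACGT")
        if acgt == 0:  # e.g. an all-N read
            continue
        gc = seq.count("G") + seq.count("C")
        hist[round(100 * gc / acgt)] += 1
    return hist


if __name__ == "__main__":
    hist = gc_histogram(["GGCC", "AATT", "GCAT", "GGCA"])
    for pct in sorted(hist):
        print("%3d%% %s" % (pct, "#" * hist[pct]))  # crude text histogram
```

Two well-separated humps in this histogram, rather than one roughly bell-shaped peak, are what "bimodal" refers to here.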

No, I do not expect this. It is a human sample. I will look into the databases. Thank you.

It is single-cell RNA-seq.

This is important information and should have been added to the original post. Was a particular kit/technology used? If so, you should follow any post-processing instructions specific to it.

Thank you for the reply. I am looking into it.

I'm so sorry, it is not single-cell after all. Nonetheless, I am going to try your method. Thank you.

