Poly-A trimming for WGBS
8 months ago

We ran WGBS on our fish samples, and I am currently trimming the raw reads with fastp (version 0.20.0) and Trim Galore! (version 0.6.10). However, when I checked the reads with FastQC (version 0.12.1), I noticed that both the raw and the trimmed reads (trimmed with either fastp or Trim Galore!), specifically the [filename]_2 files (reverse reads), have a high poly-A content. I already passed "--trim_poly_x" to fastp, but FastQC showed no meaningful decrease afterwards.

Edit: I attached the FastQC report for the reverse reads before (first picture) and after (second picture) trimming.

I hope someone can enlighten me on this.

[Image: FastQC adapter content report for one of my reverse-read files, before trimming with fastp]

[Image: FastQC adapter content report for one of my reverse-read files, after trimming with fastp] I get this kind of report whether I use fastp or Trim Galore!.


What is "high"? What percentage of reads is affected? Show the relevant FastQC report.
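One rough way to put a number on "high", sketched with plain awk so it does not depend on any trimming tool. The function name polya_pct and the default run length of 10 are my own assumptions, not something from FastQC or fastp. It reports the percentage of reads that end in a run of at least minlen A's ("tail") and the percentage that contain such a run anywhere ("anywhere"). If "anywhere" is much larger than "tail", the A-runs are internal and a 3' tail trimmer would not be expected to remove them.

```shell
# Percentage of reads with a poly-A run, measured two ways.
# Usage: polya_pct reads.fq [minlen]     (use "-" to read FASTQ from stdin)
# Assumes plain 4-line FASTQ records and upper-case bases.
polya_pct() {
    awk -v minlen="${2:-10}" '
        # build a string of minlen A characters to search for
        BEGIN { for (i = 0; i < minlen; i++) run = run "A" }
        NR % 4 == 2 {                          # sequence line of each record
            total++
            if (index($0, run) > 0) anywhere++ # run occurs somewhere in the read
            n = 0
            # walk back from the 3-prime end while the base is A
            for (i = length($0); i > 0 && substr($0, i, 1) == "A"; i--) n++
            if (n >= minlen) tail++            # read ends in a long enough A-run
        }
        END {
            if (total == 0) { print "no reads"; exit }
            printf "tail\t%.2f\nanywhere\t%.2f\n", 100 * tail / total, 100 * anywhere / total
        }
    ' "${1:--}"
}
```

For a gzipped file, something like `gzip -dc sample_2.fq.gz | polya_pct - 12` should work (sample_2.fq.gz is a placeholder name).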


Hi, thanks. I've attached the FastQC adapter content report to my original post.


I have the same exact issue! Please let me know if you have a solution.

6 months ago

I don't know if fastp is failing or if it defines poly-A differently than fastqc does, but you can try BBDuk (from the BBTools package) instead:

bbduk.sh in=r1.fq in2=r2.fq out=trimmed1.fq out2=trimmed2.fq trimpolya=6
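To make it concrete what a 3' poly-A tail trim does to a record, here is a deliberately simplified awk sketch. It is not BBDuk's implementation: it only removes a run of A's at the very end of each read, ignores poly-T and the 5' end, and tolerates no mismatches. The function name trim_polya_tail is made up for the illustration.

```shell
# Remove a trailing run of at least MINLEN A's from every read in a FASTQ file.
# Usage: trim_polya_tail reads.fq [minlen]   (use "-" to read from stdin)
# No reads are dropped, so R1/R2 files stay in sync, but a read made
# entirely of A's ends up with an empty sequence line.
trim_polya_tail() {
    awk -v minlen="${2:-6}" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 {
            seq = $0
            keep = length(seq)
            # move the cut point left past every trailing A
            while (keep > 0 && substr(seq, keep, 1) == "A") keep--
            # run shorter than minlen: leave the read untouched
            if (length(seq) - keep < minlen) keep = length(seq)
        }
        NR % 4 == 0 {
            print header
            print substr(seq, 1, keep)
            print "+"
            print substr($0, 1, keep)   # trim the quality string to match
        }
    ' "${1:--}"
}
```

With minlen=6, a read ACGTAAAAAA becomes ACGT, while ACGTAAA is left unchanged.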

I think Trim Galore! only trims the most abundant adapter it detects in the data, so here it only picks up the Illumina universal adapter. Do you think that is also an issue for WGBS data, since T and A would be the most abundant bases across the reads after bisulfite treatment? Also, do you think clipping the last few bases from each read could help solve the issue, or would you recommend using BBDuk?


Bisulfite would not affect the adapter sequences, since they are added later. As for whether clipping the last few bases would help, or what exactly needs to be trimmed, that's hard to say unless you post your FastQC report and/or an ACGT frequency histogram (the bhist flag in BBDuk).
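If BBTools is not installed, a per-position base composition table can be approximated with awk alone. This is only a stand-in sketch: the function name base_hist and the output columns are my own choice and do not reproduce the format of BBDuk's bhist file.

```shell
# Per-position base fractions (A, C, G, T, N) for a FASTQ file.
# Usage: base_hist reads.fq     (use "-" to read FASTQ from stdin)
# Assumes plain 4-line FASTQ records and upper-case bases.
base_hist() {
    awk '
        NR % 4 == 2 {                        # sequence line of each record
            len = length($0)
            if (len > maxlen) maxlen = len
            for (i = 1; i <= len; i++) {
                count[i, substr($0, i, 1)]++ # tally the base seen at position i
                depth[i]++                   # number of reads reaching position i
            }
        }
        END {
            print "pos\tA\tC\tG\tT\tN"
            for (i = 1; i <= maxlen; i++)
                printf "%d\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\n", i,
                    count[i, "A"] / depth[i], count[i, "C"] / depth[i],
                    count[i, "G"] / depth[i], count[i, "T"] / depth[i],
                    count[i, "N"] / depth[i]
        }
    ' "${1:--}"
}
```

Running it on the R2 file and looking at the last rows shows whether the A fraction climbs toward the 3' end, which is the pattern a tail trimmer can address.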


I have posted the QC reports from before and after trimming in another comment (excuse my mistake). The first screenshot is before trimming and the second is after trimming. All samples have poly-A flagged in the adapter content, and it is always the R2 reads that fail this test. However, all reads passed most of the other tests, including overrepresented sequences.


Without a legend, I'm not sure what all those lines are. Also, please add such posts as replies, not answers.


The diagram shows the adapter content before trimming across various samples. Each line in the diagram represents the adapter content for a specific sample. The blue lines indicate the Illumina adapters in the R1 and R2 samples, while the orange line represents the polyA content in one of the R1 samples. The remaining lines, colored red and light blue, correspond to the polyA adapter content in all R2 samples.


This is the FastQC report depicting adapter content after it has been trimmed using Trim-Galore! Every line in this report represents the polyA adapter content. The orange-red line at the bottom illustrates the polyA content for one of the R1 samples. All other lines in the report correspond to the polyA content in all the R2 samples.
