Trimmomatic is removing excessive reads after trimming
4.7 years ago
sneha108ss ▴ 30

Hi, I have some RNA-seq datasets and I wanted to remove adapters and low-quality bases before proceeding with my analysis. The final goal of my analysis is to perform differential gene expression between my control and experimental conditions. I used FastQC to check the quality of my sequences and it looks good. The links to the FastQC results for one of my samples are here:

per_base_sequence_quality_read1 adapter_content_read1 per_base_sequence_quality_read2 adapter_content_read2

I next removed adapters and low-quality bases with Trimmomatic, using a sliding window of 4:30, but after trimming I only have 87% of my reads remaining. Could someone please explain why Trimmomatic is removing over 10% of my reads even though the quality of my raw sequences is pretty good?

This is the command for trimmomatic:

java -jar trimmomatic-0.39.jar PE -threads 32 -trimlog SF1-1-C-gill_trimlog.log -summary SF1-1-C-gill_summary.txt SF1-1-C-gill_R1.fastq.gz SF1-1-C-gill_R2.fastq.gz SF1-1-C-gill_r1paired.fastq.gz SF1-1-C-gill_r1unpaired.fastq.gz SF1-1-C-gill_r2paired.fastq.gz SF1-1-C-gill_r2unpaired.fastq.gz ILLUMINACLIP:20:30:15:8:true SLIDINGWINDOW:4:30

I am using trimmomatic version 0.39

RNA-Seq Trimmomatic Quality • 3.3k views

The links to the figures are not working; I suggest using ImgBB. Your sliding window quality filter is pretty stringent. Did you consider or test a more lax filter?


Hi, Thank you for your reply! I have updated the links to the figures, hopefully you can see them now!

According to the fastqc results, the per base sequence quality is above 30 for read 1 and drops slightly below 30 only towards the end of read 2, which is why I decided to trim them at quality 30. I did try trimming them with the sliding window quality filter set to 20 and it kept close to 99% of the reads which is what I would expect, but since most of my bases have a quality above 30, I'm surprised that it removes over 10%.
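One point that may be relevant here: a per-base quality plot summarises each position across all reads, while a sliding window is evaluated inside each individual read. The toy sketch below uses invented quality scores (not taken from these samples) to show that every position can have a median of 36 while every read still contains a 4-base window averaging below 30. The names `reads` and `min_window_mean` are hypothetical helpers for illustration only.

```python
from statistics import median

READ_LEN = 12
N_READS = 6

# Hypothetical reads: every base is Q36 except a single Q8 base per read,
# placed at a different position in each read.
reads = []
for r in range(N_READS):
    quals = [36] * READ_LEN
    quals[2 * r + 1] = 8
    reads.append(quals)

# Roughly what a per-base quality plot summarises: one value per position,
# computed across all reads.
per_position_median = [median(column) for column in zip(*reads)]

def min_window_mean(quals, width=4):
    """Lowest mean quality over all windows of `width` consecutive bases."""
    return min(
        sum(quals[i:i + width]) / width
        for i in range(len(quals) - width + 1)
    )

# What a per-read sliding window reacts to: the worst window in each read.
per_read_worst_window = [min_window_mean(q) for q in reads]

print("lowest per-position median:", min(per_position_median))
print("worst window mean per read:", per_read_worst_window)
```

In this made-up case the aggregate view looks clean at every position, yet a 4:30 window would find a failing window in every single read, so the two views do not have to agree.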


The links are still not working. Please use a public image host such as Imgur, get the full path to the uploaded image including the suffix (e.g. .png), and then use the image embedding button (the one to the right of the 10101) to embed the link to the image.


This is what I get with your links:

404

4.7 years ago
h.mon 35k

So far we haven't been able to see the FastQC figures, but since using a lower quality threshold keeps most of your reads, the issue is you are being too strict.

Quality trimming is not always advised, unless you have very bad data. It depends on many factors, such as downstream analyses, sequencing depth / coverage, and so on. Are you performing transcriptome assembly (in which case some mild filtering could be beneficial, depending on adequate sequencing depth), or are you mapping and quantifying against a reference genome (in which case quality filtering is most probably detrimental, as you are throwing out information)?

4.7 years ago
sneha108ss ▴ 30

I've added the images using Imgur, so it should be working now!

I am not performing transcriptome assembly as I have a reference genome against which I'll be mapping my reads. I do understand that filtering at quality 30 might be too strict, but based on the fastqc results, I still wouldn't expect it to remove over 10% of the reads.

Also, I tried filtering at quality 30 using the TRAILING option instead of SLIDINGWINDOW, and it keeps 98 percent of the reads, which is what I would expect. So I don't understand why trimming at the same quality using the sliding window approach would remove more reads.
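A possible explanation, sketched under assumptions: TRAILING only looks at bases from the 3' end and stops at the first base at or above the threshold, whereas SLIDINGWINDOW scans from the 5' end and cuts the read at the first window whose average drops below the threshold. A single poor base anywhere in the read, including the first few cycles, can therefore truncate the read or leave nothing of it. The two functions below are simplified re-implementations written for illustration, not Trimmomatic's actual code (the exact cut position inside a failing window may differ in the real tool), and the quality values are invented.

```python
def trailing(quals, threshold):
    """Simplified TRAILING: drop 3' bases while they are below threshold.

    Returns the number of bases kept.
    """
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return end

def sliding_window(quals, width, threshold):
    """Simplified SLIDINGWINDOW: scan 5' -> 3' and cut at the first window
    whose mean quality is below threshold.

    Returns the number of bases kept (0 means nothing of the read is left).
    Assumption: the cut is placed at the start of the failing window.
    """
    for start in range(len(quals) - width + 1):
        window = quals[start:start + width]
        if sum(window) / width < threshold:
            return start
    return len(quals)

# Hypothetical 50-base reads.
read_mid = [36] * 20 + [8] + [36] * 29       # one bad base mid-read
read_start = [20, 24, 30, 36] + [36] * 46    # weaker first cycles

for name, quals in [("read_mid", read_mid), ("read_start", read_start)]:
    print(
        name,
        "TRAILING:30 keeps", trailing(quals, 30),
        "| SLIDINGWINDOW:4:30 keeps", sliding_window(quals, 4, 30),
        "| SLIDINGWINDOW:4:20 keeps", sliding_window(quals, 4, 20),
    )
```

In this toy case TRAILING:30 keeps both reads whole because their last bases are Q36, while the 4:30 window truncates one read and leaves nothing of the other; dropping the window threshold to 20 keeps both intact, which is at least consistent with the behaviour described above.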
