Question

Fastqc report analysis

13 months ago

npavliukovec • 0

Hello, I am a student and I got a homework assignment where I have to analyze sequencing data. I have to run FastQC on all samples, then remove adapters, trim low-quality bases, remove reads shorter than 20 bp, and compare the results. I got quite strange results. To remove Illumina adapters and discard reads shorter than 20 bp, I used this command:

trim_galore --length 20 --illumina --fastqc filename.fastq.gz

But the results look the same to me, and I don't know whether it is my fault or a problem with the data.

This is the link to a Google Drive folder containing the FastQC reports (untrimmed and trimmed)

I just can't understand why I got worse data after trimming and how I could improve it.

trim_galore fastqc • 1.2k views

updated 10 months ago by Ram 43k • written 13 months ago by npavliukovec • 0

score 2 · Answer 1 · 2023-03-21

Hi there,

Cool that you are learning to do these things, and welcome to the bioinformatics community.

In my opinion, both reports show good quality, and the data could be used for downstream processing. It also seems that there was hardly any adapter sequence in the reads before you did the trimming, so if you want to use the data for RNA-seq analysis, trimming is not strictly necessary. Since Trim Galore detects the adapter type automatically, you do not need to specify --illumina. If you do use Trim Galore for RNA-seq analysis, I would recommend setting the --stringency parameter to 3, which means that at least 3 bp of the read must overlap with the adapter sequence before anything is cut. With the default of 1, Trim Galore cuts even if only a single matching base is present at the 3' end (this very strict default can be useful for bisulfite-seq), and you might end up with some short reads.
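To make the effect of the stringency value more concrete, here is a toy sketch of 3' adapter trimming with a minimum overlap. It is not Trim Galore's or Cutadapt's actual implementation: the function name `trim_3prime` is made up for illustration, and it only does exact matching, whereas the real tools tolerate some mismatches. The adapter used in the usage lines is the 13 bp Illumina adapter prefix AGATCGGAAGAGC.

```shell
# Toy illustration only (hypothetical helper, exact matching, no error tolerance).
# $1 = read sequence, $2 = adapter sequence, $3 = minimum overlap ("stringency")
trim_3prime() {
    awk -v read="$1" -v adapter="$2" -v min="$3" 'BEGIN {
        rlen = length(read); alen = length(adapter)
        # Try every start position, leftmost first, so the longest match wins.
        for (i = 1; i <= rlen; i++) {
            overlap = rlen - i + 1            # bases left from position i to the 3 prime end
            if (overlap > alen) overlap = alen
            if (overlap >= min && substr(read, i, overlap) == substr(adapter, 1, overlap)) {
                print substr(read, 1, i - 1)  # keep only the part before the adapter
                exit
            }
        }
        print read                            # no sufficiently long match: read is unchanged
    }'
}

# A read that merely ends in "A" (the first adapter base):
trim_3prime ACGTACGTA AGATCGGAAGAGC 1   # last base is removed
trim_3prime ACGTACGTA AGATCGGAAGAGC 3   # read is left untouched
```

With a minimum overlap of 1, roughly a quarter of adapter-free reads would lose their last base just by chance, which is one reason the length distribution can look more ragged after trimming.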

If you did paired-end sequencing, you need to tell Trim Galore that you have paired files (Read 1 and Read 2) with --paired. Here is an example of how I do the trimming for RNA-seq data:

trim_galore --paired -q 20 --stringency 3 --fastqc file_R1.fastq.gz file_R2.fastq.gz

I guess you consider the trimmed report worse because the length distribution is more heterogeneous. This is because, for some reads, Trim Galore found sequence at the 3' end that was either real adapter or natural sequence that happens to look like adapter, and trimmed those reads to shorter lengths. That is why I suggest a less aggressive setting (--stringency 3, which requires a longer adapter match than the default of 1). However, the data are still "good quality" and you can proceed.
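If you want to check the length distribution yourself rather than reading it off the FastQC plot, a small helper like the one below could be used. This is a minimal sketch: `read_length_hist` is a made-up name, it assumes standard 4-line FASTQ records in a gzipped file, and the file names in the commented usage lines are placeholders (Trim Galore normally writes single-end output as `*_trimmed.fq.gz`, but check your own output directory).

```shell
# Hypothetical helper: print "read_length<TAB>count" for a gzipped FASTQ file.
# Assumes every record is exactly 4 lines, so the sequence is line 2 of each group.
read_length_hist() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 2 { n[length($0)]++ }
               END { for (len in n) printf "%d\t%d\n", len, n[len] }' \
        | sort -n
}

# Example usage (placeholder file names):
# read_length_hist filename.fastq.gz       > lengths_raw.tsv
# read_length_hist filename_trimmed.fq.gz  > lengths_trimmed.tsv
```

Comparing the two tables shows how many reads were shortened and by how much; if most of the change is reads losing only one or two bases, that points to chance matches rather than real adapter.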

See here for more details: https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md