Question: Problems with Trimmomatic PE output files
hinkel210 wrote:

Hi all!

I downloaded some SRA datasets using ascp as recommended (https://www.ncbi.nlm.nih.gov/books/NBK158899/) and then ran fastq-dump on the downloaded .sra files:

fastq-dump --gzip --split-files file.sra

and got two files (file_1.fastq.gz, file_2.fastq.gz), each 7.4 GB, as output.

The first thing I did after this was to check read quality with FastQC: file_1.fastq.gz file_2.fastq.gz

As you can see, the whiskers go down to ~15, so I wanted to discard the low-quality reads using Trimmomatic. I was not sure whether the adapters had already been removed from the reads, so I just added the standard ILLUMINACLIP step and the other recommended options:

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50

Multiple cores found: Using 4 threads
Input Read Pairs: 73338588 Both Surviving: 53742129 (73,28%) Forward Only Surviving: 4076266 (5,56%) Reverse Only Surviving: 7872077 (10,73%) Dropped: 7648116 (10,43%)
TrimmomaticPE: Completed successfully

Finally, I got the "clean" reads: file_1_clean.fastq.gz (4.8 GB) and file_2_clean.fastq.gz (0.4 GB).

I don't know why there is such a huge difference. Is it possible that the second read of each pair is that bad? After this, I checked the read quality again with FastQC. file_1_clean.fastq.gz looks OK, I think, but file_2_clean.fastq.gz looks really strange and not really "clean".

file_1_clean.fastq.gz file_2_clean.fastq.gz

Does anyone know what happened here?

Thanks in advance!

written 17 months ago by hinkel210

You need 4 output files: each input file needs two output files. In PE mode, Trimmomatic saves paired reads (both mates survived) and unpaired reads (only one mate survived) in separate files, so your command should look like this:

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz file_1_discarded.fastq.gz ../file_2_clean.fastq.gz file_2_discarded.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50

In your command, Trimmomatic saved the unpaired forward reads (those whose mate was dropped) in file_2_clean.fastq.gz, which is why that file is so small.
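As a rough sanity check of this explanation, the counts from the log in the question can be compared with the file sizes. This is only a sketch: it assumes the documented PE output order (forward paired, forward unpaired, reverse paired, reverse unpaired), and the variable names are made up for illustration.

```shell
# Counts copied from the Trimmomatic log in the question.
input=73338588
both=53742129
fwd_only=4076266
rev_only=7872077
dropped=7648116

# The four categories should add up to the number of input read pairs.
total=$((both + fwd_only + rev_only + dropped))
echo "sum of categories: $total (input pairs: $input)"

# Ratio of forward-only survivors to paired survivors,
# to compare with the ratio of the two output file sizes.
awk -v a="$fwd_only" -v b="$both" 'BEGIN { printf "forward-only / both: %.3f\n", a / b }'
```

The ratio comes out at about 0.076, which is close to the size ratio of the two files (0.4 GB / 4.8 GB, about 0.08). File sizes are rounded and depend on compression and trimmed read lengths, so this is only indicative, but it is consistent with the second file holding the forward-only survivors rather than the reverse reads.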

I hope my suggestion works for you.

written 17 months ago by reza190

Yes, I think you're right. I completely overlooked this in the manual. How stupid... I will test again and tell you!

Thanks!

written 17 months ago by hinkel210
Both Surviving: 53742129 (73,28%)

That bit makes the size of the R2_clean file (0.4G) suspicious. Looks like the file may have got corrupted in the process. Have you tried to repeat the trimming?
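A quick way to check for corruption and to compare read counts between the output files could look like the sketch below. The helper name count_reads is hypothetical, and it assumes standard FASTQ with exactly 4 lines per record.

```shell
# count_reads FILE
# Print the number of records in a gzipped FASTQ file,
# assuming 4 lines per record (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example usage (commented out; paths are the ones from the question):
#   gzip -t ../file_2_clean.fastq.gz && echo "gzip stream is intact"
#   count_reads ../file_1_clean.fastq.gz
#   count_reads ../file_2_clean.fastq.gz
```

If both files really were the paired outputs, the two counts should be identical and match the "Both Surviving" number in the log. A count near the "Forward Only Surviving" number would instead point to a different kind of output file.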

modified 17 months ago • written 17 months ago by genomax49k

I repeated the trimming with other parameters as well (e.g. SLIDINGWINDOW:4:15 or SLIDINGWINDOW:4:20), but the result looks similar.

written 17 months ago by hinkel210

Actually, I was not sure whether the adapters were already removed from the reads or not

As a side note: FastQC reports whether adapters are present. Since you're uploading FastQC screenshots, you should also see the adapter content in the same output. This only holds for the standard adapters, which are the ones used most often; if a different adapter was used for some reason, you won't see it there. That shouldn't be the case, though.
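For a non-graphical check, the adapter result can also be read from the summary.txt file inside the FastQC zip archive. The sketch below assumes the usual tab-separated layout of that file (status, module name, file name); the helper name adapter_status and the archive name in the usage comment are assumptions.

```shell
# adapter_status
# Read a FastQC summary.txt from stdin and print the status
# (PASS, WARN or FAIL) of the "Adapter Content" module.
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $1 }'
}

# Example usage (commented out; assumes FastQC's default archive naming):
#   unzip -p file_1_fastqc.zip '*/summary.txt' | adapter_status
```

As noted above, a PASS here only covers the adapter sequences FastQC knows about.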

written 17 months ago by Macspider2.4k