Question

Problems with Trimmomatic PE output files

1

7.3 years ago

hinkel2 ▴ 10

Hi all!

I downloaded some SRA datasets using ascp, as recommended (https://www.ncbi.nlm.nih.gov/books/NBK158899/), and then ran fastq-dump on the downloaded .sra files:

fastq-dump --gzip --split-files file.sra

and got two files (file_1.fastq.gz, file_2.fastq.gz), each 7.4 GB, as output.

The first thing I did after this was to check read quality with FastQC: file_1.fastq.gz file_2.fastq.gz

As you can see, the whiskers go down to ~15, so I wanted to discard the low-quality reads using Trimmomatic. I was not sure whether the adapters had already been removed from the reads, so I just added the standard ILLUMINACLIP step and the other recommended options:

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50

Multiple cores found: Using 4 threads
Input Read Pairs: 73338588 Both Surviving: 53742129 (73,28%) Forward Only Surviving: 4076266 (5,56%) Reverse Only Surviving: 7872077 (10,73%) Dropped: 7648116 (10,43%)
TrimmomaticPE: Completed successfully

Finally, I got the "clean" reads: file_1_clean.fastq.gz (4.8 GB) and file_2_clean.fastq.gz (0.4 GB).

I don't know why there is such a huge difference. Is it possible that the second read of each pair is that bad? After this, I checked read quality again with FastQC. file_1_clean.fastq.gz looks OK, I think, but file_2_clean.fastq.gz looks really strange and not really "clean".

file_1_clean.fastq.gz file_2_clean.fastq.gz

Does anyone know what happened here?

Thanks in advance!

RNA-Seq Trimmomatic fastqc adapter • 5.6k views

updated 7.3 years ago by GenoMax 141k • written 7.3 years ago by hinkel2 ▴ 10

4

You need four output files: each input needs two output files. In PE mode, Trimmomatic writes the reads that survive together with their mate and the reads that survive without their mate to separate files, so your command must look like the following.

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz file_1_discarded.fastq.gz ../file_2_clean.fastq.gz file_2_discarded.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50
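To make the argument order easier to see, here is a minimal sketch that names each positional argument of the PE call. The variable names and the `*_unpaired` file names are illustrative assumptions (Trimmomatic does not require any particular names); only the order matters: two inputs, then forward paired, forward unpaired, reverse paired, reverse unpaired. The sketch only prints the command and does not run it.

```shell
# Inputs (names taken from the question).
fwd_in=../file_1.fastq.gz
rev_in=../file_2.fastq.gz

# Outputs, in the order Trimmomatic PE expects them.
# The *_unpaired names are hypothetical placeholders.
fwd_paired=../file_1_clean.fastq.gz        # forward reads whose mate also survived
fwd_unpaired=../file_1_unpaired.fastq.gz   # forward reads whose mate was dropped
rev_paired=../file_2_clean.fastq.gz        # reverse reads whose mate also survived
rev_unpaired=../file_2_unpaired.fastq.gz   # reverse reads whose mate was dropped

# Trimming steps, unchanged from the question.
steps="ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50"

# Assemble the command line as text so it can be inspected before running it.
cmd_line="java -jar trimmomatic-0.36.jar PE -phred33 $fwd_in $rev_in $fwd_paired $fwd_unpaired $rev_paired $rev_unpaired $steps"
echo "$cmd_line"
```

Once the printed line looks right, it can be pasted into the terminal as is. Downstream paired-end tools would normally take only the two paired files.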

In your command, Trimmomatic saved the unpaired forward reads (those whose mate was discarded) in file_2_clean.fastq.gz, which is why that file is so small.
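This explanation can be sanity-checked with the numbers from the question. In PE mode the second output argument receives the forward reads that survived without their mate, so the two "clean" files should differ in size roughly as "Both Surviving" differs from "Forward Only Surviving". The sketch below does that arithmetic; it assumes compressed file size scales roughly with read count, which is only an approximation.

```shell
# Read counts copied from the Trimmomatic log in the question.
both_surviving=53742129
forward_only=4076266

# gzip-compressed file sizes in GB, as reported in the question.
size_file_1_clean=4.8
size_file_2_clean=0.4

# Ratio of read counts: paired survivors vs. forward-only survivors.
read_ratio=$(awk -v a="$both_surviving" -v b="$forward_only" 'BEGIN { printf "%.1f", a / b }')

# Ratio of the two output file sizes.
size_ratio=$(awk -v a="$size_file_1_clean" -v b="$size_file_2_clean" 'BEGIN { printf "%.1f", a / b }')

echo "paired / forward-only read ratio:       $read_ratio"
echo "file_1_clean / file_2_clean size ratio: $size_ratio"
```

The two ratios come out close to each other (about 13 versus about 12), which fits file_2_clean.fastq.gz holding the forward-only survivors rather than the reverse reads.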

I hope my suggestion works for you.

7.3 years ago by reza ▴ 300

0

Yes, I think you're right. I completely overlooked this in the manual. How stupid... I will test again and tell you!

Thanks!

7.3 years ago by hinkel2 ▴ 10

0

Both Surviving: 53742129 (73,28%)

That bit makes the size of the R2_clean file (0.4G) suspicious. Looks like the file may have got corrupted in the process. Have you tried to repeat the trimming?
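One way to test the corruption hypothesis is to check the gzip stream and count the reads actually present in the file, then compare that count with the Trimmomatic log. The helper names below (`check_gzip`, `count_reads`) are made up for this sketch, and the read count assumes standard four-line FASTQ records.

```shell
# Report whether a gzip file decompresses cleanly (i.e. is not truncated or corrupted).
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: corrupted or truncated"
    fi
}

# Count reads in a gzip-compressed FASTQ file.
# Assumption: every record is exactly 4 lines (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Example usage (paths from the question):
#   check_gzip  ../file_2_clean.fastq.gz
#   count_reads ../file_2_clean.fastq.gz
```

If the file passes the integrity check and the count matches one of the categories in the log, the file is intact and simply contains a different set of reads than expected.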

7.3 years ago by GenoMax 141k

0

I repeated the trimming with other parameters as well (e.g. SLIDINGWINDOW:4:15 or SLIDINGWINDOW:4:20), but the results look similar.

7.3 years ago by hinkel2 ▴ 10

0

Actually, I was not sure whether the adapters were already removed from the reads or not

As a side note: FastQC reports adapter content. Since you're uploading FastQC screenshots, you should also see the Adapter Content plot in the same output. This works for the standard adapters, which are the ones used most often; if a different adapter was used for some reason, you won't see it there, though that shouldn't be the case here.
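The same verdict can be read from the command line instead of a screenshot. As a rough sketch: when FastQC is run with `--extract`, it unpacks a report directory containing a tab-separated `summary.txt` with one PASS/WARN/FAIL line per module. The helper name `adapter_status` and the output directory are assumptions for illustration, and the exact report layout may vary between FastQC versions.

```shell
# Print FastQC's verdict (PASS, WARN or FAIL) for the "Adapter Content" module.
# Argument: path to a summary.txt file from an extracted FastQC report.
# Each line of summary.txt is: <status> TAB <module name> TAB <file name>
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $1 }' "$1"
}

# Example usage (fastqc_out is a hypothetical output directory):
#   mkdir -p fastqc_out
#   fastqc --extract -o fastqc_out ../file_1.fastq.gz
#   adapter_status fastqc_out/file_1_fastqc/summary.txt
```

A PASS here only means none of the adapter sequences FastQC knows about were found at a notable level.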

7.3 years ago by Matteo Schiavinato ★ 3.6k