Problems with Trimmomatic PE output files
0
1
7.3 years ago
hinkel2 ▴ 10

Hi all!

I downloaded some SRA datasets using ascp as recommended (https://www.ncbi.nlm.nih.gov/books/NBK158899/) and then ran fastq-dump on the downloaded SRA files:

fastq-dump --gzip --split-files file.sra

and got two files (file_1.fastq.gz, file_2.fastq.gz), each 7.4 GB, as output.

The first thing I did after this was to check read quality with FastQC: file_1.fastq.gz file_2.fastq.gz

As you can see, the whiskers go down to ~15, so I wanted to discard the low-quality reads using Trimmomatic. Actually, I was not sure whether the adapters were already removed from the reads or not, so I just added the standard ILLUMINACLIP step and the other recommended options:

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50

Multiple cores found: Using 4 threads
Input Read Pairs: 73338588 Both Surviving: 53742129 (73,28%) Forward Only Surviving: 4076266 (5,56%) Reverse Only Surviving: 7872077 (10,73%) Dropped: 7648116 (10,43%)
TrimmomaticPE: Completed successfully

Finally, I got the "clean" reads: file_1_clean.fastq.gz (4.8 GB) and file_2_clean.fastq.gz (0.4 GB).

I don't know why there is such a huge difference. Is it possible that the second read of each pair is that bad? After this, I checked the read quality again with FastQC. file_1_clean.fastq.gz looks OK, I think, but file_2_clean.fastq.gz looks really strange and not really "clean".

file_1_clean.fastq.gz file_2_clean.fastq.gz

Does anyone know what happened here?

Thanks in advance!

RNA-Seq Trimmomatic fastqc adapter • 5.6k views
4

In PE mode you need four output files, two for each input file. Trimmomatic writes the reads that survive together with their mate (paired) and the reads that survive without their mate (unpaired) to separate files, so your command should look like this:

java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz file_1_discarded.fastq.gz ../file_2_clean.fastq.gz file_2_discarded.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50

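In your command, file_2_clean.fastq.gz sits in the second output position, which Trimmomatic uses for the unpaired forward reads (reads that survived but whose mate was dropped), not for the cleaned reverse reads. That is why the file is so small.

I hope this suggestion works for you.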
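To make the output order harder to get wrong, the command can be assembled from a single prefix. The following is only a sketch: `trimmomatic_pe_cmd` is a hypothetical helper, and the `_paired` / `_unpaired` file names are an illustrative naming choice, not something Trimmomatic requires. It prints the command rather than running it, so you can inspect it first. The order of the four outputs (forward paired, forward unpaired, reverse paired, reverse unpaired) is what PE mode expects.

```shell
# Hypothetical helper: prints (does not run) a Trimmomatic PE command
# with the four output files in the order PE mode expects.
# $1 = forward input, $2 = reverse input, $3 = output prefix (assumed naming).
trimmomatic_pe_cmd() {
    in1=$1
    in2=$2
    prefix=$3
    echo "java -jar trimmomatic-0.36.jar PE -phred33" \
        "$in1" "$in2" \
        "${prefix}_1_paired.fastq.gz" "${prefix}_1_unpaired.fastq.gz" \
        "${prefix}_2_paired.fastq.gz" "${prefix}_2_unpaired.fastq.gz" \
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50"
}
```

For example, `trimmomatic_pe_cmd ../file_1.fastq.gz ../file_2.fastq.gz ../file` prints the full command line; once it looks right, it can be pasted into the shell.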

0

Yes, I think you're right. I completely overlooked this in the manual. How stupid... I will test again and tell you!

Thanks!

0
Both Surviving: 53742129 (73,28%)

That bit makes the size of the R2_clean file (0.4G) suspicious. Looks like the file may have got corrupted in the process. Have you tried to repeat the trimming?
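One way to test this suspicion without relying on file sizes is to count the reads in each output file: the two paired files should contain the same number of reads, and that number should match the "Both Surviving" figure in the log. A minimal sketch, assuming standard four-line FASTQ records; `count_fastq_reads` and `check_pairs` are hypothetical helper names:

```shell
# Count reads in a gzipped FASTQ file (assumes exactly 4 lines per record).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Compare the read counts of two files that are supposed to be mates.
# Prints a short message and returns non-zero on a mismatch.
check_pairs() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads"
        return 1
    fi
}
```

Running `check_pairs ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz` on the files from the question would show whether the two outputs really are mates. On files of this size, expect the decompression to take a while.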

0

I repeated the trimming with other parameters as well (e.g. SLIDINGWINDOW:4:15 or SLIDINGWINDOW:4:20), but the result looks similar.

0

Actually, I was not sure whether the adapters were already removed from the reads or not

As a side note: FastQC reports whether adapters are present. Since you're uploading FastQC screenshots, you should find the adapter content plot in the same report. This only covers the standard adapters, which are the ones used in most cases; if a different adapter was used for some reason, it won't show up there. That shouldn't be the case here, though.
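If you prefer not to open the HTML report, the same verdict can be read from the `summary.txt` file inside the FastQC output archive, which lists one tab-separated line per module (status, module name, file name). A small sketch, assuming the archive has already been unzipped; `adapter_status` is a hypothetical helper and the path in the usage note is only an example:

```shell
# Print the FastQC status (PASS, WARN or FAIL) of the "Adapter Content"
# module from an unzipped report's summary.txt (tab-separated columns:
# status, module name, file name).
adapter_status() {
    awk -F'\t' '$2 == "Adapter Content" { print $1 }' "$1"
}
```

For example, `adapter_status file_1_fastqc/summary.txt` prints a single status word for that file.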

