Question: Why does Trimmomatic assign paired sequences to unpaired after adapter trimming?
9 months ago, ddzhangzz (United States) wrote:

I manually checked the Trimmomatic output and was confused to find that reads from intact pairs had been assigned to the unpaired output files. Here is my Trimmomatic command line:

java -Xms8g -jar Trimmomatic.jar PE -threads 6 -phred33 sample1_R1.fastq.gz sample1_R2.fastq.gz sample1_forward_paired.fastq.gz sample1_forward_unpaired.fastq.gz sample1_reverse_paired.fastq.gz sample1_reverse_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 MINLEN:50

Then I manually checked one sequence in one of the output files, sample1_forward_unpaired.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEA

Apparently, an adapter sequence was trimmed from the original read. However, this read ID is present in both raw sequence files, sample1_R1.fastq.gz and sample1_R2.fastq.gz.

sample1_R1.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEAEE/EAEEEEE/AEEEAAEEEEE/AAAAAEAEAEEEEAEAEE/<<<<EEEAA

sample1_R2.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 2:N:0:TTAGGC
GGAATGGACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGACATCACTTACAATGTGAACGACACATTCCACAAGCGTCATGAAGAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG
+
AAAAAEEEEE#########################################EE<EEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEAEEEA/AEEEEAEEE/AEEEEEEEEAEAEEEEEEEEEEEE<EAAAAAAAAAEAEEA

With MINLEN:50, I expected any read that is still at least 50 bp long after trimming to be retained. My question is why the trimmed read was assigned to sample1_forward_unpaired.fastq.gz rather than sample1_forward_paired.fastq.gz.

The mate is indeed absent from sample1_reverse_paired.fastq.gz, but why was the R2 read from sample1_R2.fastq.gz eliminated entirely? R2 was also contaminated with adapter, so why was it removed rather than trimmed and kept like R1? If it was because of the poor-quality "NNNN..." stretch, I still see many retained reads that contain this kind of stretch.

9 months ago, Brian Bushnell (Walnut Creek, USA) wrote:

The read was sent to the unpaired output because trimming left it unpaired: its mate was eliminated. All the reads were paired before trimming.
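
The pairing rule in this answer can be sketched in Python. This is only an illustration of the routing logic, not Trimmomatic's actual implementation; the function name `route_pair` and the dictionary keys are hypothetical, chosen to mirror the four output files in the command line above. A mate is passed as its trimmed sequence, or as `None` if a trimming step dropped it entirely.

```python
from typing import Dict, Optional


def route_pair(forward: Optional[str], reverse: Optional[str],
               minlen: int = 50) -> Dict[str, str]:
    """Decide which output each trimmed mate would go to (illustrative only).

    A mate survives if it was not dropped (not None) and is at least
    `minlen` bases long after trimming.
    """
    fwd_ok = forward is not None and len(forward) >= minlen
    rev_ok = reverse is not None and len(reverse) >= minlen

    if fwd_ok and rev_ok:
        # Both mates survive: the pair stays together.
        return {"forward_paired": forward, "reverse_paired": reverse}
    if fwd_ok:
        # Only the forward mate survives: it becomes unpaired.
        return {"forward_unpaired": forward}
    if rev_ok:
        # Only the reverse mate survives: it becomes unpaired.
        return {"reverse_unpaired": reverse}
    # Neither mate survives: nothing is written.
    return {}
```

For the read in the question, the forward mate is still 100 bp long and the reverse mate is gone, so `route_pair("A" * 100, None)` returns only a `forward_unpaired` entry.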

9 months ago, ddzhangzz replied:

Thanks for your reply, @Brian Bushnell. My reads are 151 bp long, so I would not expect the mate to fall below 50 bp even after the adapter is trimmed off. For what reason do you think the read was eliminated?
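
A possible explanation, going by the Trimmomatic manual's description of ILLUMINACLIP: when adapter read-through is detected in palindrome mode, the reverse read holds the same sequence as the forward read, only reverse-complemented, and by default it is dropped entirely. The optional `keepBothReads` field, which follows `minAdapterLength`, retains it, e.g. `ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:true`. Whether this is what happened to this particular pair is an assumption. The Python sketch below only checks the redundancy itself, using the first 12 bases of the R1 read and the 12 R2 bases just before the adapter as posted above; the helper names are hypothetical.

```python
# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mates_are_redundant(forward: str, reverse: str) -> bool:
    """True if `reverse` is the reverse complement of `forward`.

    An N on either side is treated as matching any base.
    """
    if len(forward) != len(reverse):
        return False
    expected = reverse_complement(forward)
    return all(a == b or "N" in (a, b) for a, b in zip(expected, reverse))


# Fragments excerpted from the reads in the question.
forward = "CTCTTNATGACG"  # start of the R1 read
reverse = "CGTCATGAAGAG"  # R2 bases immediately before AGATCGGAAGAGC
print(mates_are_redundant(forward, reverse))  # prints True
```

If the two mates are redundant in this sense, keeping only the forward read loses no sequence information, which would be consistent with the read landing in the forward unpaired file.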