Why does Trimmomatic assign paired sequences to unpaired after adapter trimming?
6.5 years ago
ddzhangzz ▴ 90

I manually checked the Trimmomatic output and was confused to find that a paired read had been assigned to an unpaired file. Here is my Trimmomatic command line:

java -Xms8g -jar Trimmomatic.jar PE -threads 6 -phred33 sample1_R1.fastq.gz sample1_R2.fastq.gz sample1_forward_paired.fastq.gz sample1_forward_unpaired.fastq.gz sample1_reverse_paired.fastq.gz sample1_reverse_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 MINLEN:50

Then I manually checked one sequence in one of the output files: sample1_forward_unpaired.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEA

Evidently, an adapter sequence was trimmed from the original read. However, this read ID is present in both raw sequence files, sample1_R1.fastq.gz and sample1_R2.fastq.gz.

sample1_R1.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEAEE/EAEEEEE/AEEEAAEEEEE/AAAAAEAEAEEEEAEAEE/<<<<EEEAA

sample1_R2.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 2:N:0:TTAGGC
GGAATGGACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGACATCACTTACAATGTGAACGACACATTCCACAAGCGTCATGAAGAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG
+
AAAAAEEEEE#########################################EE<EEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEAEEEA/AEEEEAEEE/AEEEEEEEEAEAEEEEEEEEEEEE<EAAAAAAAAAEAEEA

With MINLEN:50, I expected any read that is still at least 50 bp long after trimming to be retained. My question is why the trimmed read was assigned to sample1_forward_unpaired.fastq.gz rather than sample1_forward_paired.fastq.gz.
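The length argument above can be checked with a rough sketch like the following. It assumes the adapter begins with `AGATCGGAAGAGC` (the stretch that follows the insert in both raw reads shown above) and uses exact substring matching, which is a simplification and not how Trimmomatic aligns adapters. The function names are hypothetical, and the example read is synthetic.

```python
# Assumption: common start of the adapter visible in both raw reads above.
ADAPTER_PREFIX = "AGATCGGAAGAGC"
# Taken from MINLEN:50 in the command line.
MIN_LEN = 50


def length_before_adapter(seq, adapter=ADAPTER_PREFIX):
    """Number of bases before the first exact adapter match.

    Returns len(seq) when the adapter prefix is not found.
    """
    pos = seq.find(adapter)
    return len(seq) if pos == -1 else pos


def passes_minlen(seq, min_len=MIN_LEN):
    """True if the part of the read before the adapter is at least min_len long."""
    return length_before_adapter(seq) >= min_len


if __name__ == "__main__":
    # Synthetic read: 60 bases of insert followed by the adapter prefix.
    read = "ACGT" * 15 + ADAPTER_PREFIX + "ACAC"
    print(length_before_adapter(read), passes_minlen(read))
```

Pasting the two raw reads from above into `length_before_adapter` gives the number of bases that precede the adapter in each mate, which can then be compared against the MINLEN threshold.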

The mate is indeed absent from sample1_reverse_paired.fastq.gz, but why was the read from sample1_R2.fastq.gz eliminated entirely? R2 was also contaminated with adapter, so why was it removed rather than trimmed and kept like R1? If it was because of the poor-quality "NNNN..." stretch, I still see many retained reads that contain this kind of stretch.

RNA-Seq • 1.8k views
6.5 years ago

The read was written to the unpaired file because trimming left it without a mate: its mate was eliminated. All the reads were paired before trimming.
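The routing described here can be sketched as below. This is only an illustration of the paired/unpaired logic, not Trimmomatic's actual implementation; `route_pair` and the bucket labels are hypothetical names chosen to mirror the four output files in the command line.

```python
def route_pair(forward_survives, reverse_survives):
    """Return the output bucket for each mate of one read pair.

    The result is a (forward_bucket, reverse_bucket) tuple, where None
    means that mate was discarded during trimming.
    """
    if forward_survives and reverse_survives:
        # Both mates survive: the pair stays together in the paired files.
        return ("forward_paired", "reverse_paired")
    if forward_survives:
        # Reverse mate was dropped: the forward read is now an orphan.
        return ("forward_unpaired", None)
    if reverse_survives:
        # Forward mate was dropped: the reverse read is now an orphan.
        return (None, "reverse_unpaired")
    # Neither mate survived.
    return (None, None)


if __name__ == "__main__":
    # The case in the question: R1 survives, R2 is eliminated.
    print(route_pair(True, False))
```

Under this logic a read can only reach a paired file if its mate also survives every trimming step, regardless of how long the surviving read is.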


Thanks for your reply, @Brian Bushnell. My reads are 151 bp long, and I doubt the length would fall below 50 bp even after the adapter is trimmed off. For what reason do you think the read was eliminated?
