Question: Why does Trimmomatic assign paired sequences to unpaired after adapter trimming?
ddzhangzz (United States) wrote 9 weeks ago:

I manually checked the Trimmomatic output and was confused to find that a read from an intact pair had been assigned to an unpaired output file. Here is my Trimmomatic command line:

java -Xms8g -jar Trimmomatic.jar PE -threads 6 -phred33 sample1_R1.fastq.gz sample1_R2.fastq.gz sample1_forward_paired.fastq.gz sample1_forward_unpaired.fastq.gz sample1_reverse_paired.fastq.gz sample1_reverse_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 MINLEN:50

Then I manually checked one sequence in one of the output files: sample1_forward_unpaired.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEA

Apparently, an adapter sequence was trimmed from the original read. However, this read ID is present in both raw sequence files, sample1_R1.fastq.gz and sample1_R2.fastq.gz.

sample1_R1.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 1:N:0:TTAGGC
CTCTTNATGACGCTTGTGGAATGTGTCGTTCACATTGTAAGTGATGTCATCAACAATGCACTGATCTCGAAGCTGCGAGTAGGCAATGCATGTCCATTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGC
+
AAAAA#AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAAAEEEEEEEEEEEEAEEEEEAEEEEEAEEEEAEE/EAEEEEE/AEEEAAEEEEE/AAAAAEAEAEEEEAEAEE/<<<<EEEAA

sample1_R2.fastq.gz:

@NB501800:50:H3NW5BGX3:1:11101:4253:1049 2:N:0:TTAGGC
GGAATGGACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGACATCACTTACAATGTGAACGACACATTCCACAAGCGTCATGAAGAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG
+
AAAAAEEEEE#########################################EE<EEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEAEEEA/AEEEEAEEE/AEEEEEEEEAEAEEEEEEEEEEEE<EAAAAAAAAAEAEEA

With MINLEN:50, I expected any read that is still at least 50 bp long after trimming to be retained. My question is why the trimmed read was assigned to sample1_forward_unpaired.fastq.gz rather than sample1_forward_paired.fastq.gz.

The mate is indeed absent from sample1_reverse_paired.fastq.gz, but why was the read from sample1_R2.fastq.gz eliminated entirely? R2 was also contaminated with adapter, so why was it removed rather than trimmed and kept like R1? If it was because of the poor-quality "NNNN..." stretch, I still see many retained reads containing this kind of stretch.
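
The observations in the question can be checked directly against the posted R1 record. What follows is a minimal Python sketch, not part of the original post. It assumes the adapter begins with the common Illumina TruSeq prefix `AGATCGGAAGAGC`, and it uses exact string matching through a hypothetical helper, `find_adapter_start`, whereas Trimmomatic itself uses a scored alignment, so this only approximates what the tool does.

```python
# Raw R1 read copied from the post, split into 10-base chunks for readability.
RAW_R1 = (
    "CTCTTNATGA" "CGCTTGTGGA" "ATGTGTCGTT" "CACATTGTAA" "GTGATGTCAT"
    "CAACAATGCA" "CTGATCTCGA" "AGCTGCGAGT" "AGGCAATGCA" "TGTCCATTCC"
    "AGATCGGAAG" "AGCACACGTC" "TGAACTCCAG" "TCACTTAGGC" "ATCTCGTATG" "C"
)

# Assumption: the adapter starts with the common TruSeq prefix.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

# Taken from MINLEN:50 in the posted command line.
MINLEN = 50


def find_adapter_start(read: str, adapter_prefix: str) -> int:
    """Return the index of the first exact adapter match, or len(read) if absent."""
    pos = read.find(adapter_prefix)
    return len(read) if pos == -1 else pos


cut = find_adapter_start(RAW_R1, ADAPTER_PREFIX)
trimmed_r1 = RAW_R1[:cut]

# Report raw length, adapter start, and whether the trimmed read passes MINLEN.
print(len(RAW_R1), cut, len(trimmed_r1) >= MINLEN)
```

Under these assumptions the forward read stays well above the MINLEN threshold after clipping, so length alone does not explain why its mate is missing.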

rna-seq • 209 views
Brian Bushnell (Walnut Creek, USA) wrote 9 weeks ago:

The read was written to the unpaired file because trimming left it without a mate: its mate was eliminated. All the reads were paired before trimming.
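
The routing rule described in this answer can be sketched as follows. This is a simplified Python model and not Trimmomatic's actual code: the function name `route_pairs` and the dictionary keys are hypothetical, and a read that did not survive trimming is represented as `None`.

```python
from typing import Dict, List, Optional, Tuple

# A pair is (forward, reverse); None means the read was dropped during trimming.
Pair = Tuple[Optional[str], Optional[str]]


def route_pairs(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Distribute surviving reads over four outputs, as in paired-end mode."""
    out: Dict[str, List[str]] = {
        "forward_paired": [],
        "forward_unpaired": [],
        "reverse_paired": [],
        "reverse_unpaired": [],
    }
    for fwd, rev in pairs:
        if fwd is not None and rev is not None:
            # Both mates survived: they stay together in the paired outputs.
            out["forward_paired"].append(fwd)
            out["reverse_paired"].append(rev)
        elif fwd is not None:
            # Only the forward read survived: it becomes unpaired.
            out["forward_unpaired"].append(fwd)
        elif rev is not None:
            # Only the reverse read survived: it becomes unpaired.
            out["reverse_unpaired"].append(rev)
        # If neither survived, the pair is dropped entirely.
    return out


result = route_pairs([("ACGT", "TTGA"), ("GGCC", None), (None, None)])
print(result)
```

In this model the read from the question corresponds to the second pair: the forward read survives, the reverse read does not, and so the forward read lands in the forward unpaired output.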


Thanks for your reply, @Brian Bushnell. My reads are 151 bp long, so I would guess the length is unlikely to fall below 50 bp even after the adapter is trimmed off. For what reason do you think the read was eliminated?

Reply written 9 weeks ago by ddzhangzz
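
The thread does not settle this question, but the posted reads suggest one possible explanation. The Trimmomatic manual describes an optional `keepBothReads` field for ILLUMINACLIP. When palindrome mode detects adapter read-through, the reverse read carries the same information as the forward read in reverse complement, and by default it is dropped entirely. The Python sketch below checks whether the two posted reads fit that pattern. The helper `reverse_complement` is illustrative, and only the first 10 bases of R2 and the last 10 bases of the trimmed R1 are compared.

```python
# Translation table for DNA complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


R1_TRIMMED_END = "TGTCCATTCC"  # last 10 bases of the trimmed R1 read in the post
R2_START = "GGAATGGACA"        # first 10 bases of the raw R2 read in the post

# True means R2 starts where the trimmed R1 ends, on the opposite strand.
print(reverse_complement(R1_TRIMMED_END) == R2_START)
```

If the check holds, both reads cover the same short fragment, which is the situation in which the default behaviour discards the reverse read. To keep both reads, the manual's syntax adds two trailing fields to ILLUMINACLIP, a minimum adapter length and the keep-both-reads flag, for example `ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:2:True`. The accepted spelling of the last field should be confirmed against the documentation for the installed version.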