Question: Empty output trimmomatic
3.0 years ago by
United Kingdom
luigi.marongiu0 wrote:

Dear all,
I am using Trimmomatic to trim FASTQ files by quality and to remove the adapters. I have two paired files, seq1.1.fq and seq1.2.fq, with Nextera adapters, so I ran the following command:
java -jar trimmomatic-0.33.jar PE -threads 16 -phred64 seq1.1.fq seq1.2.fq pairedOutup1 pairedOutup2 unpairedOutup1 unpairedOutup2 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:36

The command runs and prints the following:
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 947710 Both Surviving: 0 (0.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 947710 (100.00%)
TrimmomaticPE: Completed successfully

However, all the output files are completely empty.

The first lines of the input are:
{seq1.1.fq}
@M03595:11:000000000-AG58B:1:1101:16029:1738 1:N:0:26
ATTGTTAATCGTAAAGCAATGTTCATTCCGATTGTGGCTGTTGCAAGTTTTATGCTTGTAGGTTATGCTGCAACCGATAAAGAAATGCCGGAAATTAGATCTAATCAAATTGAAGTTC
+
1>A1AF33DD1AA113B11B1GGE3FGEF00EEAG20AFCGH1A111FGGH2G21FHBGB21FG1F11F1101BB//E//1@100D1@B/////BG111BBG11FE111BG111BFGE
@M03595:11:000000000-AG58B:1:1101:14217:1754 1:N:0:26
GTTGGCCATAAGGCTGTTGGTGCGATAGTTAATAATGTGATGGTTCCGATCGATACAAAATTAAATACGGGTGATGTCGTAGAAATCAAGACAAATAAACAGTCACAG
+
1AA11@11C1111BF1GG11A0100A00DF22D22D2D22D21BD1B//B///A/A1110BG111F2A///>//FBFFAFA//21BB1111>000B111@10BF1@11
@M03595:11:000000000-AG58B:1:1101:13810:1764 1:N:0:26
GTTGAGACTGTGGATGGTATCAGCGGGTATTGCATGAGTGAGTTTATAAAACTCTGTTAG
+
...

{seq1.2.fq}
@M03595:11:000000000-AG58B:1:1101:16029:1738 2:N:0:26
TTACTTCAATTTGTTTATTTCTAATTTCCGGCATTTCTTTATCGGTTGCAGCATAACCTACAAGCATATAACTTGCAACAGCCACAATCGGAATGAACATTGCTTTACGATTAACAAT
+
111>>D@31BDF33BB333BAB33DFG3A00A0AFGDGGH2FEA0BE/01110B1111D1A111/0D1222BDG1111B000>0B0/B1////B@11@1BF11GHHFE//FG?1@11B
@M03595:11:000000000-AG58B:1:1101:14217:1754 2:N:0:26
CTGTGACTGTTTCTTTGTCTTGATTTCTTCTACTTCACCCGTATTTAATTTTGTTTCTATCGGTTCCTTCACATTATTAACTATCGCTCCAACTGCCTTATGGCCAAC
+
1>1>13BB1FDF3BBG3BAFG13DFGAF333333D331AA0B0BFG22DDGH2B0B222DA/////12DA1A2DF1DG22AFDGE//0A>11100@0BD1B11/01/>
@M03595:11:000000000-AG58B:1:1101:13810:1764 2:N:0:26
CTAACAGAGTTTTATATTCTCACTCATGCAATACCCGCTGATACCATCCACATTCTCAAC
+

What might be the issue? Could the quality be so low that all the sequences are removed?
Thank you.

output trimmomatic • 1.5k views
ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by luigi.marongiu0
3.0 years ago by
h.mon20k
Brazil
h.mon20k wrote:

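I guess your quality scores are encoded as Phred+33, but you are passing a flag (-phred64) that tells Trimmomatic they are encoded as Phred+64. Every score is then read as 31 lower than it really is, so the reads fail the quality filters.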
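
A quick way to check the encoding is to look at the range of ASCII codes in the quality lines. The sketch below is only a heuristic, and the function names `guess_phred_offset` and `decode` are made up for illustration. It assumes that Phred+64 data never contains characters below ';' (ASCII 59) and that Phred+33 data rarely goes above 'J' (ASCII 74). The quality string is the first one from seq1.1.fq in the question.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Heuristic only: characters below ';' (ASCII 59) rule out Phred+64,
    characters above 'J' (ASCII 74) make Phred+33 unlikely.
    Returns None when the range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


def decode(quality, offset):
    """Convert a quality string to a list of Phred scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


# First quality line of seq1.1.fq, copied from the question.
qual = "1>A1AF33DD1AA113B11B1GGE3FGEF00EEAG20AFCGH1A111FGGH2G21FHBGB21FG1F11F1101BB//E//1@100D1@B/////BG111BBG11FE111BG111BFGE"

print(guess_phred_offset([qual]))                     # 33
print(min(decode(qual, 33)), max(decode(qual, 33)))   # 14 39
print(min(decode(qual, 64)), max(decode(qual, 64)))   # -17 8
```

Read as Phred+64, no base in this read scores above 8, so no 4-base window can reach an average of 15. That would be consistent with SLIDINGWINDOW:4:15 cutting the read down to nothing and MINLEN:36 then dropping it.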

ADD COMMENTlink written 3.0 years ago by h.mon20k

That's a good tip. I thought Phred+64 was the more recent encoding, but I was wrong. Thanks.

ADD REPLYlink written 3.0 years ago by luigi.marongiu0

The Wikipedia page on the FASTQ format is a good read.

ADD REPLYlink written 3.0 years ago by h.mon20k

I also came across the same problem, and that was indeed the cause. Thanks.
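
For anyone landing here with the same symptom, the fix discussed above amounts to swapping -phred64 for -phred33. Below is a small sketch that rebuilds the command from the question as an argument list; the helper name `build_trimmomatic_cmd` is hypothetical and the file names are simply those from the original post. One assumption to flag: as far as I understand Trimmomatic's PE usage, the output files are expected in the order paired 1, unpaired 1, paired 2, unpaired 2, so the sketch lists them that way rather than in the order used in the question.

```python
import shlex


def build_trimmomatic_cmd(phred_flag="-phred33"):
    """Return the Trimmomatic PE command from the question as an argument list."""
    return [
        "java", "-jar", "trimmomatic-0.33.jar", "PE",
        "-threads", "16",
        phred_flag,  # -phred33 instead of the original -phred64
        "seq1.1.fq", "seq1.2.fq",
        # Assumed output order: forward paired, forward unpaired,
        # reverse paired, reverse unpaired.
        "pairedOutup1", "unpairedOutup1",
        "pairedOutup2", "unpairedOutup2",
        "ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true",
        "LEADING:5", "TRAILING:5", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


# Print the command as a single shell line; nothing is executed here.
print(shlex.join(build_trimmomatic_cmd()))
```

The list form can be passed to `subprocess.run` as-is once the jar and adapter file paths match your own setup.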

ADD REPLYlink written 11 months ago by gongjing.rss0
3.0 years ago by
United Kingdom
luigi.marongiu0 wrote:

Hello,

I ran FastQC on the input files and got the following:

{seq1.1.fq}

PASS    Basic Statistics   
PASS    Per base sequence quality   
PASS    Per tile sequence quality   
PASS    Per sequence quality scores   
FAIL    Per base sequence content   
WARN    Per sequence GC content   
PASS    Per base N content   
WARN    Sequence Length Distribution   
FAIL    Sequence Duplication Levels   
WARN    Overrepresented sequences   
PASS    Adapter Content   
FAIL    Kmer Content   

{seq1.2.fq}

PASS    Basic Statistics   
PASS    Per base sequence quality   
PASS    Per tile sequence quality  
PASS    Per sequence quality scores   
FAIL    Per base sequence content   
WARN    Per sequence GC content   
PASS    Per base N content  
WARN    Sequence Length Distribution   
FAIL    Sequence Duplication Levels   
WARN    Overrepresented sequences   
PASS    Adapter Content   
FAIL    Kmer Content   

I could actually run the command when I omitted the SLIDINGWINDOW option (!). From the FastQC analysis it looks as if the adapters had already been removed, so I assume Trimmomatic simply skipped that step and went straight to quality trimming. Is that assumption correct, or should I not run adapter trimming on sequences that are already adapter-cleaned?

Thank you

ADD COMMENTlink written 3.0 years ago by luigi.marongiu0

FastQC is not the most sensitive tool for finding adapters. You should check the Trimmomatic output carefully, as it will report the percentage of reads with adapters. Illumina basecalling may clean adapters automatically, but I've found it can leave significant leftovers.
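
Checking the Trimmomatic output can be scripted. The sketch below parses the paired-end summary line exactly as it appears in the question; the regular expression and the name `parse_summary` are my own, and the line format is an assumption based on that single example from version 0.33, so other versions may differ.

```python
import re

# Pattern modelled on the summary line shown in the question.
SUMMARY_RE = re.compile(
    r"Input Read Pairs: (?P<total>\d+) "
    r"Both Surviving: (?P<both>\d+) \([\d.]+%\) "
    r"Forward Only Surviving: (?P<forward>\d+) \([\d.]+%\) "
    r"Reverse Only Surviving: (?P<reverse>\d+) \([\d.]+%\) "
    r"Dropped: (?P<dropped>\d+) \([\d.]+%\)"
)


def parse_summary(line):
    """Extract read-pair counts from a Trimmomatic PE summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a Trimmomatic PE summary line")
    return {key: int(value) for key, value in match.groupdict().items()}


# Summary line copied from the question.
line = (
    "Input Read Pairs: 947710 Both Surviving: 0 (0.00%) "
    "Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) "
    "Dropped: 947710 (100.00%)"
)

counts = parse_summary(line)
if counts["dropped"] == counts["total"]:
    print("Every read pair was dropped; check the quality encoding flag first.")
```

A run in which every pair is dropped is more likely a settings problem than a data problem, so it is worth flagging before looking at adapters.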

On another note, to keep the forum tidy you should open a new question instead of asking here, as this area is for answers only. A follow-up like this one could also have been posted in the comments above.

ADD REPLYlink written 2.9 years ago by h.mon20k