Question

Why is FastQC not working after using Trim galore?

4

7.7 years ago

beausoleilmo ▴ 580

I have a FASTQ file and I can run FastQC on it without problems. But after I run trim_galore, FastQC (or the --fastqc option in trim_galore) no longer works on the trimmed output.

$ fastqc ./sub1_val_1.fq.gz

This is the output:

Started analysis of sub1_val_1.fq.gz
Analysis complete for sub1_val_1.fq.gz
Failed to process file sub1_val_1.fq.gz
java.lang.ArrayIndexOutOfBoundsException: -1
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:100)
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:184)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:695)

Is the "Failed to process file" error caused by a version mismatch between trim_galore and FastQC?

I found this, but that wasn't that helpful.

I'm using FastQC v0.11.5 and trim_galore v0.4.1.

I subsampled a paired-end library using this:

seqtk sample -s100 ./SRR2937435_1.fastq.gz 10000 | gzip  > sub1.fastq.gz
seqtk sample -s100 ./SRR2937435_2.fastq.gz 10000 | gzip > sub2.fastq.gz

The sub1_val_1.fq.gz file is the output of running trim_galore on sub1.fastq.gz. FastQC works on the original sub1.fastq.gz.

fastq FASTQC Fastqc • 7.6k views

ADD COMMENT • link updated 7.0 years ago by h.mon 35k • written 7.7 years ago by beausoleilmo ▴ 580

0

Is trim galore generating an error during run or is it completing without any errors?

ADD REPLY • link 7.7 years ago by GenoMax 141k

0

$ trim_galore --illumina --paired sub1.fastq.gz sub2.fastq.gz

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default) 1.10
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to 'sub1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: sub1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to sub1_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub1.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub1.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.18 s (18 us/read; 3.28 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,288 (82.9%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   2,658 bp (0.3%)
Total written (filtered):        680,222 bp (72.4%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8288 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
  A: 76.7%
  C: 18.0%
  G: 1.9%
  T: 3.4%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   819 2500.0  0   819
...blablabla...    
71  2   0.0 1   2

RUN STATISTICS FOR INPUT FILE: sub1.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to 'sub2.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
Input filename: sub2.fastq.gz
...     
Writing final adapter and quality trimmed output to sub2_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub2.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub2.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.17 s (17 us/read; 3.45 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,302 (83.0%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   1,001 bp (0.1%)
Total written (filtered):        682,905 bp (72.6%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8302 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 4.3%
  C: 3.3%
  G: 26.3%
  T: 66.1%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   796 2500.0  0   796
...blablabla...
69  1   0.0 1   1

RUN STATISTICS FOR INPUT FILE: sub2.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz
file_1: sub1_trimmed.fq.gz, file_2: sub2_trimmed.fq.gz

>>>>> Now validing the length of the 2 paired-end infiles: sub1_trimmed.fq.gz and sub2_trimmed.fq.gz <<<<<
zcat: can't stat: sub1_trimmed.fq.gz (sub1_trimmed.fq.gz.Z): No such file or directory
zcat: can't stat: sub2_trimmed.fq.gz (sub2_trimmed.fq.gz.Z): No such file or directory
Writing validated paired-end read 1 reads to sub1_val_1.fq.gz
Writing validated paired-end read 2 reads to sub2_val_2.fq.gz
Total number of sequences analysed: 0
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 0 (N/A%)
Deleting both intermediate output files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz

ADD REPLY • link 7.7 years ago by beausoleilmo ▴ 580

0

I admit that I have difficulty seeing if there is an error message. There are places where it seems that it's not finding the working directory...

ADD REPLY • link 7.7 years ago by beausoleilmo ▴ 580

0

Looks like it has trouble writing in that directory. It might be a permissions issue. Try running it in a new clean directory.

ADD REPLY • link 7.7 years ago by igor 13k

0

Do you mean that the file should be executable? Should I chmod it to 777?

ADD REPLY • link 7.7 years ago by beausoleilmo ▴ 580

0

Set a different output directory with -o output_dir option.

ADD REPLY • link 7.7 years ago by GenoMax 141k

0

That's a good option too.

The reason I suggested creating a new directory is it will definitely exist, it will definitely be empty, and it will probably be readable and writeable.
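
The two suggestions can be combined. The following is a minimal sketch, not a tested recipe for this dataset: the directory name `trim_out` is made up for illustration, and the trim_galore call is left commented out so the snippet only verifies the directory.

```shell
# Hypothetical output directory name, chosen for illustration only.
outdir="trim_out"

# Create a fresh directory; -p makes this a no-op if it already exists.
mkdir -p "$outdir"

# Confirm the directory exists and is writable before using it.
if [ -d "$outdir" ] && [ -w "$outdir" ]; then
    dir_status="writable"
else
    dir_status="not writable"
fi
echo "$outdir is $dir_status"

# Then point trim_galore at it with -o (uncomment to run):
# trim_galore --illumina --paired -o "$outdir" sub1.fastq.gz sub2.fastq.gz
```

If the directory check passes and the run still fails, permissions on the output location are probably not the cause.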

ADD REPLY • link 7.7 years ago by igor 13k

0

The file is already executable since you were able to execute it. I am worried about the other files and directories involved.

ADD REPLY • link 7.7 years ago by igor 13k

0

Can you check that the fastq looks reasonable:

zcat sub1_val_1.fq.gz | head

You can compare to the working file if you are not sure what to expect. It should not be too different.

ADD REPLY • link 7.7 years ago by igor 13k

0

I get an error. It's saying something about the working directory, but I double checked with ls, and it's really in the same directory.

zcat sub1_val_1.fq.gz | head
zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

ls
   sub1.fastq.gz_trimming_report.txt sub2.fastq.gz                     sub2_val_2.fq.gz
   sub1.fastq.gz                     sub1_val_1.fq.gz                  sub2.fastq.gz_trimming_report.txt

ADD REPLY • link 7.7 years ago by beausoleilmo ▴ 580

1

Pay close attention to the error message:

zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

Look at the .fq.gz.Z file extension. I think your problem is the same as the one described here and here.

ADD REPLY • link 7.0 years ago by h.mon 35k

0

Looks like the file is empty (only 20 bytes, and the report is like 3 KB).
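
A 20-byte file is what gzip produces when it compresses no data at all, so this is consistent with trim_galore having written zero reads. A small sketch to confirm that a .gz file holds no records, using a made-up file name (`empty_demo.fq.gz`) in place of the real output:

```shell
# Build an empty gzip stream to mimic the suspect output file.
# "empty_demo.fq.gz" is a made-up name for this illustration.
gzip -c < /dev/null > empty_demo.fq.gz

# Size on disk: with no payload, only the gzip header and trailer remain.
wc -c < empty_demo.fq.gz

# Number of FASTQ records = decompressed line count / 4.
lines=$(gzip -dc empty_demo.fq.gz | wc -l)
echo "records: $((lines / 4))"
```

Running the last two commands on sub1_val_1.fq.gz would show whether it contains any reads, without relying on zcat.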

ADD REPLY • link 7.7 years ago by beausoleilmo ▴ 580

0

For reference, link to SO post: http://stackoverflow.com/questions/38706402

ADD REPLY • link 7.7 years ago by zx8754 11k

score 5 · Accepted Answer · 2017-04-09

5

7.0 years ago

h.mon 35k

Update your Trim Galore to at least version 0.4.2. From its changelog:

07-09-16: Version 0.4.2 released

Replaced zcat with gunzip -c so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...
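
The same substitution works when inspecting files by hand on an affected Mac. A minimal sketch, assuming a made-up test file `demo.fq.gz` rather than the real data:

```shell
# Make a tiny gzip-compressed FASTQ; "demo.fq.gz" is a made-up name.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > demo.fq.gz

# gunzip -c (equivalently gzip -dc) reads exactly the file named,
# whereas zcat on older macOS looks for demo.fq.gz.Z instead.
gunzip -c demo.fq.gz | head -n 4
```

On systems where zcat appends .Z, `gunzip -c sub1_val_1.fq.gz | head` is the equivalent of the `zcat ... | head` check suggested earlier in the thread.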

ADD COMMENT • link 7.0 years ago by h.mon 35k

score 1 · Accepted Answer · 2016-08-04

1

7.7 years ago

beausoleilmo ▴ 580

I found the answer: you have to uncompress the files first. trim_galore probably only works with tar.gz and not fastq.gz.

gzip -d -k sub1.fastq.gz > sub1.fastq
y # to accept to overwrite
gzip -d -k sub2.fastq.gz > sub2.fastq
y # to accept to overwrite

trim_galore  --illumina --paired --fastqc sub1.fastq sub2.fastq

ADD COMMENT • link 7.7 years ago by beausoleilmo ▴ 580

1

It sounds odd that fastq.gz is not accepted since the software page clearly says

Trim Galore! accepts and produces standard or gzip compressed FastQ files

But as long as you were able to make it work :-)

ADD REPLY • link 7.7 years ago by GenoMax 141k

1

You don't have to uncompress it. I used it many times with compressed files.

I still think it's a permissions-related issue. When you pipe to a file, it should not ask to overwrite. The file should be silently overwritten.
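
One possible explanation for the overwrite prompt that does not involve permissions: with `gzip -d -k sub1.fastq.gz > sub1.fastq`, the shell creates an empty sub1.fastq for the redirect before gzip starts, and gzip without -c then tries to write its own sub1.fastq, finds the file already there, and asks. A sketch of the usual alternative, using a made-up file name (`redirect_demo.fastq.gz`):

```shell
# Made-up demo file standing in for sub1.fastq.gz.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > redirect_demo.fastq.gz

# -c sends the decompressed data to stdout, so the redirect is the only
# thing that creates the output file; the .gz input is left untouched.
gzip -dc redirect_demo.fastq.gz > redirect_demo.fastq

head -n 4 redirect_demo.fastq
```

With -c there is no prompt, and the compressed original is kept without needing -k.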

ADD REPLY • link 7.7 years ago by igor 13k