Question: Why is FastQC not working after using Trim galore?
2.9 years ago, beausoleilmo (McGill University) wrote:

I have a FASTQ file that FastQC analyses without problems. But after I run it through trim_galore, FastQC (or the FastQC option in trim_galore) no longer works on the trimmed output.

$ fastqc ./sub1_val_1.fq.gz

This is the output:

Started analysis of sub1_val_1.fq.gz
Analysis complete for sub1_val_1.fq.gz
Failed to process file sub1_val_1.fq.gz
java.lang.ArrayIndexOutOfBoundsException: -1
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:100)
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:184)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:695)

Is the "Failed to process file" error caused by a version mismatch between trim_galore and FastQC?

I found this, but that wasn't that helpful.

I'm using FastQC v0.11.5 and trim_galore v0.4.1.

I subsetted a library (reads in paired-end) using this:

seqtk sample -s100 ./SRR2937435_1.fastq.gz 10000 | gzip  > sub1.fastq.gz
seqtk sample -s100 ./SRR2937435_2.fastq.gz 10000 | gzip > sub2.fastq.gz

The sub1_val_1.fq.gz file is what trim_galore produced from sub1.fastq.gz. FastQC works on the original sub1.fastq.gz.
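Since both files were subsampled with the same seed (-s100), the pairs should stay in sync. Here is a minimal sketch for confirming that the two files hold the same number of reads. The demo_1/demo_2 file names and their contents are made up so the snippet runs on its own; in practice they would be sub1.fastq.gz and sub2.fastq.gz.

```shell
# Build two tiny gzipped FASTQ files so the sketch is self-contained
# (demo_1.fastq.gz / demo_2.fastq.gz are hypothetical stand-ins).
workdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' | gzip > "$workdir/demo_1.fastq.gz"
printf '@r1/2\nCCGT\n+\nIIII\n@r2/2\nGGCA\n+\nIIII\n' | gzip > "$workdir/demo_2.fastq.gz"

# A FASTQ record is four lines, so reads = lines / 4.
count_reads() {
    gunzip -c "$1" | awk 'END { print NR / 4 }'
}

n1=$(count_reads "$workdir/demo_1.fastq.gz")
n2=$(count_reads "$workdir/demo_2.fastq.gz")
echo "read 1: $n1 reads, read 2: $n2 reads"

if [ "$n1" -eq "$n2" ]; then
    echo "counts match"
else
    echo "counts differ: the pairs are out of sync"
fi
```

Running the same count on the trimmed *_val_1 / *_val_2 files afterwards shows how many pairs survived trimming.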

fastqc fastq • 2.7k views
modified 2.2 years ago by h.mon • written 2.9 years ago by beausoleilmo

Is trim galore generating an error during run or is it completing without any errors?

written 2.9 years ago by genomax

$ trim_galore --illumina --paired sub1.fastq.gz sub2.fastq.gz

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default) 1.10
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to 'sub1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: sub1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to sub1_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub1.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub1.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.18 s (18 us/read; 3.28 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,288 (82.9%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   2,658 bp (0.3%)
Total written (filtered):        680,222 bp (72.4%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8288 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
  A: 76.7%
  C: 18.0%
  G: 1.9%
  T: 3.4%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   819 2500.0  0   819
...blablabla...    
71  2   0.0 1   2

RUN STATISTICS FOR INPUT FILE: sub1.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to 'sub2.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
Input filename: sub2.fastq.gz
...     
Writing final adapter and quality trimmed output to sub2_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub2.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub2.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.17 s (17 us/read; 3.45 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,302 (83.0%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   1,001 bp (0.1%)
Total written (filtered):        682,905 bp (72.6%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8302 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 4.3%
  C: 3.3%
  G: 26.3%
  T: 66.1%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   796 2500.0  0   796
...blablabla...
69  1   0.0 1   1

RUN STATISTICS FOR INPUT FILE: sub2.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz
file_1: sub1_trimmed.fq.gz, file_2: sub2_trimmed.fq.gz

>>>>> Now validing the length of the 2 paired-end infiles: sub1_trimmed.fq.gz and sub2_trimmed.fq.gz <<<<<
zcat: can't stat: sub1_trimmed.fq.gz (sub1_trimmed.fq.gz.Z): No such file or directory
zcat: can't stat: sub2_trimmed.fq.gz (sub2_trimmed.fq.gz.Z): No such file or directory
Writing validated paired-end read 1 reads to sub1_val_1.fq.gz
Writing validated paired-end read 2 reads to sub2_val_2.fq.gz
Total number of sequences analysed: 0
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 0 (N/A%)
Deleting both intermediate output files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz
written 2.9 years ago by beausoleilmo

I admit that I have difficulty seeing if there is an error message. There are places where it seems that it's not finding the working directory...

written 2.9 years ago by beausoleilmo
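When a log is this long, searching it for typical failure phrases can be easier than reading it top to bottom. The sketch below assumes the run output was saved to a file (for example by appending a redirect such as `> trim.log 2>&1` to the command); the file name and the search patterns are only illustrative choices, and the sample lines are copied from the log above.

```shell
# Recreate a few lines of the log in a scratch file so the sketch runs on its own.
log=$(mktemp)
cat > "$log" <<'EOF'
Writing final adapter and quality trimmed output to sub1_trimmed.fq.gz
zcat: can't stat: sub1_trimmed.fq.gz (sub1_trimmed.fq.gz.Z): No such file or directory
Writing validated paired-end read 1 reads to sub1_val_1.fq.gz
Total number of sequences analysed: 0
EOF

# Case-insensitive search for common failure phrases, printed with line numbers.
grep -n -i -E "no such file|can't stat|error|permission denied" "$log"

# A validation step that analysed zero sequences is also a red flag.
zero=$(grep -c 'Total number of sequences analysed: 0$' "$log")
echo "validation steps with zero sequences: $zero"
```

In this log, both checks point at the same place: the validation step could not read the trimmed files, so it processed nothing.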

Looks like it has trouble writing in that directory. It might be a permissions issue. Try running it in a new clean directory.

written 2.9 years ago by igor

Do you mean that the file should be executable? Should I chmod it to 777?

written 2.9 years ago by beausoleilmo

Set a different output directory with the -o output_dir option.

modified 2.9 years ago • written 2.9 years ago by genomax

That's a good option too.

The reason I suggested creating a new directory is it will definitely exist, it will definitely be empty, and it will probably be readable and writeable.

written 2.9 years ago by igor
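A quick way to get such a directory and verify those three properties, as a sketch: mktemp picks the directory name here, which is just one convenient choice, and the trim_galore line is left as a comment because it needs the tool and the input files.

```shell
# Create a brand-new scratch directory with a unique name.
outdir=$(mktemp -d)

# Confirm it exists, is readable and writable, and contains nothing.
[ -d "$outdir" ] && echo "exists"
[ -r "$outdir" ] && [ -w "$outdir" ] && echo "readable and writable"
entries=$(ls -A "$outdir" | wc -l | tr -d ' ')
echo "entries in directory: $entries"

# With Trim Galore installed, the run could then be pointed at it, e.g.:
#   trim_galore --illumina --paired -o "$outdir" sub1.fastq.gz sub2.fastq.gz
```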

The file is already executable since you were able to execute it. I am worried about the other files and directories involved.

written 2.9 years ago by igor

Can you check that the fastq looks reasonable:

zcat sub1_val_1.fq.gz | head

You can compare to the working file if you are not sure what to expect. It should not be too different.

modified 2.9 years ago • written 2.9 years ago by igor
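Beyond eyeballing the first lines, a rough structural check is possible with awk. This is only a sketch under the assumption of plain four-line FASTQ records (no wrapped sequences); demo.fq.gz is a made-up file so the snippet runs on its own.

```shell
# Tiny gzipped FASTQ to test against (hypothetical demo data).
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGAC\n+\nIIIII\n' | gzip > "$tmp/demo.fq.gz"

# gunzip -c streams the decompressed text; awk checks each four-line record.
check_fastq() {
    gunzip -c "$1" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad++ }   # header line
        NR % 4 == 2 { seqlen = length($0) }                 # sequence line
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad++ }   # separator line
        NR % 4 == 0 && length($0) != seqlen { bad++ }       # quality line
        END { printf "%d records, %d problems\n", NR / 4, bad }'
}

result=$(check_fastq "$tmp/demo.fq.gz")
echo "$result"
```

A file that reports 0 records is empty, which is worth ruling out before looking for subtler problems.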

I get an error. It's saying something about the working directory, but I double checked with ls, and it's really in the same directory.

zcat sub1_val_1.fq.gz | head
zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

ls
   sub1.fastq.gz_trimming_report.txt sub2.fastq.gz                     sub2_val_2.fq.gz
   sub1.fastq.gz                     sub1_val_1.fq.gz                  sub2.fastq.gz_trimming_report.txt
modified 2.9 years ago • written 2.9 years ago by beausoleilmo

Pay close attention to the error message:

zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

Look at the .fq.gz.Z file extension. I think your problem is the same as the one described here and here.

modified 2.2 years ago • written 2.2 years ago by h.mon
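The .Z in the message suggests that zcat on this system is looking for compress-style files rather than gzip ones. As a sketch of how to peek into the file without going through zcat, the two commands below read a gzip file directly; demo.fq.gz is a made-up file so the snippet is self-contained.

```shell
# Tiny gzipped FASTQ (hypothetical demo data).
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/demo.fq.gz"

# gunzip -c decompresses to standard output and leaves the file untouched.
first=$(gunzip -c "$tmp/demo.fq.gz" | head -n 1)
echo "first line: $first"

# Reading from standard input means no file name (and no suffix guessing) is involved.
first_stdin=$(gzip -dc < "$tmp/demo.fq.gz" | head -n 1)
echo "first line via stdin: $first_stdin"
```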

Looks like the file is empty (only 20 bytes, and the report is like 3 KB).

written 2.9 years ago by beausoleilmo
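A size of a couple of dozen bytes is consistent with a gzip file that wraps no data at all, since the gzip header and trailer are written even for empty input. A small sketch to confirm that a file decompresses to nothing (empty.fq.gz is a made-up file created on the spot):

```shell
tmp=$(mktemp -d)

# Compressing nothing still produces a small gzip container.
printf '' | gzip > "$tmp/empty.fq.gz"

size=$(wc -c < "$tmp/empty.fq.gz" | tr -d ' ')
lines=$(gunzip -c "$tmp/empty.fq.gz" | wc -l | tr -d ' ')
echo "compressed size: $size bytes, decompressed lines: $lines"

if [ "$lines" -eq 0 ]; then
    echo "no reads in this file"
fi
```

FastQC being handed a file with zero reads would fit with the earlier log line "Total number of sequences analysed: 0".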

For reference, link to SO post: http://stackoverflow.com/questions/38706402

modified 2.9 years ago • written 2.9 years ago by zx8754
2.9 years ago, beausoleilmo (McGill University) wrote:

I found the answer: you have to uncompress the files first. Probably trim_galore only works with tar.gz and not fastq.gz.

gzip -d -k sub1.fastq.gz > sub1.fastq
y # to accept to overwrite
gzip -d -k sub2.fastq.gz > sub2.fastq
y # to accept to overwrite

trim_galore  --illumina --paired --fastqc sub1.fastq sub2.fastq
written 2.9 years ago by beausoleilmo

It sounds odd that fastq.gz is not accepted since the software page clearly says

Trim Galore! accepts and produces standard or gzip compressed FastQ files

But as long as you were able to make it work :-)

written 2.9 years ago by genomax

You don't have to uncompress it. I used it many times with compressed files.

I still think it's a permissions-related issue. When you pipe to a file, it should not ask to overwrite. The file should be silently overwritten.

modified 2.9 years ago • written 2.9 years ago by igor
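The overwrite prompt may have a simpler cause than permissions. In `gzip -d -k sub1.fastq.gz > sub1.fastq`, the shell creates an empty sub1.fastq for the redirect before gzip starts. gzip, run without -c, then tries to write its output to that same name and finds a file already there. A sketch of the two usual ways around this, using a made-up demo file:

```shell
# Tiny gzipped FASTQ (hypothetical demo data).
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/demo.fastq.gz"

# Option 1: -c sends the decompressed data to standard output,
# so the redirect is what names the output file. The .gz file is kept.
gzip -dc "$tmp/demo.fastq.gz" > "$tmp/demo_stdout.fastq"
head -n 1 "$tmp/demo_stdout.fastq"

# Option 2 (not run here): drop the redirect and let gzip derive the
# output name from the input; -k keeps the .gz but needs a gzip that supports it.
#   gzip -d -k "$tmp/demo.fastq.gz"
```

Either form avoids the prompt, because nothing creates the output file before gzip writes it.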
2.2 years ago, h.mon (Brazil) wrote:

Update your Trim Galore to at least 0.4.2:

07-09-16: Version 0.4.2 released

  • Replaced zcat with gunzip -c so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...
written 2.2 years ago by h.mon
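To check whether an installed version is recent enough, a small comparison helper can be handy. This is only a sketch: version_ge is a hypothetical helper name, and the installed version is hard-coded to the 0.4.1 reported in the question rather than read from the tool.

```shell
# Succeeds if version $1 >= version $2, comparing dot-separated numeric fields.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

installed="0.4.1"   # version reported in the question (assumed, not queried)
required="0.4.2"    # first release with the zcat fix, per the changelog quoted above

if version_ge "$installed" "$required"; then
    status="ok"
else
    status="upgrade needed"
fi
echo "Trim Galore $installed: $status"
```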