Question: cutadapt error while performing Trim_Galore
15 months ago, maria.traka (20) wrote:

Hi, I've been trying to run trim_galore on 115 libraries. Although 109 of them work perfectly, for 7 of them I get the following message saying that line 1 does not start with an @, which it does! I went back to the previous step and regenerated the file, but I still get the same error message. Can you help?

SUMMARISING RUN PARAMETERS
==========================
Input filename: /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.2
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 5 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 60 bp
Running FastQC on the data once trimming has completed

Writing final adapter and quality trimmed output to LIB27930_non_rRNA_unmerged1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
This is cutadapt 1.10 with Python 2.7.9
Command line parameters: -f fastq -e 0.1 -q 20 -O 5 -a AGATCGGAAGAGC /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'


Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
cutadapt trim_galore • 1.1k views
modified 15 months ago • written 15 months ago by maria.traka (20)
head -n 1 LIB27930_non_rRNA_unmerged1.fastq | sed -n 'l'
@HISEQ:171:CAUJ2ANXX:8:1101:1212:2104 1:N:0:TGTATCGGCCGG$

This is what I get if I look at line 1...

written 15 months ago by maria.traka (20)

This is actually happening further down in the file. What's wc -l LIB27930_non_rRNA_unmerged1.fastq?

written 15 months ago by Devon Ryan (84k)

I get: 175654078 LIB27930_non_rRNA_unmerged1.fastq

written 15 months ago by maria.traka (20)
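
The reported line count can be checked directly. If every record occupies exactly four lines (an assumption; it does not hold for FASTQ files with wrapped sequence lines), the total should be divisible by 4. A minimal sketch, where the function name fastq_line_remainder is made up for illustration:

```shell
# Hypothetical helper: print the remainder of a line count divided by 4.
# Assumes four lines per FASTQ record (no wrapped sequence lines).
fastq_line_remainder() {
    echo $(( $1 % 4 ))
}

# Line count reported in the thread above.
fastq_line_remainder 175654078   # prints 2: not a whole number of 4-line records
```

A non-zero remainder only shows that the layout is broken somewhere; it does not say where.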

I think "Line 1" just refers to the part of the file it is currently reading; it clearly processes the first 40 million sequences. It sounds like you'll have to do some file extraction work in the shell to find the line causing the error. I would make a test file of the first 40 million sequences to see if it completes. Also, you should be able to predict from the length of the file how many lines should begin with "@", then count the number of lines that actually begin with "@".

written 15 months ago by YaGalbi (1.3k)
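
The two checks suggested above could be sketched roughly as follows. The function names (fastq_head, fastq_header_counts) and the output file name in the usage comment are hypothetical, and the sketch assumes four lines per record. One caveat: a quality line can itself start with "@" (it encodes Phred 31 in ASCII+33), so the observed count may exceed the expected one even in a valid file, and a mismatch is only a hint.

```shell
# Hypothetical helper: copy the first N records (4 lines each) to a new file.
# $1 = input FASTQ, $2 = number of records, $3 = output file
fastq_head() {
    head -n "$(( $2 * 4 ))" "$1" > "$3"
}

# Hypothetical helper: print "<expected> <observed>" header counts.
# expected = total lines / 4 (truncated); observed = lines starting with '@'.
fastq_header_counts() {
    awk '/^@/ { n++ } END { printf "%d %d\n", NR / 4, n }' "$1"
}

# Possible usage (file names are examples):
#   fastq_head LIB27930_non_rRNA_unmerged1.fastq 40000000 first40M.fastq
#   fastq_header_counts LIB27930_non_rRNA_unmerged1.fastq
```

If the subset runs through trim_galore cleanly, the problem lies after the first 160,000,000 lines.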
15 months ago, maria.traka (20) wrote:

Update: Thanks to all your help above, I now have a good idea of where the errors are. After a bit more digging, it seems that the latest sortmerna (v2.1), which generated these files, has a bug that introduces errors. Until this is fixed, the suggestion is to increase the memory allocation. I am now trying that and hope it will fix the problem...

written 15 months ago by maria.traka (20)
15 months ago, dariober (9.4k, Glasgow - UK) wrote:

FASTQ file is expected to start with '@', but found '\n'

If this is true, it seems you have an empty line in your FASTQ file. Try to locate it with:

grep -A 10 -B 10 -n '^$' myreads.fastq

This could help you find out what happened.

written 15 months ago by dariober (9.4k)
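
As a complement to the grep above, a stricter check could report the first line that breaks the four-line record layout, which also catches problems other than empty lines. This is only a sketch under stated assumptions: the function name fastq_first_bad_line is made up, records are assumed to span exactly four lines, and an empty sequence or quality line is treated as an error even though some tools accept zero-length reads.

```shell
# Hypothetical helper: print the number of the first line that does not
# fit the 4-line FASTQ layout, then stop. Prints nothing for a clean file.
fastq_first_bad_line() {
    awk '
        NR % 4 == 1 && !/^@/   { print NR ": expected @ header line"; exit }
        NR % 4 == 3 && !/^[+]/ { print NR ": expected + separator line"; exit }
        NR % 2 == 0 && $0 == "" { print NR ": empty sequence or quality line"; exit }
    ' "$1"
}

# Possible usage:
#   fastq_first_bad_line myreads.fastq
```

Because awk stops at the first mismatch, everything after that line is unchecked; the reported position is where the layout first goes out of step, which may be a line or two after the actual damage.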