Dear all, I'm transitioning from khmer-0.7.1 to khmer-2.0 and my pipepline (PE Illumina raw reads->Trimomatic->interleave-reads.py->extract-paired-reads.py->normalize-by-mean.py) fails to recognise the reads as paired reads at the digital normalization step, as suggested by the following error message from normalize-by-median.py in paired mode :
making countgraph
** ERROR: Unpaired reads when require_paired is set!
** Failed on interleaved_trimmed_WTCHG_322035_270_1.fastq:
** Exiting!
I went to the script but can't find where the problem apart from maybe this is_paired variable(?) from line 100 onwards.
i found some information (http://lists.idyll.org/pipermail/khmer/2015-December/000766.html) regarding is_pair from the interleave-reads.py script and it seems that it requires the reads to be sorted /1 then /2, which they are in my files.
But I'm not sure if this is anyhow related with is_paired from normalize-by-median.py, and I can't find any more information as to why normalize-by-median.py might not recognize my input file as paired reads. Does anyone have any idea what I might be doing wrong (Im sure there's plenty), and what I could try to solve this issue?
Please let me know if I should be providing more/less information (or different information) ; this is my first on such forums. Many thanks Clement
Perhaps khmer is expecting the PE reads to be in independent (R1/R2) files (are you using an interleaved reads file for khmer step)?
Hi, yes I use interleaved reads, but that's what's recommended for khmer (http://khmer-protocols.readthedocs.io/en/v0.8.2/mrnaseq/1-quality.html , @ adaptors trimming step). Also, there's only one input file possible, so I don't believe that independent could even be specified.
Is it possible that you have lost pairing of the reads in your trimmed data? Do you get the error right away or after sometime? You could use
reformat.sh
from BBMap to verify pairing of the reads likereformat.sh in=reads.fastq verifypairing
thanks for the reply,
I get the error from the beginning of normalize-by-mean.py ; it doesn't seem to process any read. (the error meassage cited in my original post is the first piece of log that is prompted)
I doubt that the trimmed data contains orphaned reads or has headers in a format that isn't recognized by khmer, because the khmer-2.0 in-house scripts used upstream in the pipeline, check for pairing of the trimmed reads, and they work just fine :
Whereas unpaired files give the following error message (well, paired but in obsolete format apparently): Interleaving:
In addition to that, I then run the extract-paired-reads.py script from khmer-2.0 which is specifiacally designed at seperating orphaned reads from paired in an interleaved reads file, and it tells me that 100% of my reads are paired /0% orphaned:
So, it would seem that my input files are indeed paired but not recognised as such by khmer-2.0 normalize-by-median.py for some reason. Maybe, a relevant piece of information would be the format of my input files, so here it is :
Thanks again