Question: khmer-2.0 fails to recognise paired reads
gravatar for chacazoulou
19 months ago by
chacazoulou0 wrote:

Dear all, I'm transitioning from khmer-0.7.1 to khmer-2.0 and my pipepline (PE Illumina raw reads->Trimomatic->>> fails to recognise the reads as paired reads at the digital normalization step, as suggested by the following error message from in paired mode :

making countgraph
** ERROR: Unpaired reads when require_paired is set!
** Failed on interleaved_trimmed_WTCHG_322035_270_1.fastq:
** Exiting!

I went to the script but can't find where the problem apart from maybe this is_paired variable(?) from line 100 onwards.

i found some information ( regarding is_pair from the script and it seems that it requires the reads to be sorted /1 then /2, which they are in my files.

But I'm not sure if this is anyhow related with is_paired from, and I can't find any more information as to why might not recognize my input file as paired reads. Does anyone have any idea what I might be doing wrong (Im sure there's plenty), and what I could try to solve this issue?

Please let me know if I should be providing more/less information (or different information) ; this is my first on such forums. Many thanks Clement

ADD COMMENTlink modified 19 months ago • written 19 months ago by chacazoulou0

Perhaps khmer is expecting the PE reads to be in independent (R1/R2) files (are you using an interleaved reads file for khmer step)?

ADD REPLYlink modified 19 months ago • written 19 months ago by genomax49k

Hi, yes I use interleaved reads, but that's what's recommended for khmer ( , @ adaptors trimming step). Also, there's only one input file possible, so I don't believe that independent could even be specified.

ADD REPLYlink written 19 months ago by chacazoulou0

Is it possible that you have lost pairing of the reads in your trimmed data? Do you get the error right away or after sometime? You could use from BBMap to verify pairing of the reads like in=reads.fastq verifypairing

ADD REPLYlink written 19 months ago by genomax49k

thanks for the reply,

I get the error from the beginning of ; it doesn't seem to process any read. (the error meassage cited in my original post is the first piece of log that is prompted)

I doubt that the trimmed data contains orphaned reads or has headers in a format that isn't recognized by khmer, because the khmer-2.0 in-house scripts used upstream in the pipeline, check for pairing of the trimmed reads, and they work just fine :

... 0 pairs
... 100000 pairs
... 200000 pairs
... 300000 pairs
etc... until end of files

Whereas unpaired files give the following error message (well, paired but in obsolete format apparently): Interleaving:

... 0 pairs
ERROR: This doesn't look like paired data! K00150:134:HF5LHBBXX:8:1101:2027:1244/1 K00150:134:HF5LHBBXX:8:1101:2027:1244/2

In addition to that, I then run the script from khmer-2.0 which is specifiacally designed at seperating orphaned reads from paired in an interleaved reads file, and it tells me that 100% of my reads are paired /0% orphaned:

reading file "khmer-interleaved_270.fq"
outputting interleaved pairs to ""
outputting orphans to ""
... 100000
... 200000
... 300000
DONE; read 2062100 sequences, 1031050 pairs and 0 singletons

So, it would seem that my input files are indeed paired but not recognised as such by khmer-2.0 for some reason. Maybe, a relevant piece of information would be the format of my input files, so here it is :

dops0510@evo:/var/tmp/Clement/CCUV_mutants$ head
@K00150:134:HF5LHBBXX:8:1101:6197:1261 1:N:0:NTCAATCA
@K00150:134:HF5LHBBXX:8:1101:6197:1261 2:N:0:NTCAATCA

Thanks again

ADD REPLYlink modified 19 months ago by genomax49k • written 19 months ago by chacazoulou0
gravatar for chacazoulou
19 months ago by
chacazoulou0 wrote:

Alright, it's solved. Thanks again for your support genomax2 (and the reformating of my messages, couldn't get that right).

Basically, it was not an issue with the input files (which is probably why I was I puzzled, to say the least), but with the way that I was calling the script. I was calling it with system from within a perl script , and for reasons beyond my (poorly educated) understanding, calling it directly from the command line with the exact same parameters, worked fine! (the paths was correct to the script were also the same)

Cheers Clement

ADD COMMENTlink written 19 months ago by chacazoulou0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1262 users visited in the last hour