Question: More specifics on Jellyfish -C methodology.
2.2 years ago, dlawre14 (United States) wrote:

I know that in Jellyfish, -C stands for canonical kmers; however, I'm a little iffy on how this is implemented. Does Jellyfish take into account whether the reads are paired-end or not? I'm working on my own kmer software to use internally and want the results to be equivalent to what Jellyfish would spit out.

So far, my understanding is that -C does not take into account which strand a read came from, but rather automatically creates the reverse complement of any kmer it sees and then classifies a kmer and its reverse complement as the same kmer.

2.2 years ago, Rob (United States) wrote:

There is no special accounting for paired-end reads in Jellyfish (or in any kmer counting software of which I'm aware). The -C option just means that when Jellyfish looks at a kmer k, it considers both k and rc(k), and it associates both with whichever of the two is lexicographically smaller. For example, if k is the smaller of the two, the count in the output table for k will include the occurrences of both k and rc(k); if rc(k) is the smaller of the two, the output table will contain only rc(k), but its count will again be that of both rc(k) and k. This also means that no special rules are applied for stranded protocols. Jellyfish processes each kmer independently, and exactly which kmers end up in the table depends on options such as -C.
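For anyone reimplementing this, the behaviour described above can be sketched in a few lines of Python. This is only a minimal sketch, not Jellyfish's actual implementation: the function names (`reverse_complement`, `canonical`, `count_canonical_kmers`) are made up for illustration, and skipping windows that contain non-ACGT characters is an assumption you should check against Jellyfish's output on your own data.

```python
from collections import Counter

# Complement table for the four DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_canonical_kmers(reads, k):
    """Count canonical kmers over an iterable of reads.

    Each read is processed on its own; pairing and strand are ignored.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: windows containing anything other than A, C, G, T
            # (for example N) are skipped.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts
```

Because every window is mapped to its canonical form before counting, a read and its reverse complement contribute to exactly the same table entries, which is why the strand of origin does not matter.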


OK, I think I understand most of this, but let me check a specific example. Suppose ATG occurs 3 times and CAT occurs 2 times. What does Jellyfish output: CAT> 5 or ATG> 5?

Reply written 2.2 years ago by dlawre14 (modified 2.2 years ago)

According to the Jellyfish manual, the canonical kmer is "whichever comes first lexicographically". So, in your case, it would be ATG> 5.
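If you already have strand-specific counts, the same rule can be applied after the fact by folding each kmer into its canonical form. The snippet below is a hedged sketch of that idea using the numbers from this example; `merge_to_canonical` is a hypothetical helper, not part of Jellyfish.

```python
from collections import Counter


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case DNA string."""
    return kmer.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_to_canonical(strand_counts):
    """Fold strand-specific kmer counts into canonical kmer counts."""
    merged = Counter()
    for kmer, n in strand_counts.items():
        # The lexicographically smaller of the pair names the merged entry.
        merged[min(kmer, reverse_complement(kmer))] += n
    return merged


merged = merge_to_canonical({"ATG": 3, "CAT": 2})
print(dict(merged))  # {'ATG': 5}
```

CAT is the reverse complement of ATG, and "ATG" sorts before "CAT", so both counts land under ATG.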

Reply written 2.2 years ago by Rob

OK, so it's actually simple: they just count both, group the kmer and rc(kmer) together, and select whichever is lexicographically first as the "name" for that set. Thanks for all the help!

Reply written 2.2 years ago by dlawre14