Question: Build Graph lost k-mers
lardo wrote (2.2 years ago):

I have a FASTA file of unique k-mers, in which each read is exactly one k-mer long (the read length equals the k-mer size):

>kmer_1
ATGACAGCCTTTTTTAAA
>kmer_2
ATGACAGCCTTTTTTAAT

Then I used the API of gatb-core-1.0.6-Linux:

 Graph::create ((char const *)"-in %s -kmer-size %d -abundance-min 1 -nb-cores %d -out %s", xxxx);

There should be 11,571,887 unique k-mers in my file, but the graph built from this file contains only 11,065,132 unique k-mers.

I think the program lost some useful k-mers while storing them.

 

edrezen wrote (2.2 years ago):

Hello,

It is possible that your data contain N characters. No valid k-mer can be built across an N, so any read containing one or more Ns is not used.

Could you check whether this is the case for your input?
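
One quick way to run that check is sketched below in Python. This is only an illustration: the helper `count_reads_with_non_acgt` is hypothetical (it is not part of GATB), and it assumes each FASTA record keeps its sequence on a single line, as in the example from the question.

```python
def count_reads_with_non_acgt(fasta_lines):
    """Return (total_reads, reads_with_non_acgt) for single-line FASTA records."""
    total = 0
    flagged = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and header lines such as ">kmer_1".
        if not line or line.startswith(">"):
            continue
        total += 1
        # Any character outside A, C, G, T (for example N) flags the read.
        if set(line.upper()) - set("ACGT"):
            flagged += 1
    return total, flagged


# Small in-memory example; the second read was given an N on purpose.
example = [">kmer_1", "ATGACAGCCTTTTTTAAA", ">kmer_2", "ATGACAGCCTTTTTNAAT"]
print(count_reads_with_non_acgt(example))  # (2, 1)
```

To run it on a real file, pass an open file handle instead of the list, for example `with open("kmers.fasta") as fh: print(count_reads_with_non_acgt(fh))`, where `kmers.fasta` is a placeholder name.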

Reply by lardo (2.2 years ago):

No 'N' characters in the reads. The lost k-mers can be found in my file but are not contained in the graph.
Brian Bushnell wrote (2.2 years ago):

Perhaps the program is storing reverse-complements in a canonical fashion, so they are only represented once.  That's fairly typical.

You can count k-mers with BBTools and use the 'rcomp' flag to turn merging of each k-mer with its reverse complement on or off, which gives you the count each way:

kmercountexact.sh in=file.fasta k=18 rcomp=t

kmercountexact.sh in=file.fasta k=18 rcomp=f

Reply by Rayan Chikhi (2.2 years ago):

Yes, GATB does collapse each k-mer and its reverse complement into a single canonical k-mer.
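
To see how canonical collapsing changes the count, here is a minimal Python sketch. It assumes the lexicographically smaller of a k-mer and its reverse complement is taken as the canonical form, which is a common convention; GATB may choose the representative differently internally, but the number of distinct canonical k-mers does not depend on that choice. The function names are illustrative only. The third k-mer in the example is the reverse complement of kmer_1 from the question. If collapsing is the whole explanation, the gap of 11,571,887 - 11,065,132 = 506,755 would be the number of k-mers in the file whose reverse complement is also in the file.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Assumed convention: the lexicographic minimum of the two strands."""
    return min(kmer, reverse_complement(kmer))


def count_distinct(kmers):
    """Return (distinct k-mers as written, distinct canonical k-mers)."""
    plain = set(kmers)
    canon = {canonical(k) for k in plain}
    return len(plain), len(canon)


kmers = [
    "ATGACAGCCTTTTTTAAA",  # kmer_1 from the question
    "ATGACAGCCTTTTTTAAT",  # kmer_2 from the question
    "TTTAAAAAAGGCTGTCAT",  # reverse complement of kmer_1
]
print(count_distinct(kmers))  # (3, 2)
```

The two numbers play the same role as the rcomp=f and rcomp=t counts above: three k-mers as written, but only two once each k-mer is merged with its reverse complement.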

Reply by Rayan Chikhi (2.2 years ago):

If this answer does not solve your problem, would you mind posting the dataset? (I'm assuming it is a ~200 MB file, possibly much less if gzipped.)