Question: Selecting sequences from a multi-fasta with certain kmers
michberr wrote, 3.2 years ago:

Hi,

I have a fasta file with about 30 9-mers. I also have some metagenomes in fasta format which are about 15 GB (~100 million reads, 125 bp). I would like to use my kmer file to select the sequences from my metagenomes that only have a match to one of the kmers.

There's a lot of kmer counting software out there, like Jellyfish, Tallymer and meryl, but as far as I can tell, none of them can select or filter sequences based on the presence of kmers.

Does anyone know of software that would do this efficiently?

Thanks


BBDuk from BBMap should be able to do this.

written 3.2 years ago by genomax

> that only have a match to one of the kmers.

Do you mean the following?

ONE hit match => discard
SAME kmer found twice => keep
TWO different kmers => keep

written 3.2 years ago by Pierre Lindenbaum
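To make the difference between these readings concrete, here is a minimal Python sketch (not from the thread) that counts kmer hits per read. It assumes forward-strand matching only and 9 bp query kmers; `kmer_hits`, `keep_read`, `min_hits` and the toy sequences are hypothetical names chosen for illustration. Under the reading above, a read is kept when its total hit count is at least 2; under a "one or more matches" reading, at least 1.

```python
from collections import Counter


def kmer_hits(seq, kmers, k=9):
    """Count how often each query kmer occurs in seq (forward strand, overlapping windows)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if w in kmers)


def keep_read(seq, kmers, min_hits, k=9):
    """Hypothetical rule: keep a read if its total number of kmer hits reaches min_hits."""
    return sum(kmer_hits(seq, kmers, k).values()) >= min_hits


kmers = {"ACGTACGTA", "TTTTGGGGC"}  # toy query kmers
reads = {
    "one_hit": "GGACGTACGTAGG",             # ACGTACGTA once
    "same_twice": "ACGTACGTACGTA",          # ACGTACGTA twice (overlapping)
    "two_different": "ACGTACGTATTTTGGGGC",  # each kmer once
}
for name, seq in reads.items():
    hits = kmer_hits(seq, kmers)
    # columns: name, total hits, distinct kmers, keep (>= 2 hits), keep (>= 1 hit)
    print(name, sum(hits.values()), len(hits),
          keep_read(seq, kmers, min_hits=2),
          keep_read(seq, kmers, min_hits=1))
# one_hit 1 1 False True
# same_twice 2 1 True True
# two_different 2 2 True True
```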

Sorry, I meant one or more matches to my set of 30 kmers.

written 3.2 years ago by michberr
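A minimal Python sketch of the "one or more matches" filter, assuming all query kmers have the same length (9 here) and that reads are plain multi-FASTA. It streams the reads one record at a time, so memory stays small even for a 15 GB file, and tests each overlapping 9 bp window against a Python set. The reverse-complement option (`both_strands`) is an added assumption, not something asked for in the thread; all function and file names are hypothetical. For ~100 million reads a pure-Python loop is likely to be slow, so a dedicated tool such as BBDuk is probably the more practical choice at that scale.

```python
import sys

# Complement table for the optional reverse-complement step (assumes a DNA alphabet ACGTN).
COMP = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of multi-FASTA lines, one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def load_kmers(lines, both_strands=False):
    """Read query kmers from FASTA into a set; optionally add their reverse complements."""
    kmers = {seq.upper() for _, seq in read_fasta(lines)}
    if both_strands:  # assumption: reads may come from either strand
        kmers |= {km.translate(COMP)[::-1] for km in kmers}
    if len({len(km) for km in kmers}) != 1:
        raise ValueError("this sketch assumes all kmers have the same length")
    return kmers


def has_any_kmer(seq, kmers, k):
    """True if any length-k window of seq is in the kmer set (stops at the first hit)."""
    seq = seq.upper()
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def filter_reads(records, kmers):
    """Yield only the (header, seq) records with at least one kmer hit."""
    k = len(next(iter(kmers)))
    for header, seq in records:
        if has_any_kmer(seq, kmers, k):
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_kmers.py kmers.fa reads.fa > matched.fa
    with open(sys.argv[1]) as kf:
        query = load_kmers(kf)
    with open(sys.argv[2]) as rf:
        for header, seq in filter_reads(read_fasta(rf), query):
            sys.stdout.write(f">{header}\n{seq}\n")
```

With k = 9 and 125 bp reads this is at most 125 - 9 + 1 = 117 set lookups per read, and `any` stops early once a read has matched.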