7.9 years ago
michberr8
Hi,
I have a FASTA file with about 30 9-mers. I also have some metagenomes in FASTA format, each about 15 GB (~100 million reads of 125 bp). I would like to use my k-mer file to filter out the sequences in my metagenomes that only have a match to one of the k-mers.
There is a lot of k-mer counting software out there, such as Jellyfish, Tallymer and Meryl, but as far as I can tell, none of them can select or filter sequences based on the presence of k-mers.
Does anyone know of software that would do this efficiently?
Thanks
BBDuk from BBMap should be able to do this.
Do you mean:
ONE hit => discard
SAME k-mer found twice => keep
TWO different k-mers => keep
?
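One way to tell the three cases above apart is to count, for each read, how many times each query k-mer occurs in it. This is a minimal Python sketch, not taken from any tool mentioned in the thread; the function names `kmer_hits` and `classify` are hypothetical, matching is exact, and only the forward strand is scanned.

```python
from collections import Counter


def kmer_hits(seq: str, kmers: set, k: int = 9) -> Counter:
    """Count occurrences of each query k-mer in seq (overlapping windows)."""
    seq = seq.upper()
    hits = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in kmers:  # set lookup: constant time on average
            hits[window] += 1
    return hits


def classify(hits: Counter) -> str:
    """Map hit counts onto the cases asked about in the comment."""
    total = sum(hits.values())
    if total == 0:
        return "no hit"
    if total == 1:
        return "one hit"                   # discard?
    if len(hits) == 1:
        return "same k-mer repeated"       # keep?
    return "several different k-mers"      # keep?


if __name__ == "__main__":
    query = {"ACGTACGTA", "TTTTTTTTT"}  # hypothetical query k-mers
    print(classify(kmer_hits("ACGTACGTACGTA", query)))
```

In the usage line at the bottom, `ACGTACGTA` occurs twice (at offsets 0 and 4), so the read falls into the "same k-mer repeated" case.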
Sorry, I meant one or more matches to my set of 30 k-mers.
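For the criterion "one or more matches", a minimal pure-Python sketch is to load the ~30 query k-mers into a set, slide a 9-bp window along each read, and keep the read as soon as any window is in the set. The following are assumptions rather than details from the thread: both files are plain FASTA, matching is exact (no mismatches), reverse complements of the query k-mers also count as hits because reads can come from either strand, and the script name `kmer_filter.py` and all function names are hypothetical.

```python
import sys

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs; handles sequences wrapped over several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_kmers(handle, k: int = 9, both_strands: bool = True) -> set:
    """Read query k-mers from FASTA; optionally add their reverse complements (assumption)."""
    kmers = set()
    for header, seq in read_fasta(handle):
        seq = seq.upper()
        if len(seq) != k:
            raise ValueError(f"{header}: expected a {k}-mer, got length {len(seq)}")
        kmers.add(seq)
        if both_strands:
            kmers.add(revcomp(seq))
    return kmers


def has_match(seq: str, kmers: set, k: int = 9) -> bool:
    """True if any k-length window of seq is a query k-mer; stops at the first hit."""
    seq = seq.upper()
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def filter_reads(handle, kmers: set, k: int = 9):
    """Yield only the reads with one or more exact k-mer matches."""
    for header, seq in read_fasta(handle):
        if has_match(seq, kmers, k):
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical): python kmer_filter.py kmers.fa reads.fa > matched.fa
    with open(sys.argv[1]) as fh:
        query = load_kmers(fh)
    with open(sys.argv[2]) as fh:
        for header, seq in filter_reads(fh, query):
            sys.stdout.write(f"{header}\n{seq}\n")
```

A 125-bp read has 125 - 9 + 1 = 117 windows, and each set lookup is constant time on average, so the run time grows linearly with the input. Even so, a pure-Python loop will likely be much slower than a compiled tool such as BBDuk on ~100 million reads. Pass `both_strands=False` to `load_kmers` if only forward-strand matches should count.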