in silico enrichment
9.3 years ago
kamhamea • 0

I'm looking for a fast tool to scan high-throughput sequencing reads for patterns, so that the final, thorough analysis can be run on only the few reads that belong to the cluster.

In practice, I manually generated a set of oligos (~20-mers) that belong uniquely to a cluster of genes, and now I want to find all the reads that match them. The next step is finding neighboring reads, but even the first step, implemented in Python with a regex-based string search, takes weeks on whole-genome sequencing data.

sequencing alignment • 1.2k views
9.3 years ago

This sounds like a job for BBDuk, which can filter reads by matching k-mers. It's extremely fast. For example, using 20-mers (mm=f turns off BBDuk's default masking of the middle base, so k-mers must match exactly):

bbduk.sh in=reads.fq outm=matching.fq ref=oligos.fa k=20 mm=f

Note that if your oligos contain degenerate IUPAC symbols like "N" you should add the flag "copyundefined".
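
To show why k-mer matching is so much faster than running one regex search per oligo, here is a minimal Python sketch of the general idea. It is not BBDuk's actual implementation, and the function names (build_kmer_index, read_matches, filter_fastq) are made up for illustration. It assumes plain 4-line FASTQ records, oligos of at least k bases, and no degenerate IUPAC symbols. All oligo k-mers, plus their reverse complements, go into a set once, so each read costs one hash lookup per position no matter how many oligos there are.

```python
# Illustrative sketch only: exact k-mer matching with a hash set.
# Function names are hypothetical; this is not how BBDuk is implemented.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(oligos, k=20):
    """Collect every k-mer of every oligo, in both orientations, into a set."""
    kmers = set()
    for oligo in oligos:
        oligo = oligo.upper()
        for i in range(len(oligo) - k + 1):
            kmer = oligo[i:i + k]
            kmers.add(kmer)
            kmers.add(reverse_complement(kmer))
    return kmers


def read_matches(read, kmers, k=20):
    """True if any k-mer of the read is present in the index."""
    read = read.upper()
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))


def filter_fastq(in_path, out_path, kmers, k=20):
    """Copy matching records to out_path; assumes 4 lines per FASTQ record."""
    matched = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if read_matches(record[1].strip(), kmers, k):
                fout.writelines(record)
                matched += 1
    return matched
```

With the oligo sequences loaded into a list, filter_fastq("reads.fq", "matching.fq", build_kmer_index(oligos)) would write the matching reads to matching.fq. For whole-genome data a dedicated tool such as BBDuk will still be far faster than pure Python.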
