I believe the code below currently answers the question. I am not sure if it is technically or biologically sane. I don't know how you got these patterns and how exactly making primers from these is going to help you solve your assembly, but that's not my job ;-)
- Reads in a fastq.gz file [--fastq] are searched for an exact match to one or more given patterns [--pattern]
- If a pattern is found, a sequence of length 22 is extracted (adjustable parameter in the script)
- All extracted sequences are counted and sorted, and the 20 most frequent strings are reported (adjustable parameter in the script). Results are printed to stdout.
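For orientation, the steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the script from this answer: it skips Biopython and reads the gzipped file directly, assuming plain 4-line FASTQ records. The function names (`iter_fastq_sequences`, `count_candidates`, `top_candidates`) are made up for this sketch, and it assumes the extracted sequence starts at the first base of the match, which may differ from what you actually need.

```python
import gzip
from collections import Counter
from itertools import islice


def iter_fastq_sequences(path):
    """Yield the sequence line of each record in a gzipped FASTQ file.

    Assumption: strict 4-line records (no wrapped sequence lines).
    """
    with gzip.open(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def count_candidates(sequences, patterns, length=22):
    """Count fixed-length substrings starting at each exact pattern match."""
    counts = Counter()
    for seq in sequences:
        for pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                # Assumption: the candidate begins at the match start.
                candidate = seq[start:start + length]
                # Skip matches too close to the read end for a full-length string.
                if len(candidate) == length:
                    counts[candidate] += 1
                start = seq.find(pattern, start + 1)
    return counts


def top_candidates(path, patterns, length=22, top=20):
    """Return the `top` most frequent candidates as (sequence, count) pairs."""
    counts = count_candidates(iter_fastq_sequences(path), patterns, length)
    return counts.most_common(top)
```

Reads are streamed one at a time, so memory use depends mainly on how many distinct candidate strings turn up rather than on the size of the fastq.gz file.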
What does not happen:
- Looking at reverse complement or strand
- Allowing nucleotide mismatches
- Incorporating degenerate nucleotides
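If strand does turn out to matter, one possible workaround (not something the script does) would be to add the reverse complement of each pattern to the list of patterns before searching. A minimal sketch, with hypothetical helper names and assuming plain A/C/G/T patterns without degenerate codes:

```python
# Translation table for complementing plain A/C/G/T bases (upper or lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def both_strands(patterns):
    """Return each pattern plus its reverse complement, without duplicates."""
    expanded = []
    for pattern in patterns:
        for candidate in (pattern, reverse_complement(pattern)):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For example, `both_strands(["ATCGAGAAG"])` gives `["ATCGAGAAG", "CTTCTCGAT"]`. Note that anything extracted around a reverse-complement match would still be in the orientation of the read, not of the original pattern.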
This script has only had limited testing on a small dataset. I don't know the size of your dataset or how much RAM you have available. You will need Python and Biopython. Save the code as pattern2primer.py or something like that and execute it as, e.g.:
python pattern2primer.py --fastq myreads.fastq.gz --pattern ATCGAGAAG
Let me know if you have additional specifications.