I am trying to search for a set of sequences (around 400), each 23 bp long, in several FASTQ files, allowing at most 1-2 mismatches. I am not sure whether treating the FASTQ as a reference (like a genome or transcriptome) would be a good approach. I have tried converting FASTQ -> FASTA -> building a BLAST database -> running blastn, but it did not run, as my query file contains more than one sequence.
Example of part of my query.file:
The output I am aiming for is, for each sequence in my query.file, whether it has a 100% match (or a match with 1-2 mismatches) in the FASTQ file, and if possible where in the FASTQ file it occurs.
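To make the desired output concrete, here is a minimal brute-force sketch in Python of the kind of search I mean. It assumes plain, uncompressed 4-line FASTQ records, counts only substitutions (Hamming distance, no indels), and searches only the forward strand. The function names (`read_fastq`, `hamming`, `find_hits`) and the demo sequences are hypothetical, made up for illustration. Comparing every query against every window of every read will likely be slow on large files, so this is meant to show the expected result, not to be an efficient solution.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence) from a FASTQ handle with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.strip()
        if not header:
            continue  # skip stray blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (not used here)
        yield header[1:].split()[0], seq


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(queries, reads, max_mismatches=2):
    """Return (query_name, read_id, 0-based position, mismatches) for every
    window of a read that matches a query with at most max_mismatches."""
    hits = []
    for read_id, seq in reads:
        for name, query in queries.items():
            k = len(query)
            for pos in range(len(seq) - k + 1):
                mm = hamming(query, seq[pos:pos + k])
                if mm <= max_mismatches:
                    hits.append((name, read_id, pos, mm))
    return hits


if __name__ == "__main__":
    # Hypothetical demo data: one 23 bp query and two 27 bp reads.
    queries = {"q1": "GATTACAGGCTTAACCGTTGACA"}
    fastq_text = (
        "@read1\n"
        "TTGATTAGAGGCTTAACCGTTGACAGG\n"
        "+\n"
        "IIIIIIIIIIIIIIIIIIIIIIIIIII\n"
        "@read2\n"
        "CCCCCCCCCCCCCCCCCCCCCCCCCCC\n"
        "+\n"
        "IIIIIIIIIIIIIIIIIIIIIIIIIII\n"
    )
    for hit in find_hits(queries, read_fastq(io.StringIO(fastq_text))):
        print(*hit, sep="\t")
```

Each output row would give the query name, the read ID, the 0-based position within the read, and the number of mismatches, which is the per-query report described above.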
I would appreciate your suggestions! Thank you!