I'd like to search a transcriptome with a list of forward and reverse primers and use the amplicon length of each primer pair to identify the transcript that was used to design the primers.
I'm trying to salvage someone else's project and find the transcripts associated with a primer set. The transcriptome has a lot of duplication in it (for reasons not relevant to my question), so each primer has multiple hits. Most of the hits come from the duplicate transcripts, and some possibly from isoforms.
Using standalone blast (NCBI-BLAST+ in Ubuntu bash) I'm able to retrieve a hit list for all my primers:
blastn -task blastn-short -query PrimerList.fa -db Trin_duped.fasta -out PrimersXDB.txt -outfmt '10 qseqid sseqid qlen length nident evalue sstart send qseq sseq slen' -max_target_seqs 20
The problem is that there are multiple hits for each F/R primer, and each combination gives a different amplicon size. There are too many possible combinations to filter manually. How do I identify only the transcripts that match an F/R primer combination with a specified amplicon size (or within a narrow range in the event of software weirdness, e.g. ±10 bp)?
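The filter described above could be sketched in Python along these lines. This is only a rough sketch under several assumptions that are not part of the original setup: primer IDs are named like "<pair>_F" and "<pair>_R", the expected amplicon lengths are supplied in a dict, each hit covers the whole primer (so sstart is the primer's 5' end), and the names `find_amplicons` and `expected` are hypothetical. The hits in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Column order follows the -outfmt string passed to blastn.
COLUMNS = ["qseqid", "sseqid", "qlen", "length", "nident", "evalue",
           "sstart", "send", "qseq", "sseq", "slen"]


def find_amplicons(rows, expected, tolerance=10):
    """Return (pair, transcript, size) for F/R hits that face each other.

    rows      -- iterable of dicts keyed by COLUMNS
    expected  -- dict mapping pair name to expected amplicon length (bp)
    tolerance -- allowed deviation from the expected length (bp)
    """
    # hits[(pair, transcript)]["F" or "R"] -> list of (sstart, send)
    hits = defaultdict(lambda: {"F": [], "R": []})
    for row in rows:
        # ASSUMPTION: primer IDs look like "<pair>_F" / "<pair>_R".
        pair, _, direction = row["qseqid"].rpartition("_")
        if direction not in ("F", "R"):
            continue
        hits[(pair, row["sseqid"])][direction].append(
            (int(row["sstart"]), int(row["send"])))

    results = []
    for (pair, transcript), by_dir in hits.items():
        if pair not in expected:
            continue
        for f_start, f_end in by_dir["F"]:
            for r_start, r_end in by_dir["R"]:
                f_plus = f_start < f_end   # sstart < send means plus strand
                r_plus = r_start < r_end
                if f_plus == r_plus:
                    continue  # same strand: primers cannot form an amplicon
                if f_plus != (f_start < r_start):
                    continue  # primers point away from each other
                # ASSUMPTION: full-length hits, so sstart is each 5' end.
                size = abs(r_start - f_start) + 1
                if abs(size - expected[pair]) <= tolerance:
                    results.append((pair, transcript, size))
    return results


# Toy data in the same CSV layout (invented hits, not real results).
example = """\
geneA_F,TR1,20,20,20,1e-5,101,120,ACGT,ACGT,900
geneA_R,TR1,20,20,20,1e-5,250,231,ACGT,ACGT,900
geneA_F,TR2,20,20,20,1e-5,101,120,ACGT,ACGT,900
geneA_R,TR2,20,20,20,1e-5,600,581,ACGT,ACGT,900
"""
rows = list(csv.DictReader(io.StringIO(example), fieldnames=COLUMNS))
matches = find_amplicons(rows, expected={"geneA": 150})
print(matches)
```

For the real output, the rows would come from `csv.DictReader(open("PrimersXDB.txt"), fieldnames=COLUMNS)` instead of the toy string. Partial hits (length shorter than qlen) would shift the computed size by a few bases, which is one reason to keep a tolerance.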
I found that an additional solution is to fill the missing bases of the amplicon with Ns and use "-task blastn-short" to align each primer pair as a single unit, but your solution works faster, thanks.
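A minimal sketch of how such an N-padded query could be built, assuming plain A/C/G/T primers (no degenerate bases) and that the reverse primer is given 5' to 3' on the opposite strand, so it has to be reverse-complemented before joining. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only; other letters are kept)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def padded_amplicon(forward, reverse, amplicon_len):
    """Join forward primer, N filler, and reverse-complemented reverse primer."""
    n_count = amplicon_len - len(forward) - len(reverse)
    if n_count < 0:
        raise ValueError("amplicon is shorter than the two primers combined")
    return forward + "N" * n_count + reverse_complement(reverse)


# Toy primers, not taken from the real primer list.
query = padded_amplicon("AACC", "TTGA", 12)
print(query)
```

Each padded sequence can then be written to a FASTA file and used as the -query in place of the separate primers.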