Working backwards - How do I retrieve transcripts from a primer list using sequence and amplicon length?
3.0 years ago
dacotahm ▴ 20

I'd like to search a transcriptome with a list of forward and reverse primers, then use the expected amplicon length of each primer pair to identify the transcript the primers were designed from.

I'm trying to salvage someone else's project and find the transcripts associated with a primer set. The transcriptome has a lot of duplication in it (for reasons not relevant to my question), so each primer has multiple hits. Most of the hits come from duplicate transcripts, and some possibly from isoforms.

Using standalone blast (NCBI-BLAST+ in Ubuntu bash) I'm able to retrieve a hit list for all my primers:

blastn -task blastn-short -query PrimerList.fa -db Trin_duped.fasta -out PrimersXDB.txt -outfmt '10 qseqid sseqid qlen length nident evalue sstart send qseq sseq slen' -max_target_seqs 20


The problem is that there are multiple hits for each F/R primer, and each combination gives a different amplicon size. There are too many possible combinations to filter by hand. How do I identify only the transcripts that match an F/R primer combination with a specified amplicon size (or within a narrow range, e.g. ±10 bp, in case of software weirdness)?
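
To make the filtering step concrete, here is a rough Python sketch of the pairing logic, not a tested pipeline. It assumes the CSV columns are in the order given in the blastn command above, that primer IDs follow a hypothetical `<pair>_F` / `<pair>_R` naming scheme, and that the expected amplicon lengths are supplied as a dict. It keeps a transcript only when the two primers hit opposite strands, face each other, and span the expected length within the tolerance.

```python
import csv
from collections import defaultdict

# Column order taken from the -outfmt string in the blastn command.
COLUMNS = ["qseqid", "sseqid", "qlen", "length", "nident", "evalue",
           "sstart", "send", "qseq", "sseq", "slen"]


def load_hits(handle):
    """Group hits as hits[pair_name][sseqid] -> list of (role, sstart, send).

    Assumes primer IDs end in _F or _R (hypothetical naming convention).
    """
    hits = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        name, _, role = row["qseqid"].rpartition("_")
        if role not in ("F", "R"):
            continue
        hits[name][row["sseqid"]].append(
            (role, int(row["sstart"]), int(row["send"])))
    return hits


def find_amplicons(hits, expected, tolerance=10):
    """Return (pair, transcript, size) for F/R hits matching expected sizes.

    expected: dict mapping pair name -> expected amplicon length in bp.
    """
    results = []
    for name, by_subject in hits.items():
        if name not in expected:
            continue
        for sseqid, entries in by_subject.items():
            fwd = [(s, e) for role, s, e in entries if role == "F"]
            rev = [(s, e) for role, s, e in entries if role == "R"]
            for f_start, f_end in fwd:
                for r_start, r_end in rev:
                    f_plus = f_start < f_end   # plus-strand hit
                    r_plus = r_start < r_end
                    if f_plus == r_plus:
                        continue  # primers must hit opposite strands
                    # The plus-strand primer has to sit upstream of the
                    # minus-strand primer; size is the outer span.
                    plus_5p = f_start if f_plus else r_start
                    minus_5p = r_start if f_plus else f_start
                    size = minus_5p - plus_5p + 1
                    if size > 0 and abs(size - expected[name]) <= tolerance:
                        results.append((name, sseqid, size))
    return results
```

Something like `find_amplicons(load_hits(open("PrimersXDB.txt", newline="")), {"P1": 200})` would then list the candidate transcripts for a pair named P1. The size is computed from the aligned coordinates, so a hit that does not cover the 5' end of a primer will come out a few bases short, which is one reason to keep the tolerance.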

transcriptome primer blast alignment • 687 views
3.0 years ago
h.mon 33k

Map the primer pairs as paired-end reads with bowtie (use -S to get SAM output) and use the TLEN field to pick out the alignments whose amplicon has the correct size.
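
As a rough illustration of the TLEN filtering step, here is a minimal Python sketch. It assumes the read names in the SAM file are the primer pair names, that expected amplicon lengths are supplied as a dict, and that the aligner was asked to report more than one alignment per pair so that duplicated transcripts show up. The function name and arguments are made up for the example.

```python
def amplicons_from_sam(lines, expected, tolerance=10):
    """Return (pair, transcript, tlen) for pairs with the expected TLEN.

    lines: iterable of SAM text lines (header lines are skipped).
    expected: dict mapping read/pair name -> expected amplicon length in bp.
    """
    matches = []
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, rname, tlen = fields[0], fields[2], int(fields[8])
        # TLEN is positive for the leftmost mate and negative for the
        # rightmost one, so keeping tlen > 0 reports each pair once.
        if rname == "*" or tlen <= 0:
            continue
        if qname in expected and abs(tlen - expected[qname]) <= tolerance:
            matches.append((qname, rname, tlen))
    return matches
```

Calling it on an open SAM file handle, e.g. `amplicons_from_sam(open("primers.sam"), {"P1": 200})` (file name hypothetical), gives the transcripts where pair P1 maps with a template length of 190 to 210 bp.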


I found that another solution is to pad the missing bases between the two primers with Ns, up to the amplicon length, and use "-task blastn-short" to align each pair as a single unit, but your solution works faster, thanks.
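
For anyone wanting to try the N-padding route, a small Python sketch of how such a query could be built is below. It is only a guess at the construction: it assumes both primers are written 5' to 3' and that the reverse primer has to be reverse-complemented so the whole query reads along one strand. The function names are made up for the example.

```python
# Complement table for unambiguous bases; other characters (e.g. N or
# degenerate IUPAC codes) are passed through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def padded_amplicon(fwd, rev, amplicon_len):
    """Build forward primer + N padding + reverse-complemented reverse primer.

    The result has exactly amplicon_len bases.
    """
    gap = amplicon_len - len(fwd) - len(rev)
    if gap < 0:
        raise ValueError("amplicon is shorter than the two primers combined")
    return fwd + "N" * gap + reverse_complement(rev)
```

Each padded sequence can then be written out as one FASTA record per primer pair and used as the BLAST query.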