Find all perfect matches for short sequences to human reference genome
4.7 years ago
Mick ▴ 30

Hi guys,

I'm trying to get something done that sounds super simple, but I'm stuck and I don't know how to go about it.

I have a large Excel file with primer sequences that someone designed for the human genome, and I need to find out which part of the genome they cover. All I have is the sequence of the forward and reverse primer. Ideally I'd like to align the forward and reverse primers to the genome. I only want perfect matches, because the primers were designed to match the genome exactly.

There should be a Linux tool to do this, right? Maybe BLAST or bowtie? If anyone could point me in the right direction, I would much appreciate it.

Thanks :)

alignment • 1.4k views

bbmap has the following modes that might be of use...

perfectmode=f           Allow only perfect mappings when set to true (very fast).
semiperfectmode=f       Allow only perfect and semiperfect (perfect except for N's in the reference) mappings.
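
A minimal sketch of how this could be tried, assuming the primers are stored one sequence per line. bbmap reads FASTA/FASTQ, so the hypothetical helper `raw_to_fasta` below first wraps each primer in a FASTA record. The file names and the `fwd` prefix are placeholders, not something from this thread.

```shell
# Convert a file with one sequence per line into FASTA.
# $1 = input file, $2 = name prefix used for the record names (e.g. fwd)
raw_to_fasta() {
  awk -v p="$2" '
    { sub(/\r$/, "") }                            # drop Windows line endings
    NF { printf(">%s_%d\n%s\n", p, ++n, $1) }     # skip blank lines, number the rest
  ' "$1"
}

# Usage (placeholder file names; requires bbmap.sh on the PATH):
#   raw_to_fasta sequences1.txt fwd > primers.fa
#   bbmap.sh ref=hg19.fa in=primers.fa out=primers.sam perfectmode=t
```

With `perfectmode=t`, only reads that map without any mismatch or indel should be reported, as described in the option text above.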

bowtie is perfectly fine. I would set all mismatch and penalty parameters to 1000, set seed mismatches to zero etc. to ensure only perfect matches are returned.


Thank you for the quick reply, I'll let you know how it works :)


For bowtie I think one could use -v 0 -k 1 -m 1 --best --strata (-n and -v are mutually exclusive, so -v 0 on its own covers the exact-match requirement). For bowtie2 I have used the following in the past (for perfect matches of NGS reads, not short primers): --end-to-end -N 0 --mp 10000 --np 10000 --rdg 10000 --rfg 10000. Try things out a bit :)
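
Whatever penalties end up being used, the SAM output can also be checked afterwards. The sketch below assumes standard SAM output in which the CIGAR uses `M` for aligned bases and the aligner writes an `NM:i` edit-distance tag (bowtie2 does by default). The function name `perfect_sam` is made up for illustration.

```shell
# Keep only SAM records that look like perfect end-to-end matches:
# CIGAR is a single "<length>M" block (no indels, no clipping) and NM:i:0.
perfect_sam() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    $6 ~ /^[0-9]+M$/ {                     # ungapped, unclipped alignment
      for (i = 12; i <= NF; i++)           # optional tags start at column 12
        if ($i == "NM:i:0") { print; break }
    }
  ' "$@"
}

# Usage (placeholder file name):
#   perfect_sam primers.sam | cut -f1,3,4
```

Unaligned reads have `*` as CIGAR and are therefore dropped by the same test.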


Depending on how many sequences there are, just use the UCSC in silico PCR tool: http://genome.ucsc.edu/cgi-bin/hgPcr?db=hg38.
You can also use the BLAT web interface for a more flexible search: http://genome.ucsc.edu/cgi-bin/hgBlat?command=start

4.7 years ago
Mick ▴ 30

Thank you guys so much for all the replies. I tried bowtie and used these parameters:

bowtie hg19 -v 0 -a -X 800 -r1 "sequences1.txt" -r2 "sequences2.txt"

It worked really well for the most part. It only missed a couple of primers, for which it didn't find any alignment. I manually searched for the location of these primers with BLAST. They were perfect matches to the genome, so I don't really know why bowtie missed them. But anyhow, thank you guys so much for the help. :)
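
For the handful of primers that need a manual double check, something like the following could confirm an exact hit on either strand. It is only a sketch: it assumes the candidate region has been saved as a small FASTA file (not the whole genome), and the function names `revcomp` and `count_exact` are invented for this example.

```shell
# Reverse-complement a DNA string (A<->T, C<->G); other letters are left as-is.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
    tr 'ACGTacgt' 'TGCAtgca'
}

# Count exact, non-overlapping hits of a primer on both strands of a small FASTA region.
# $1 = primer sequence, $2 = FASTA file. Prints "<plus-strand hits> <minus-strand hits>".
count_exact() (
  primer=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
  # Join the sequence lines so that hits spanning a line break are not missed.
  seq=$(grep -v '^>' "$2" | tr -d '\r\n' | tr 'acgt' 'ACGT')
  plus=$(printf '%s\n' "$seq" | { grep -o "$primer" || true; } | wc -l | tr -d ' ')
  minus=$(printf '%s\n' "$seq" | { grep -o "$(revcomp "$primer")" || true; } | wc -l | tr -d ' ')
  echo "$plus $minus"
)

# Usage (placeholder file name):
#   count_exact ACGTTGCAACGTTGCAACGT region.fa
```

A forward primer is expected on one strand and the reverse primer on the other, so a valid pair should give one plus-strand hit and one minus-strand hit within the same region.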
