I have a .bed file containing probes for a commercial microarray and would like to map these against a reference genome in NCBI, for which the probes were not originally designed. I'd like to see how many map with near perfect identity; if it's the majority, we'd go ahead and use it. Can anyone suggest a good strategy for doing this?
Use bedtools getfasta to extract the sequences of the probes from the genome it was designed (or contact the microarray manufacturer and ask for the probes in fasta format).
bedtools getfasta -fi oldGenome.fasta -bed probes.bed > probes.fasta
Then use bowtie or bowtie2 with
--end-to-end --very-fast to map the probes into the new genome.
You can also use bbmap to map, the option
mhist=mhist.txt idhist=idhist.txt would be helpful in checking how good the probes match the "new" genome.
bowtie2 -f -a --very-sensitive -x newGenome -U probes.fasta > probes.sam
You can also get a quick and dirty histogram by looking at the CIGAR string from the alignment file:
samtools view probes.bam | cut -f 6 | sort -n | uniq -c 1 144M 3 145M 1 145M5S 1 145M6S 2 146M 2 146M5S 5 147M 1 147M4S 51 148M 160 149M 695 150M 1480 151M