4.5 years ago by
Santiago de Compostela, Spain
so what you want is to end up with just a single fasta sequence from a bam file, and then try to find your primers in it? if that is the case, you "only" need to convert the bam file into a fasta sequence and then perform some simple pattern matching.
first, get the variants out of the bam file and index them:
samtools mpileup -uf hg19.fa in.bam \
| bcftools call -vm -Oz - > out.vcf.gz
tabix -p vcf out.vcf.gz
then, extract the reference section you're interested in, build the consensus applying the variants, join all the fasta sequence lines into a single one per each fasta entry (for later searching purposes), and compress it in case it's expected to be a big file:
samtools faidx hg19.fa chr22:29000000-29200000 \
| vcf-consensus out.vcf.gz \
| perl -pe '/^>/ ? print "\n" : chomp' \
| tail -n +2 \
| bgzip > out.fasta.gz
finally, count the number of lines that match your primers:
zgrep -cf primers.txt out.fasta.gz
I used regions chr22:29000000-29200000 for this example, but any other region/s can be selected. have in mind that if you don't specify any region/s, the samtools faidx command will output the entire genome.