Pulling out SNPs from a BAM file compared to the reference based on a list of targeted SNPs
2
1
Entering edit mode
7.4 years ago
yarmda ▴ 40

I am trying to identify reads in a BAM file that deviate from the reference, be it SNP or indel, based on a list of targeted SNPs/indels I have.

Using BEDtools intersect or samtools view -L, I can identify reads that overlap these SNP/indel regions, but I haven't found how to check if the nucleotide matches the reference or matches the SNPs/indels in my list.

My SNPs and indels are tracks in CLC Bio and in a spreadsheet, so these can be formatted into whatever is needed pretty readily - .bed, .vcf, .gff, etc.

For example:

I have a SNP at position 82432 in my reference genome. The reference genome lists a 'T' here, but I want to know if any given read shows a 'C' in its place. How would I do that?

bam samtools bedtools SNP • 5.8k views
ADD COMMENT
1
Entering edit mode
7.4 years ago
igor 13k

Most variant callers allow you to restrict them to positions of interest based on a BED file.

There are some previous discussions about the many SNP callers available and how to use them:

ADD COMMENT
1
Entering edit mode
7.4 years ago

You could use samtools mpileup, and generate a pileup output. That will tell you what each read is at a given locus. It can take a .bed file of coordinates as input, and only output those loci.

ADD COMMENT
0
Entering edit mode

Something like

    samtools mpileup -l file.bed -b bamlist.txt

Right?

This is looking good, thanks!

ADD REPLY
1
Entering edit mode

You probably want to generate a VCF. Something like:

samtools mpileup -ugf <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>

(from the example workflow: http://www.htslib.org/workflow/#mapping_to_variant )

ADD REPLY
0
Entering edit mode

This is great for identifying where SNPs are and what the nucleotide is at that position, but only if I survey it manually. Is there a tool that I can feed a file containing those positions and nucleotides of interest? Not every SNP is important to me and often, at SNP positions, only one alternate nucleotide is of interest. Do you know of a tool for that?

Also, thanks for the help!

ADD REPLY
0
Entering edit mode

You were already doing that when you gave samtools a bed file. You can still do that. I was just showing an example pipeline to get better formatted results.

ADD REPLY
0
Entering edit mode

Parse the big long line, count how many alternate letters are there.

ADD REPLY

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6