Question: Pulling out SNPs from a BAM file compared to the reference based on a list of targeted SNPs
0
gravatar for yarmda
2.6 years ago by
yarmda0
yarmda0 wrote:

I am trying to identify reads in a BAM file that deviate from the reference, be it SNP or indel, based on a list of targeted SNPs/indels I have.

Using BEDtools intersect or samtools view -L, I can identify reads that overlap these SNP/indel regions, but I haven't found how to check if the nucleotide matches the reference or matches the SNPs/indels in my list.

My SNPs and indels are tracks in CLC Bio and in a spreadsheet, so these can be formatted into whatever is needed pretty readily - .bed, .vcf, .gff, etc.

For example:

I have a SNP at position 82432 in my reference genome. The reference genome lists a 'T' here, but I want to know if any given read shows a 'C' in its place. How would I do that?

snp samtools bam bedtools • 1.7k views
ADD COMMENTlink modified 18 months ago by Biostar ♦♦ 20 • written 2.6 years ago by yarmda0
1
gravatar for igor
2.6 years ago by
igor8.0k
United States
igor8.0k wrote:

Most variant callers allow you to restrict them to positions of interest based on a BED file.

There are some previous discussions about the many SNP callers available and how to use them:

ADD COMMENTlink written 2.6 years ago by igor8.0k
1
gravatar for swbarnes2
2.6 years ago by
swbarnes26.0k
United States
swbarnes26.0k wrote:

You could use samtools mpileup, and generate a pileup output. That will tell you what each read is at a given locus. It can take a .bed file of coordinates as input, and only output those loci.

ADD COMMENTlink written 2.6 years ago by swbarnes26.0k

Something like

    samtools mpileup -l file.bed -b bamlist.txt

Right?

This is looking good, thanks!

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by yarmda0
1

You probably want to generate a VCF. Something like:

samtools mpileup -ugf <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>

(from the example workflow: http://www.htslib.org/workflow/#mapping_to_variant )

ADD REPLYlink written 2.6 years ago by igor8.0k

This is great for identifying where SNPs are and what the nucleotide is at that position, but only if I survey it manually. Is there a tool that I can feed a file containing those positions and nucleotides of interest? Not every SNP is important to me and often, at SNP positions, only one alternate nucleotide is of interest. Do you know of a tool for that?

Also, thanks for the help!

ADD REPLYlink written 2.6 years ago by yarmda0

You were already doing that when you gave samtools a bed file. You can still do that. I was just showing an example pipeline to get better formatted results.

ADD REPLYlink written 2.6 years ago by igor8.0k

Parse the big long line, count how many alternate letters are there.

ADD REPLYlink written 2.6 years ago by swbarnes26.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 617 users visited in the last hour