Question: Site Frequencey Spectrum of Mutations from BAM File
gravatar for ampearson1729
2.9 years ago by
ampearson17290 wrote:

I am dealing with breast cancer data from the TCGA Database; and I am trying to get a site frequency spectrum on all of the mutations in the cancerous chromosome-position pairs. For the non-cancerous data I have a VCF file, and for the cancerous data I have a BAM file. So far I have a script that reads the VCF file for non-cancerous data and stores all heterozygous chromosome-position pairs in a dictionary ancestors := {(chrom,pos)} -> (allele) (Which I can easily output into a text file if need be). I am now looking for a way to take the BAM file and get , for every (chromosome, position) in the above dictionary , a list containing the quality of score and allele for every read in the BAM file at the specified chromosome-position pair. To do this I need some command to filter the BAM file.

If I get this working I will then be able to obtain a site frequency spectrum for mutations in the cancerous data. I was trying to use SAMTOOLS and ANGSD but was unable to find the correct command.

tcga • 1.1k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by ampearson17290

Have you considered writing a small program in python using pysam? The mpileup API from that is what you need.

ADD REPLYlink written 2.9 years ago by Devon Ryan94k

Thanks! I was able to do it with SAMTOOLS. In particular I used the command: mpileup -l pos.BED -b bam.filelist >> results.mpileup The tricky part is that the quality scores are given in ascii code.

ADD REPLYlink written 2.8 years ago by ampearson17290
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1815 users visited in the last hour