Site Frequencey Spectrum of Mutations from BAM File
0
1
Entering edit mode
7.0 years ago

I am dealing with breast cancer data from the TCGA Database; and I am trying to get a site frequency spectrum on all of the mutations in the cancerous chromosome-position pairs. For the non-cancerous data I have a VCF file, and for the cancerous data I have a BAM file. So far I have a script that reads the VCF file for non-cancerous data and stores all heterozygous chromosome-position pairs in a dictionary ancestors := {(chrom,pos)} -> (allele) (Which I can easily output into a text file if need be). I am now looking for a way to take the BAM file and get , for every (chromosome, position) in the above dictionary , a list containing the quality of score and allele for every read in the BAM file at the specified chromosome-position pair. To do this I need some command to filter the BAM file.

If I get this working I will then be able to obtain a site frequency spectrum for mutations in the cancerous data. I was trying to use SAMTOOLS and ANGSD but was unable to find the correct command.

tcga • 1.8k views
ADD COMMENT
1
Entering edit mode

Have you considered writing a small program in python using pysam? The mpileup API from that is what you need.

ADD REPLY
0
Entering edit mode

Thanks! I was able to do it with SAMTOOLS. In particular I used the command: mpileup -l pos.BED -b bam.filelist >> results.mpileup The tricky part is that the quality scores are given in ascii code.

ADD REPLY

Login before adding your answer.

Traffic: 1804 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6