SNP Positions Along a Read
11.2 years ago
Uli ▴ 30

Hi all, I asked this question on Stack Overflow before, but it was suggested that I ask it here: I have a VCF file containing several SNPs, and I want to check whether these SNPs are evenly distributed along the reads of the BAM file from which they were called. Specifically, I want to plot the number of SNPs against read position. The x-axis shows every base position of a read (100 bases for Illumina HiSeq) and the y-axis shows the cumulative number of SNPs found at that position. An example is shown in this paper, figure 2: I am wondering whether there is an existing tool for this or whether I have to write my own script. If I have to write one, is there an R package that can help (I am used to R but don't have much experience with Perl)?
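The tallying step behind such a plot can be sketched as follows. This is a minimal sketch, assuming you already have, for every read carrying a SNP allele, the 0-based offset of that SNP in the read as stored in the BAM plus the read's strand. The function name and input format are hypothetical. One subtlety: BAM stores reverse-strand reads reverse-complemented, so their offsets are flipped here to give the position in sequencing (cycle) order. This assumes fixed-length, untrimmed reads with no hard clipping.

```python
def snp_position_histogram(hits, read_length=100):
    """Count SNP observations per read position (hypothetical helper).

    hits: iterable of (query_offset, is_reverse) pairs, one per read that
    carries a SNP allele; query_offset is the 0-based index into the read
    sequence as stored in the BAM and must be < read_length.
    Returns a list where index i is the count at sequencing cycle i + 1.
    """
    counts = [0] * read_length
    for offset, is_reverse in hits:
        # Reverse-strand reads are stored reverse-complemented, so stored
        # offset 0 is the last base sequenced; flip to get the cycle index.
        # Assumes every read has length read_length (no trimming/hard clips).
        cycle = read_length - 1 - offset if is_reverse else offset
        counts[cycle] += 1
    return counts


if __name__ == "__main__":
    counts = snp_position_histogram([(0, False), (10, True)], read_length=100)
    # Tab-separated output that R can load with read.table() and plot with barplot()
    for cycle, n in enumerate(counts, start=1):
        print(f"{cycle}\t{n}")
```

Writing the counts to a plain table keeps the plotting in R, where the question's author is most comfortable.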

bam vcf r
11.2 years ago
Bioinfosm ▴ 620

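This sounds complicated. Do you have the accompanying BAM file for these variants? One approach is to use the VCF file to identify all reads in the BAM that actually align over the variant positions (bedtools would be helpful here). Then, for each of those reads, walk its CIGAR string from the alignment start to find which base of the read lies at the variant position. Note that the CIGAR `M` operation covers both matches and mismatches, so the mismatches themselves have to be read from the read sequence (or the MD tag), not from the CIGAR. Check whether that base is the variant allele, and finally tally and plot those read positions!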
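The CIGAR walk described above can be sketched in plain Python. This is a minimal sketch, not a tool from this thread. It assumes the alignment start and the SNP position use the same coordinate system (for example, both 1-based as in SAM `POS` and VCF `POS`). It returns the 0-based offset into the read as stored in the BAM, which is the input the position tally expects. The function name is hypothetical. If you use pysam, `AlignedSegment.get_aligned_pairs()` gives the same query/reference correspondence without parsing the CIGAR yourself.

```python
import re

# One CIGAR operation: a length followed by an op code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_to_query_offset(ref_start, cigar, snp_pos):
    """Map a reference position to a 0-based offset in the stored read.

    ref_start and snp_pos must share a coordinate system (e.g. both 1-based).
    Returns None if the read does not cover snp_pos, or if snp_pos falls in
    a deletion (D) or skipped region (N) of this read.
    """
    ref = ref_start
    query = 0
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            # Consumes both read and reference; M includes mismatches
            if ref <= snp_pos < ref + length:
                return query + (snp_pos - ref)
            ref += length
            query += length
        elif op in "IS":
            # Insertions and soft clips consume read bases only
            query += length
        elif op in "DN":
            # Deletions and skips consume reference only: no read base here
            if ref <= snp_pos < ref + length:
                return None
            ref += length
        # H (hard clip) and P (padding) consume neither
    return None
```

For a read starting at 100 with CIGAR `5S10M`, position 103 maps to offset 8, because the five soft-clipped bases are still part of the stored sequence. To keep only reads that carry the SNP, you would then compare `seq[offset]` with the VCF `ALT` base before counting the offset together with the read's strand.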

