I have SNPs across different isolates that are either heterozygous or homozygous. I want to convert the data into a table of homozygous SNP frequency, hom / (het + hom), in windows of, say, 10 million bases. Then I want to produce a heatmap with each isolate as a band on the y axis and the per-window homozygous SNP frequency along the x axis. Any pointers on how I could do this in R, or even Python? It seems like something that would have been done before, but my R skills are low. I can work through a heatmap.2 tutorial in R, but I'm not sure how to produce the frequency table.
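Written as a formula, the quantity asked for is shown below. The symbols are introduced here for illustration and are not from the question. For isolate i and window k of width W, hom_{i,k} and het_{i,k} are the numbers of homozygous and heterozygous SNPs whose position p falls in that window. Treating window k as (k-1)W < p ≤ kW, so that each window is labelled by its end position, is an assumption chosen to match the bp column in the output example. The frequency is undefined (0/0) for a window that contains no SNPs.

```latex
f_{i,k} = 100 \times \frac{\mathrm{hom}_{i,k}}{\mathrm{het}_{i,k} + \mathrm{hom}_{i,k}},
\qquad
\text{window } k = \{\, p : (k-1)W < p \le kW \,\}, \quad W = 10^{7}\ \text{bp}

% First window of the example data: 13 SNPs, 4 hom and 9 het
f_{\mathrm{LIB10000},1} = 100 \times \frac{4}{9 + 4} = \frac{400}{13} \approx 30.77
```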
Example data (in the het and hom columns, 1 = yes, 0 = no):
lib het hom snp_position
LIB10000 1 0 917206
LIB10000 1 0 917912
LIB10000 1 0 2703436
LIB10000 1 0 2736063
LIB10000 0 1 3843431
LIB10000 1 0 5195338
LIB10000 1 0 8054844
LIB10000 1 0 8108156
LIB10000 0 1 8685923
LIB10000 0 1 8983713
LIB10000 0 1 8984241
LIB10000 1 0 9391014
LIB10000 1 0 9658660
LIB10000 1 0 12116052
LIB10000 1 0 15798269
LIB10000 0 1 19809883
LIB10000 0 1 25505855
LIB10000 0 1 25541608
LIB10000 1 0 26855440
LIB10000 1 0 27136672
LIB10000 0 1 28417750
LIB10000 1 0 29906291
LIB10000 0 1 41573928
LIB10000 0 1 55496549
LIB10000 1 0 55651887
LIB10000 1 0 59141554
Output example:
lib bp hom (%)
LIB10000 10000000 28.5714285714286
LIB10000 20000000 33.3333333333333
LIB10000 30000000 50.0
LIB10000 40000000 0.0
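A minimal sketch in plain Python (standard library only) of how a table like this could be built from the input format above. It assumes whitespace-separated columns with the header line shown, 1-based SNP positions, and windows labelled by their end position like the bp column. The function name `hom_frequency` and the constant `WINDOW` are hypothetical. Windows that contain no SNPs are left out rather than reported as 0, because hom / (het + hom) is 0/0 there. In the example data nothing falls between 30,000,000 and 40,000,000 bp, so this sketch emits no 40000000 row; instead it reports 50000000 and 60000000 rows for the SNPs further along.

```python
from collections import defaultdict

WINDOW = 10_000_000  # window size in bp (10 Mb, as in the question)


def hom_frequency(lines, window=WINDOW):
    """Return sorted (lib, window_end_bp, hom_percent) tuples.

    Hypothetical helper. Expects whitespace-separated 'lib het hom snp_position'
    lines (header optional) with 1-based positions and 0/1 het/hom flags.
    """
    # (lib, 0-based window index) -> [number of hom SNPs, number of het + hom SNPs]
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "lib":  # skip blank lines and the header
            continue
        lib = fields[0]
        het, hom, pos = int(fields[1]), int(fields[2]), int(fields[3])
        idx = (pos - 1) // window  # window idx covers idx*window+1 .. (idx+1)*window
        counts[(lib, idx)][0] += hom
        counts[(lib, idx)][1] += het + hom

    rows = []
    for (lib, idx), (n_hom, n_total) in sorted(counts.items()):
        if n_total == 0:  # no het/hom calls in this window: frequency undefined
            continue
        rows.append((lib, (idx + 1) * window, 100 * n_hom / n_total))
    return rows
```

With the data saved to a file (`snps.txt` is a placeholder name), `with open("snps.txt") as fh: rows = hom_frequency(fh)` should give one (lib, window end, hom %) tuple per window and isolate. On the example data the first window comes out at 400/13 ≈ 30.77%.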
What does bp mean in your output example? Is it the snp_position? Can you give example input and the example output that it should produce?
The example output is what I would want. bp is the end of each 10,000,000 bp window along the chromosome. For example, 13 SNPs fall within the first 10,000,000 bp and 4 of them are hom, so 4/13 × 100 ≈ 30.8%. The first row of the example output is therefore a little off, but the next two (33.3% and 50%) are correct. snp_position gives where each SNP sits on the chromosome, so you can count how many fall in each window. Hope this helps.
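For the heatmap step, heatmap.2 (from the gplots package) takes a numeric matrix, and it draws the matrix rows along the y axis and the columns along the x axis. So one row per isolate and one column per window gives the layout described in the question. The sketch below, again standard-library Python, pivots (lib, window end, hom %) rows like those in the output example into that shape. It then renders the result as tab-separated text, with NA for windows where an isolate has no SNPs. The function names `to_wide_matrix` and `matrix_to_tsv` are hypothetical.

```python
import csv
import io


def to_wide_matrix(rows):
    """Pivot (lib, window_end_bp, hom_percent) tuples into isolate x window form.

    Hypothetical helper. Returns (window_ends, matrix), where matrix maps each lib
    to a list of values aligned with window_ends; missing windows become None.
    """
    window_ends = sorted({bp for _, bp, _ in rows})
    libs = sorted({lib for lib, _, _ in rows})
    lookup = {(lib, bp): pct for lib, bp, pct in rows}
    matrix = {lib: [lookup.get((lib, bp)) for bp in window_ends] for lib in libs}
    return window_ends, matrix


def matrix_to_tsv(window_ends, matrix):
    """Render the matrix as tab-separated text, writing 'NA' where a window has no SNPs."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["lib"] + [str(bp) for bp in window_ends])  # header: window ends
    for lib, values in matrix.items():
        writer.writerow([lib] + ["NA" if v is None else f"{v:.2f}" for v in values])
    return buf.getvalue()
```

After writing the text to a file (`hom_freq.tsv` is a placeholder), R should be able to load it with `m <- as.matrix(read.table("hom_freq.tsv", header = TRUE, sep = "\t", row.names = 1, check.names = FALSE))`. `check.names = FALSE` keeps the numeric window labels as column names. It can then be drawn with `gplots::heatmap.2(m, Rowv = FALSE, Colv = FALSE, dendrogram = "none", trace = "none")`, which turns off clustering so the windows stay in chromosome order. A window with no SNPs in any isolate gets no column at all. If evenly spaced columns matter, `window_ends` could instead be built from a fixed `range()` over the chromosome length.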