Hi, I have two data sets containing mutations: one is from the 1000 Genomes Project and the other is my own data. Below is an example row from each data set:
my_data (1 row and 13 columns):
3BHS2_HUMAN_A10E P26439 10 rs28934880 A E C A probably_damaging alignment neutral 0.328 0.145
1000G (1 row and 8 columns):
20 59132666 . G A 56.74 PASS
AB=0.53;AC=6;AF=0.0102;AN=586;BaseQRankSum=3.459;BaseQRankSumZ=-0.123;DP=1115;Dels=0.00;HRun=1;HaplotypeScore=0.1646;MQ=95.36;MQ0=0;MQRankSum=1.213;MQRankSumZ=0..694;QD=4.19;ReadPosRankSum=1.963;ReadPosRankSumZ=0.349;SB=-0.49;VQSLOD=4.0533;set=ALL119
I'm writing a Python script in which, for each mutation (line) in my_data, I would like to find the matching mutation (line) in 1000G. The information above is all I have. My question is: how can I relate the information in my_data to the information I have from 1000G? What I want is the chromosome position, or at least to know that I'm looking at the same mutation (if it exists) in both files. Is this possible to achieve?
Best,
Sofia
It would help to see the headers for the columns. For example, what is the "10" in your column 3? The chromosome?
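Until the headers are known, here is a minimal sketch of one possible approach, not a definitive answer. It assumes that column 4 of my_data (rs28934880 in the example) is a dbSNP rsID, that the 1000G file is a tab-delimited VCF, and that the ID column of the VCF carries rsIDs where they are known. The function names (`parse_vcf_line`, `index_by_rsid`, `lookup_position`) are made up for illustration.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")[:8]
    chrom, pos, var_id, ref, alt, qual, filt, info = fields
    return {"chrom": chrom, "pos": int(pos), "id": var_id,
            "ref": ref, "alt": alt, "qual": qual,
            "filter": filt, "info": info}


def index_by_rsid(vcf_lines):
    """Map each rsID found in the VCF ID column to its parsed record."""
    index = {}
    for line in vcf_lines:
        if line.startswith("#"):      # skip meta and header lines
            continue
        record = parse_vcf_line(line)
        if record["id"] == ".":       # "." means no ID was assigned
            continue
        for rsid in record["id"].split(";"):  # the ID field may hold several IDs
            index[rsid] = record
    return index


def lookup_position(my_row, index, rsid_column=3):
    """Return (chrom, pos) for a my_data row, or None if its rsID is not indexed.

    rsid_column=3 (0-based) is an assumption: the 4th column holds the rsID.
    """
    rsid = my_row.split()[rsid_column]
    record = index.get(rsid)
    return (record["chrom"], record["pos"]) if record else None


# The two example rows from the question (INFO field shortened).
vcf_row = "20\t59132666\t.\tG\tA\t56.74\tPASS\tAC=6;AF=0.0102;AN=586"
my_row = ("3BHS2_HUMAN_A10E P26439 10 rs28934880 A E C A "
          "probably_damaging alignment neutral 0.328 0.145")

index = index_by_rsid([vcf_row])
print(lookup_position(my_row, index))  # None: this VCF row has "." as its ID
```

Note that the example 1000G row has "." in its ID column, so it cannot be matched by rsID at all. For rows like that, the rsIDs in my_data would first have to be converted to chromosome, position and alleles (for example from dbSNP), and the match done on those instead.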