Match keys in json file with csv files
6 months ago
Kristina • 0

I'm new to programming and I'm currently working on my thesis.

I'm working with multiple csv files and a json file containing genes with amino acid changes involved in antibiotic resistance. The csv files are formatted like this:

Gene_Aminoacids Filename
gyrA_S95T   SRR9851427
tlyA_L11L   SRR9851427
katG_R463L  SRR9851427

In the json file the genes are present as keys, and the antibiotics they affect are set as values.

Example: a small part of the json file.

"gyrA_A74S" : ["Quinolones"],
"gyrA_D89X" : ["Quinolones"],
"tlyA_C-83T" : ["Capreomycin"],
"katG_R104Q" : ["Isoniazid"],
"katG_S315I" : ["Isoniazid"],
"katG_S315N" : ["Isoniazid"],

What I'm interested in is finding the genes (keys) that occur in both the json file and the csv files. The new output should contain those shared keys, the corresponding antibiotic (value), and the filename.

Example of the wanted output:

 Gene_Aminoacids Antibiotic  Filename
 "katG_R104Q" : ["Isoniazid"], SRR9851427

This is the code I have written so far. I have looked into similar questions, but their solutions didn't work on my data.

import glob
import json
import os

def retrive_rest_mutations(jsonfile):
    # Load the json file: keys are mutations, values are antibiotics
    with open(jsonfile) as data_file:
        data = json.load(data_file)
    return data

mutation_keys = retrive_rest_mutations("tb_TEST.json")

## Read & set path to folder containing a.a changes

path = "Replaced_P_G.ann.vcf"
samp = glob.glob(path + "/*_G.P.vcf_replaced.txt")

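### Read text files
result = []

def read_text_file(file_path):
    # Return the lines of one text file
    with open(file_path, 'r') as f:
        return f.readlines()

## Iterate through all files
def all_files():
    for file in os.listdir(path):
        if file.endswith(".txt"):
            # Build the path from the folder name, not from the glob list `samp`
            file_path = os.path.join(path, file)
            result.append(read_text_file(file_path))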

The code might be wrongly indented because I copied it. I'm uncertain how to do the matching between the json file and the multiple csv files, and there might be a simple solution to my issue.

Does anyone have a suggestion, or an idea of what I should look into, to get the new output containing the Genes + Antibiotic + Filename?

Best regards

6 months ago
Shred ★ 1.2k

Based on what you've asked, this might work.

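import glob
import json

# json.load needs an open file object, not a file name
with open('your-json-file') as fh:
    jsonfile = json.load(fh)

files = glob.glob('*.csv')
for n in files:
    with open(n, 'r') as iput:
        for line in iput:
            # each line: Gene_Aminoacids <tab> Filename
            gene, filename = line.rstrip('\n').split('\t')
            # use a try to handle KeyError
            try:
                antibiotic = jsonfile[gene]
                # found, now print
                print(gene, antibiotic, filename, sep='\t')
            except KeyError:
                # gene is not a key in the json file (this also skips the header)
                continue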
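
If you want the Genes + Antibiotic + Filename output written to a file instead of printed, the same lookup could be wrapped in a function. This is only a sketch under some assumptions: the function name `match_mutations` and the output path are made up for illustration, the input files are assumed to be tab-separated with two columns, and the json file is assumed to be a single valid object that maps each mutation to a list of antibiotics.

```python
import csv
import json


def match_mutations(json_path, input_paths, out_path):
    """Write the mutations that occur in both the json file and the input files.

    Hypothetical helper: assumes tab-separated input with two columns
    (Gene_Aminoacids, Filename) and a json object of mutation -> [antibiotics].
    Returns the number of matching rows written.
    """
    with open(json_path) as fh:
        resistance = json.load(fh)

    n_matches = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["Gene_Aminoacids", "Antibiotic", "Filename"])
        for path in input_paths:
            with open(path, newline="") as fh:
                for row in csv.reader(fh, delimiter="\t"):
                    if len(row) < 2:
                        continue  # skip blank or malformed lines
                    gene, sample = row[0].strip(), row[1].strip()
                    # dict.get returns None when the mutation is not a key,
                    # which also skips the header line
                    antibiotics = resistance.get(gene)
                    if antibiotics is None:
                        continue
                    writer.writerow([gene, ",".join(antibiotics), sample])
                    n_matches += 1
    return n_matches
```

With `glob` imported, it could be called as `match_mutations("tb_TEST.json", glob.glob("Replaced_P_G.ann.vcf/*_G.P.vcf_replaced.txt"), "matches.tsv")`, where `matches.tsv` is an arbitrary output name. Using `dict.get` instead of `try`/`except KeyError` is just a style choice here; both do the same lookup.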

