Match keys in json file with csv files
17 months ago
Kristina • 0

I'm new to programming and I'm currently working on my thesis.

I'm working with multiple csv files and a json file containing genes with amino acid changes involved in antibiotic resistance. The csv files are formatted like this:

Gene_Aminoacids Filename
gyrA_S95T   SRR9851427
tlyA_L11L   SRR9851427
katG_R463L  SRR9851427

In the json file the genes are present as keys, and the corresponding antibiotics they affect are set as values.

Example of a small part of the json file:

"gyrA_A74S" : ["Quinolones"],
"gyrA_D89X" : ["Quinolones"],
"tlyA_C-83T" : ["Capreomycin"],
"katG_R104Q" : ["Isoniazid"],
"katG_S315I" : ["Isoniazid"],
"katG_S315N" : ["Isoniazid"],
  etc.... 

What I want is to find the genes (keys) that occur in both the json file and the csv files. The new output should contain each matching gene, the corresponding antibiotic (value), and the filename.

Example of the desired output:

 Gene_Aminoacids Antibiotic  Filename
 "katG_R104Q" : ["Isoniazid"], SRR9851427
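
The matching described above comes down to a dictionary key lookup. Here is a minimal sketch with in-memory stand-ins for the two inputs; the names `resistance`, `csv_rows` and `matches` are hypothetical, and the sample rows are made up for illustration rather than taken from the real files.

```python
# Hypothetical stand-ins for the json file and one csv file
resistance = {
    "gyrA_A74S": ["Quinolones"],
    "katG_S315I": ["Isoniazid"],
}
csv_rows = [
    ("gyrA_S95T", "SRR9851427"),
    ("katG_S315I", "SRR9851427"),
]

# Keep only the rows whose gene is also a key in the json dictionary
matches = [
    (gene, resistance[gene], sample)
    for gene, sample in csv_rows
    if gene in resistance
]

# Print gene, antibiotic(s) and filename as tab-separated columns
for gene, antibiotics, sample in matches:
    print(gene, ", ".join(antibiotics), sample, sep="\t")
```

Only `katG_S315I` is kept here, because `gyrA_S95T` is not a key in the dictionary.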

This is the code I have written so far. I have looked into similar questions, but their solutions didn't work on my data.

import glob
import json

def retrive_rest_mutations(jsonfile):
    # Load the json file and return its keys (gene_mutation names)
    with open(jsonfile) as data_file:
        data = json.load(data_file)
    return data.keys()

mutation_keys = retrive_rest_mutations("tb_TEST.json")

## Read & set path to folder containing a.a changes

path = "Replaced_P_G.ann.vcf"
samp = glob.glob(path + "/*_G.P.vcf_replaced.txt")

### Read text files
result = []

def read_text_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())

## Iterate through all files matched by glob
def all_files():
    for file_path in samp:
        read_text_file(file_path)
        print("\n")

The code might be wrongly indented because I copied it. I'm uncertain how to do the matching between the json file and the multiple csv files, and there might be a simple solution to my issue.

Does anyone have a suggestion, or know what I should look into, to get the new output containing the Genes + Antibiotic + Filename?

Best regards

Match keys • 429 views
17 months ago
Shred ★ 1.4k

Based on what you've asked, this might work.

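import glob
import json

# json.load expects an open file object, not a path
with open('your-json-file') as fh:
    jsonfile = json.load(fh)

files = glob.glob('*.csv')
for n in files:
    with open(n, 'r') as iput:
        for line in iput:
            # split on any whitespace (tabs or spaces); this also drops the newline
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            gene, filename = fields
            # use a try to handle KeyError
            try:
                antibiotic = jsonfile[gene]
                # found, now print
                print(f"{gene}:{antibiotic},{filename}")
            except KeyError:
                continue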
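
If you want the three-column table from the question rather than printed lines, the matches can be collected and written to a tab-separated file. This is only a rough sketch, assuming two whitespace-separated columns per line in each csv file; the function names `collect_matches` and `write_matches` and the output name `matches.tsv` are made up for illustration.

```python
import csv
import glob
import json


def collect_matches(json_path, csv_pattern):
    """Return (gene, antibiotics, filename) rows for genes present in the json."""
    with open(json_path) as fh:
        resistance = json.load(fh)  # e.g. {"katG_S315I": ["Isoniazid"], ...}

    rows = []
    for csv_path in sorted(glob.glob(csv_pattern)):
        with open(csv_path) as handle:
            for line in handle:
                fields = line.split()  # assumes whitespace-separated columns
                if len(fields) != 2:
                    continue  # skip blank or malformed lines
                gene, sample = fields
                if gene in resistance:  # a header line never matches a key
                    rows.append((gene, ";".join(resistance[gene]), sample))
    return rows


def write_matches(rows, out_path):
    """Write the matched rows as a tab-separated table with a header."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["Gene_Aminoacids", "Antibiotic", "Filename"])
        writer.writerows(rows)
```

You would call it with something like `write_matches(collect_matches("tb_TEST.json", "*.csv"), "matches.tsv")`. Using `if gene in resistance` does the same job as the try/except KeyError, just without the exception handling.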