I have the following huge file, and I would like to compare the variants that belong to the same gene.
If the gnomGene column names a gene (for example TP53), the code should check whether the COSMGene column names the same gene. Only when the two gene names match should the two variant columns (gnomAD and COSMIC) be compared. Matching and non-matching variants should then be written into separate columns (Same and Different).
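The per-row rule can be sketched as a small helper. This is a minimal sketch, assuming the Same/Different cell should hold the gnomAD variant (as in the desired output in this question) and that rows whose two gene columns differ get neither cell filled; `classify_variant` is a hypothetical name.

```python
def classify_variant(gnom_gene, cosm_gene, gnomad, cosmic):
    """Return the (same, different) cell values for one row.

    Hypothetical helper: the gnomAD variant goes into `same` when both
    variants match, into `different` when they do not, and both cells
    stay empty when the two gene columns name different genes.
    """
    if gnom_gene != cosm_gene:
        # different genes: the variants are not comparable
        return "", ""
    if gnomad == cosmic:
        return gnomad, ""
    return "", gnomad


print(classify_variant("TP53", "TP53", "p.K38Q", "p.K38R"))  # ('', 'p.K38Q')
print(classify_variant("HARS", "HARS", "p.V117A", "p.V117A"))  # ('p.V117A', '')
```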
My original file:

| gnomAD | gnomGene | COSMGene | COSMIC | Variant_ID1 | Variant_ID1 |
|---|---|---|---|---|---|
| p.K38Q | TP53 | TP53 | p.K38R | rs_NO1 | rs_NO6 |
| p.L83I | TP53 | TP53 | p.L83P | rs_NO2 | rs_NO7 |
| p.D86N | MAD2 | MAD2 | p.D86E | rs_NO3 | rsNO8 |
| p.Y116N | MAD2 | MAD2 | p.Y116S | rs_NO4 | rsNO9 |
| p.V117A | HARS | HARS | p.V117G | rs_NO5 | rsNO10 |
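The output I would like:

| gnomAD | gnomGene | COSMGene | COSMIC | Variant_ID1 | Variant_ID1 | Same | Different |
|---|---|---|---|---|---|---|---|
| p.K38Q | TP53 | TP53 | p.K38R | rs_NO1 | rs_NO6 | | p.K38Q |
| p.L83I | TP53 | TP53 | p.L83I | rs_NO2 | rs_NO7 | p.L83I | |
| p.D86N | MAD2 | MAD2 | p.D86E | rs_NO3 | rsNO8 | | p.D86N |
| p.Y116N | MAD2 | MAD2 | p.Y116N | rs_NO4 | rsNO9 | p.Y116N | |
| p.V117A | HARS | HARS | p.V117A | rs_NO5 | rsNO10 | p.V117A | |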
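For a huge file, one option is to stream it row by row with the standard `csv` module so the whole file never has to sit in memory. This sketch assumes the file is tab-separated, that the columns are in the order shown in the tables, and that the Same/Different cell holds the gnomAD variant; `add_same_different` and the file names in the comment are hypothetical. It uses `csv.reader` with column positions rather than `csv.DictReader`, because the header repeats `Variant_ID1`.

```python
import csv
import io

# Column positions in the header shown above (assumed fixed)
GNOMAD, GNOM_GENE, COSM_GENE, COSMIC = 0, 1, 2, 3


def add_same_different(in_handle, out_handle):
    """Copy tab-separated rows, appending Same and Different columns."""
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(header + ["Same", "Different"])
    for row in reader:
        same = different = ""
        # only compare variants when both columns name the same gene
        if row[GNOM_GENE] == row[COSM_GENE]:
            if row[GNOMAD] == row[COSMIC]:
                same = row[GNOMAD]
            else:
                different = row[GNOMAD]
        writer.writerow(row + [same, different])


# Small in-memory demo; for the real file, pass
# open("variants.tsv", newline="") as the input and
# open("variants_compared.tsv", "w", newline="") as the output instead.
sample = (
    "gnomAD\tgnomGene\tCOSMGene\tCOSMIC\tVariant_ID1\tVariant_ID1\n"
    "p.K38Q\tTP53\tTP53\tp.K38R\trs_NO1\trs_NO6\n"
    "p.V117A\tHARS\tHARS\tp.V117A\trs_NO5\trsNO10\n"
)
out = io.StringIO()
add_same_different(io.StringIO(sample), out)
print(out.getvalue())
```

Because each row is written as soon as it is read, memory use stays constant regardless of the file size.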
I have tried to write some code, but it did not work as I hoped:
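```python
import csv

in_file = "variants.tsv"  # hypothetical path; replace with the real file name
results = []
with open(in_file, newline="") as var_file:
    # assumes a tab-separated file with a header row
    reader = csv.DictReader(var_file, delimiter="\t")
    for row in reader:
        # only compare variants when both columns name the same gene
        if row["gnomGene"] == row["COSMGene"]:
            if row["gnomAD"] == row["COSMIC"]:
                results.append(row)
```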
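If the intent of `results[entry]` in the attempt was to collect matching rows per gene, a `collections.defaultdict(list)` keyed by gene name does that. This is a guess at the intent: `group_matches_by_gene` is a hypothetical name, and the rows are assumed to be dictionaries keyed by the header names, as `csv.DictReader` produces.

```python
from collections import defaultdict


def group_matches_by_gene(rows):
    """Collect rows whose gnomAD and COSMIC variants match, keyed by gene."""
    results = defaultdict(list)  # a gene seen for the first time starts as []
    for row in rows:
        if row["gnomGene"] == row["COSMGene"] and row["gnomAD"] == row["COSMIC"]:
            results[row["gnomGene"]].append(row)
    return results


rows = [
    {"gnomAD": "p.K38Q", "gnomGene": "TP53", "COSMGene": "TP53", "COSMIC": "p.K38R"},
    {"gnomAD": "p.Y116N", "gnomGene": "MAD2", "COSMGene": "MAD2", "COSMIC": "p.Y116N"},
]
matches = group_matches_by_gene(rows)
print(dict(matches))
```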
I have started learning Python but still cannot figure out how to get this done. Any help is highly appreciated.