3.9 years ago
pythonlover
I have a huge file (sample below) and I would like to check whether the variants reported for the same gene are identical.
If the gnomGene column holds a gene name, e.g. TP53, the code should check whether COSMGene holds the same gene name. When the two gene columns match, the two variant columns (gnomAD and COSMIC) should be compared. Identical and differing variants should be written into separate columns (Same and Different).
My original file
**gnomAD gnomGene COSMGene COSMIC Variant_ID1 Variant_ID1**
p.K38Q TP53 TP53 p.K38R rs_NO1 rs_NO6
p.L83I TP53 TP53 p.L83P rs_NO2 rs_NO7
p.D86N MAD2 MAD2 p.D86E rs_NO3 rsNO8
p.Y116N MAD2 MAD2 p.Y116S rs_NO4 rsNO9
p.V117A HARS HARS p.V117G rs_NO5 rsNO10
Final file
**gnomAD gnomGene COSMGene COSMIC Variant_ID1 Variant_ID1 Same Different**
p.K38Q TP53 TP53 p.K38R rs_NO1 rs_NO6 p.K38Q
p.L83I TP53 TP53 p.L83I rs_NO2 rs_NO7 p.L83I
p.D86N MAD2 MAD2 p.D86E rs_NO3 rsNO8 p.D86N
p.Y116N MAD2 MAD2 p.Y116N rs_NO4 rsNO9 p.Y116N
p.V117A HARS HARS p.V117A rs_NO5 rsNO10 p.V117A
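I have tried to write some code, but it did not work as I hoped.
import csv

results = []
# in_file is the path to the input table (assumed to be tab-separated)
with open(in_file, newline='') as var_file:
    # DictReader maps every row to the column names in the header line
    for row in csv.DictReader(var_file, delimiter='\t'):
        # keep rows where both the gene names and the variants agree
        if row["gnomGene"] == row["COSMGene"] and row["gnomAD"] == row["COSMIC"]:
            results.append(row)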
I have started learning Python but still cannot figure out how to get this done. Any help is highly appreciated.
I don't fully understand your problem, but this looks like a job for awk. Something like:
awk -F '\t' '{printf("%s\t%s\n",$0,($2==$3 && $1==$4?"TRUE":"FALSE"));}' input.tsv
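For readers without awk, a rough Python equivalent of the one-liner above might look like the sketch below. It assumes tab-separated input with the columns in the order shown in the question; the function name `flag_matches` and the two sample rows are made up for illustration.

```python
import csv
import io

def flag_matches(lines):
    """Append TRUE/FALSE to each tab-separated row, like the awk one-liner."""
    out = []
    for row in csv.reader(lines, delimiter="\t"):
        # TRUE only when gene names (columns 2 and 3) and
        # variants (columns 1 and 4) are both identical
        same = len(row) >= 4 and row[1] == row[2] and row[0] == row[3]
        out.append(row + ["TRUE" if same else "FALSE"])
    return out

# Illustrative sample data, not the real file
sample = (
    "p.K38Q\tTP53\tTP53\tp.K38R\trs_NO1\trs_NO6\n"
    "p.L83I\tTP53\tTP53\tp.L83I\trs_NO2\trs_NO7\n"
)

for row in flag_matches(io.StringIO(sample)):
    print("\t".join(row))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `flag_matches(open("input.tsv", newline=""))`. Like the awk command, it treats a header line as an ordinary row.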
Dear Pierre,
Thanks for the feedback. Let me explain further. I have a file with 6 columns holding various inputs (gene names, variants, and variant IDs). I want to compare the variant columns and find similarities and differences. First, I want to find the rows where the two gene columns name the same gene, and then compare the variants within that gene. I'm new to Python and got confused when I tried to compare the columns. Thanks so much for your kind help. I don't actually have a Linux system; I wish I were able to follow your suggestion.
Based on corrected input data:
or
The assumption is that the OP wants to compare columns 2 and 3 (the gene names); where they are identical, identical variants (from the gnomAD column) go in a match column and non-identical variants (from the gnomAD column) go in a nomatch column.
When columns 2 and 3 do not match and you do not want to match on variants, please use the following code:
@OP: Both your input and your output are incorrect. Please correct them.
Thank you very much for your help. Sorry, I am new to the system and this was my first time posting input data. I have corrected the input.
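Since the OP is working in Python, the assumption above can be sketched as follows. This is only an illustration, not the code from this answer: it assumes tab-separated input with a header line, uses column positions rather than names because the header repeats Variant_ID1, and leaves both new columns empty when the gene names differ. The function name `split_variants` and the sample rows are hypothetical.

```python
import csv
import io

def split_variants(lines):
    """Add 'match' and 'nomatch' columns to a tab-separated table."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    result = [header + ["match", "nomatch"]]
    for row in reader:
        gnomad, gnom_gene, cosm_gene, cosmic = row[:4]
        if gnom_gene != cosm_gene:
            # different genes: variants are not compared (assumption)
            result.append(row + ["", ""])
        elif gnomad == cosmic:
            result.append(row + [gnomad, ""])
        else:
            result.append(row + ["", gnomad])
    return result

# Illustrative sample data, not the real file
sample = (
    "gnomAD\tgnomGene\tCOSMGene\tCOSMIC\tVariant_ID1\tVariant_ID1\n"
    "p.K38Q\tTP53\tTP53\tp.K38R\trs_NO1\trs_NO6\n"
    "p.L83I\tTP53\tTP53\tp.L83I\trs_NO2\trs_NO7\n"
    "p.D86N\tMAD2\tHARS\tp.D86E\trs_NO3\trsNO8\n"
)

for row in split_variants(io.StringIO(sample)):
    print("\t".join(row))
```

The returned rows can be written back out with `csv.writer(handle, delimiter="\t").writerows(...)` if a file is needed.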
No problem. I was there too. Please close the thread if it addressed the issue.