I have multiple whitespace delimited files with a different number of columns. I would like to join them and keep the common column names.
test1_file
id SNP1 SNP2 SNP3
162 AA BB AA
172 AB BB AB
test2_file
id SNP1 SNP2 SNP3 SNP4
189 BB BB AA AB
1590 AA AB BB AA
test3_file
id SNP8 SNP2 SNP3 SNP9
182 AA BB AA AA
132 BB BB AA BB
desired output file:
id SNP2 SNP3
162 BB AA
172 BB AB
189 BB AA
1590 AB BB
182 BB AA
132 BB AA
my code: Its not working, and i would like to add more files than just two
import pandas as pd
test1_file = pd.read_csv('2014_07_03', delim_whitespace = True)
test2_file = pd.read_csv('2019_08_19', delim_whitespace = True)
f = pd.merge(left=test1_file, right=test2_file , how='left', on='MergeCol')
f.to_csv('outfile.csv')
Thank you very much for your help. i tested it with the sample data above but it doesn't seem to be working.
Hello,
please add more detail about what "doesn't seem to be working".
Thanks.
fin swimmer
this is what i get:
The warning is just a warning. You can ignore it for now or do what it suggest.
But there must be something strange with your input. I've tested it with the data you gave above and that's the output I get:
Should the input files be a certain format? I get a syntax error.
The columns must be separated by whitespaces and not tabs. That's the only requirement.
When ever you receive any kind of error or something "doesn't work": Please give us the details!
fin swimmer
Thanks.The whitespace delimited fixed it.
I VERY much appreciate your help