I have a set of 38 txt files that all have a similar format: the first column is a gene ID and the remaining columns are expression data. I want to join all of these files into one, keeping all the data columns and matching rows on the gene IDs in the first column.
I have tried merge in pandas, but I get a memory error with these files (it does work with other data files, though):
    import glob
    from functools import reduce
    import pandas as pd

    df_list = []  # was "df_list = all_files = glob.glob(...)", which appended frames to the file list
    all_files = glob.glob("*meanCenter_results.txt")
    for file in all_files:
        df_list.append(pd.read_csv(file, header=0, sep="\t", index_col=0))
    big_df = reduce(lambda left, right: pd.merge(left, right, on="ORF_Gene", how="outer"), df_list)
    big_df.to_csv("All_GEO_Expression_Data_MeanCentered_Combined.txt", header=True, index=True, sep="\t")
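One thing I have been wondering about: since each file is read with index_col=0, all 38 frames share the gene-ID index, so something like pd.concat aligning on the index in a single pass might avoid the 37 intermediate copies that the repeated merge builds. A minimal sketch with two toy frames standing in for my files (the frame and column names here are made up):

```python
import pandas as pd

# Two hypothetical small frames standing in for the 38 files,
# each indexed by gene ID as read_csv(..., index_col=0) would produce.
df1 = pd.DataFrame({"s1": [1.0, 2.0]}, index=pd.Index(["g1", "g2"], name="ORF_Gene"))
df2 = pd.DataFrame({"s2": [3.0, 4.0]}, index=pd.Index(["g1", "g2"], name="ORF_Gene"))

# concat with axis=1 aligns every frame on the shared index in one pass,
# instead of building a new merged copy for each pairwise merge.
big_df = pd.concat([df1, df2], axis=1, join="outer")
big_df.to_csv("combined.txt", sep="\t")
```

(join="inner" instead of "outer" would keep only gene IDs present in every file, if that is the right semantics.)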
I also found this code online and it seems to do what I want, but I'm really new to this particular kind of programming. For this I renamed one file a.txt and the rest b1.txt to b37.txt:
    temp=$(cat a.txt); for i in b*; do temp=$(echo "$temp" | join -j1 - "$i"); done; echo "$temp"

(I added quotes around the variables; without them, the unquoted $temp collapses the tabs and newlines into spaces.)
but this just writes it to the terminal window and it's too much to follow.
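Would it be better to redirect each join into a file instead of holding everything in a shell variable? A toy version of that loop (the file names and contents below are made up, and join requires both inputs sorted on the join field):

```shell
# Hypothetical tiny sorted files standing in for a.txt and b1..b37.txt
printf 'g1\t1\ng2\t2\n' > a.txt
printf 'g1\t3\ng2\t4\n' > b1.txt

# Join each b-file into a running combined file; tabs survive because
# nothing passes through an unquoted shell variable.
cp a.txt combined.txt
for i in b*.txt; do
  join -t $'\t' -j1 combined.txt "$i" > tmp.txt && mv tmp.txt combined.txt
done

# combined.txt now holds the gene IDs plus the columns from every file
cat combined.txt
```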
Can you suggest a way to get a single file containing all the columns of data, with the first column being the shared gene ID?