I have many files (~50) with the same format of four columns: chr, strand, coordinate, score
Each file has about 1 Million lines. Now I want to merge these files by coordinates (strand specific) while preserve the score value of each sample. So the final file will be: chr, strand, coord, scoreTotal, score1, score2, ...., scoreN
File001: chr strand coord score chr1 + 100 1 chr1 - 200 2 File002: chr1 + 100 3 chr1 + 101 4 chr1 - 200 5 chr1 - 201 6
chr strand coord scoreSum score1 score2 chr1 + 100 4 1 3 chr1 + 101 4 0 4 chr1 - 200 7 2 5 chr1 - 201 6 0 6
I'm thinking of using R package like dplyr to group by coordinates, but for so many files with so many lines, it may not be efficient. I'm also thinking of converting each file to BED format, and then use tool like multiIntersectBed to merge files, but it seems that it can not preserve the score column.
Is there any tool can do this merging quickly?
Thank you very much!