I am trying to read two .bed files in R, build a [genes x cells] count matrix from each (mat1.counts and mat2.counts), and then aggregate the two matrices into one. The column names (cells) are exactly the same in both files, but the row names (genes) are not: some genes are common to both files, and others appear in only one. Both files also contain duplicated genes. This is what I have tried:
    library(rliger)

    # Read the two .bed files
    mat1 <- read.table(file = "file1.bed", sep = "\t", as.is = c(4,7), header = FALSE)
    mat2 <- read.table(file = "file2.bed", sep = "\t", as.is = c(4,7), header = FALSE)

    # Cell barcodes come from the column names of the ATAC count table
    atac <- read.table('chromatin_counts.tsv', sep = '\t', header = TRUE, as.is = TRUE)
    barcodes <- colnames(atac)

    # Build the [genes x cells] matrices
    mat1.counts <- makeFeatureMatrix(mat1, barcodes)
    mat2.counts <- makeFeatureMatrix(mat2, barcodes)

    # Sort rows by gene name
    mat1.counts <- mat1.counts[order(rownames(mat1.counts)),]
    mat2.counts <- mat2.counts[order(rownames(mat2.counts)),]

    # final_mat = mat1.counts + mat2.counts  # commented out because of size mismatch error
- Size of mat1.counts: 1,102,170 genes x 1,047 cells
- Size of mat2.counts: 50,170 genes x 1,047 cells
I tried converting both matrices to data frames so that I could join them and sum the repeated genes, but my system crashed because the resulting data frame was too large. Is there another way to sum and aggregate mat1.counts and mat2.counts into final_mat? Specifically:
- I want to sum the rows of duplicated genes within mat1.counts and within mat2.counts, separately.
- I want to join mat1.counts and mat2.counts and sum the rows of the genes they have in common.
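
To make the goal concrete, here is a toy sketch of the result I am after. It assumes the count matrices are (or can be treated as) sparse matrices from the Matrix package, which I have not confirmed on the full data. The helper name `sum_by_gene` and the toy values are made up for illustration. The idea is to stack the two matrices and collapse rows that share a gene name by multiplying with a sparse indicator matrix, so no dense data frame is ever built.

```r
library(Matrix)

# Toy stand-ins for mat1.counts and mat2.counts (hypothetical values)
m1 <- Matrix(c(1, 0, 2,
               0, 3, 0,
               4, 0, 0), nrow = 3, byrow = TRUE, sparse = TRUE,
             dimnames = list(c("geneA", "geneA", "geneB"), c("c1", "c2", "c3")))
m2 <- Matrix(c(0, 1, 0,
               5, 0, 0), nrow = 2, byrow = TRUE, sparse = TRUE,
             dimnames = list(c("geneB", "geneC"), c("c1", "c2", "c3")))

# Hypothetical helper: stack two matrices and sum rows with the same gene name
sum_by_gene <- function(a, b) {
  # The cells must be in the same order in both matrices
  stopifnot(identical(colnames(a), colnames(b)))
  stacked <- rbind(a, b)
  genes <- rownames(stacked)
  ids <- sort(unique(genes))
  # Indicator matrix: entry (g, i) is 1 when stacked row i belongs to gene g
  ind <- sparseMatrix(i = match(genes, ids), j = seq_along(genes), x = 1,
                      dims = c(length(ids), length(genes)),
                      dimnames = list(ids, NULL))
  # One sparse product sums duplicates within and across the two inputs
  ind %*% stacked
}

final_mat <- sum_by_gene(m1, m2)
```

In this toy case `final_mat` has one row per unique gene (geneA, geneB, geneC): the duplicated geneA rows of `m1` are summed, and geneB is summed across `m1` and `m2`. Because duplicates within each matrix and genes shared between them are collapsed in the same step, the two bullet points above would not need separate passes, if this approach scales to the real matrices.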
I appreciate any recommendations!