Merge files according to IDs
3.5 years ago
qwzhang0601 ▴ 80

I have several files with methylated and unmethylated read counts at CpG sites (format shown below). I have tens of samples, and each covers a different set of CpG sites. Is there a command or R function that can merge such files easily? Rows should be merged on ChrID and position, and a site not covered in a sample should be represented by 0.

The files I prepared are in this format:

ChrID position #methy  #unMethy 
1       13823   0       1
1       13828   1       0

expected result

ChrID position #methy_sample1  #unMethy_sample1 #methy_sample2  #unMethy_sample2 ....
....

Thanks

sequencing • 562 views

What have you tried on your own?

3.5 years ago

First, load your files into a named list, appending each file's name to its count columns so they stay distinguishable after merging.

library("tidyverse")

# Load the data.
# Specify the directory with your tables.
# Change the delimiter to whatever your files use.

files <- list.files("path/to/dir", full.names=TRUE)
files <- set_names(files, basename(files))

example <- imap(files, function(x, y) {
  read_delim(x, delim="\t") %>% rename_with(~str_c(.x, y, sep="_"), !c(ChrID, position))
})
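
To see what the loading step produces, here is a small self-contained sketch that writes two made-up tables to a temporary directory and reads them back with the same pattern. The file names (`table_1.txt`, `table_2.txt`), the temporary directory, the values, and the `methy`/`unMethy` headers without the leading `#` are all assumptions for illustration, not your actual files.

```r
library(tidyverse)

# Hypothetical input: two small tab-delimited tables in a fresh temporary directory.
demo_dir <- tempfile("cpg_demo_")
dir.create(demo_dir)
write_tsv(tibble(ChrID = c(1, 1), position = c(100, 200),
                 methy = c(1, 1), unMethy = c(1, 0)),
          file.path(demo_dir, "table_1.txt"))
write_tsv(tibble(ChrID = c(1, 1), position = c(100, 350),
                 methy = c(1, 0), unMethy = c(1, 1)),
          file.path(demo_dir, "table_2.txt"))

# Name each path by its file name so imap() can pass the name along.
files <- list.files(demo_dir, full.names = TRUE)
files <- set_names(files, basename(files))

# Read each table and suffix every non-key column with the file name.
loaded <- imap(files, function(path, name) {
  suppressMessages(read_delim(path, delim = "\t")) %>%
    rename_with(~ str_c(.x, name, sep = "_"), !c(ChrID, position))
})

names(loaded)
colnames(loaded[["table_1.txt"]])
```

The key columns `ChrID` and `position` keep their names, which is what lets the later join match rows across samples.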

Here is some example data showing what to expect once your files are loaded.

example <- list(table_1.txt = structure(list(ChrID = c(1, 1, 1), position = c(100, 
200, 300), methy_table_1.txt = c(1, 1, 0), unMethy_table_1.txt = c(1, 
0, 1)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
)), table_2.txt = structure(list(ChrID = c(1, 1, 1), position = c(100, 
200, 350), methy_table_2.txt = c(1, 0, 0), unMethy_table_2.txt = c(1, 
1, 1)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
)))

> example
$table_1.txt
# A tibble: 3 x 4
  ChrID position methy_table_1.txt unMethy_table_1.txt
  <dbl>    <dbl>             <dbl>               <dbl>
1     1      100                 1                   1
2     1      200                 1                   0
3     1      300                 0                   1

$table_2.txt
# A tibble: 3 x 4
  ChrID position methy_table_2.txt unMethy_table_2.txt
  <dbl>    <dbl>             <dbl>               <dbl>
1     1      100                 1                   1
2     1      200                 0                   1
3     1      350                 0                   1

Combining the tables is now a matter of reducing the list with full_join on ChrID and position, then replacing the NA values left at uncovered sites with 0.

merged <- example %>%
  reduce(full_join, by=c("ChrID", "position")) %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))

> merged
# A tibble: 4 x 6
  ChrID position methy_table_1.t… unMethy_table_1… methy_table_2.t…
  <dbl>    <dbl>            <dbl>            <dbl>            <dbl>
1     1      100                1                1                1
2     1      200                1                0                0
3     1      300                0                1                0
4     1      350                0                0                0
# … with 1 more variable: unMethy_table_2.txt <dbl>
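
If you would rather avoid the tidyverse, the same idea can be sketched in base R with `Reduce` and `merge`. This is a swapped-in alternative, not the code above; the data frames `s1` and `s2` and their column suffixes are made-up stand-ins for two loaded samples.

```r
# Two small data frames standing in for per-sample tables (made-up values).
s1 <- data.frame(ChrID = c(1, 1, 1), position = c(100, 200, 300),
                 methy_s1 = c(1, 1, 0), unMethy_s1 = c(1, 0, 1))
s2 <- data.frame(ChrID = c(1, 1, 1), position = c(100, 200, 350),
                 methy_s2 = c(1, 0, 0), unMethy_s2 = c(1, 1, 1))

# Outer-join all tables on the key columns; all = TRUE keeps sites seen in any sample.
merged_base <- Reduce(
  function(x, y) merge(x, y, by = c("ChrID", "position"), all = TRUE),
  list(s1, s2)
)

# Sites missing from a sample come back as NA; replace them with 0 as requested.
merged_base[is.na(merged_base)] <- 0
merged_base
```

If you read the real files with `read.table`, note that its default `comment.char = "#"` would interfere with headers such as `#methy`, so you would likely want `comment.char = ""` there.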