Question: Summing Chromosome Sizes
selplat2120 wrote (4 weeks ago):
``````File1_Col1 <- c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5")
File1_Col2 <- c(10000, 8000, 5000, 2000, 500)
File1 <- data.frame(File1_Col1, File1_Col2)
File2_Col1 <- c("Chr1", "Chr1", "Chr1", "Chr2", "Chr2", "Chr2","Chr3", "Chr3", "Chr3"
,"Chr4", "Chr4", "Chr4","Chr5", "Chr5", "Chr5")
File2_Col2 <- c(1,5,7,2,3,5,3,4,5,1,3,6,2,4,5)
File2 <- data.frame(File2_Col1, File2_Col2)
``````

I have two files: File 1 contains chromosomes and their sizes and File 2 contains a list of SNPs for each chromosome.

I need the SNP positions to run consecutively across chromosomes. For example:

Chr3 SNP 3 should actually be 3 + (the combined size of the two preceding chromosomes) = 3 + 10000 + 8000 = 18003

Can someone help me write a loop in R that adds the summed sizes of the preceding chromosomes to each position in File2_Col2?


What have you tried? You should look at calculating `cumsum` for the first data frame followed by a `merge`; then you can create a derived field that adds the cumulative size to the `Col2` field.
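The `cumsum` plus `merge` suggestion could be sketched in base R roughly as follows. This is only a sketch under some assumptions: the column names `offset` and `genome_pos` are made up for illustration, and the offset is taken as the cumulative sum minus the chromosome's own size, so that only the preceding chromosomes are counted (as in the question's 18003 example).

```r
# Example data from the question
File1 <- data.frame(
  File1_Col1 = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"),
  File1_Col2 = c(10000, 8000, 5000, 2000, 500)
)
File2 <- data.frame(
  File2_Col1 = rep(c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"), each = 3),
  File2_Col2 = c(1, 5, 7, 2, 3, 5, 3, 4, 5, 1, 3, 6, 2, 4, 5)
)

# offset (hypothetical name): total size of all preceding chromosomes.
# cumsum() includes the current chromosome, so subtract its own size.
File1$offset <- cumsum(File1$File1_Col2) - File1$File1_Col2

# Attach each chromosome's offset to its SNPs
merged <- merge(File2, File1[, c("File1_Col1", "offset")],
                by.x = "File2_Col1", by.y = "File1_Col1")

# genome_pos (hypothetical name): position on the concatenated genome
merged$genome_pos <- merged$File2_Col2 + merged$offset

# merge() sorts by the key as text, so restore genome order explicitly
merged <- merged[order(merged$genome_pos), ]
```

Note that this assumes File1 lists the chromosomes in the order they should be concatenated, since `cumsum` simply follows row order.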

Thank you so much, that answers my question!

rpolicastro wrote (4 weeks ago):

I'm not completely sure I understand, but here is a tidyverse answer of what I thought you meant, based on @RAmRS's comment.

``````library("tidyverse")

result <- File1 %>%
  # cumsum() includes the current chromosome, so subtract its own size
  # to get the summed size of the preceding chromosomes only
  mutate(cumsum_chr = cumsum(File1_Col2) - File1_Col2) %>%
  right_join(File2, by = c("File1_Col1" = "File2_Col1")) %>%
  mutate(newcol = cumsum_chr + File2_Col2) %>%
  select(!c(File1_Col2, cumsum_chr))

> result
File1_Col1 File2_Col2 newcol
1        Chr1          1      1
2        Chr1          5      5
3        Chr1          7      7
4        Chr2          2  10002
5        Chr2          3  10003
6        Chr2          5  10005
7        Chr3          3  18003
8        Chr3          4  18004
9        Chr3          5  18005
10       Chr4          1  23001
11       Chr4          3  23003
12       Chr4          6  23006
13       Chr5          2  25002
14       Chr5          4  25004
15       Chr5          5  25005
``````
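If a join feels like overkill, the same offsets can probably be applied with a named-vector lookup, which also leaves the row order of File2 untouched. This is a sketch rather than a definitive solution: the `offsets` vector is a name introduced here, and it is built so that Chr3 position 3 maps to 18003, the value the question asks for.

```r
# Example data from the question
File1 <- data.frame(
  File1_Col1 = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"),
  File1_Col2 = c(10000, 8000, 5000, 2000, 500)
)
File2 <- data.frame(
  File2_Col1 = rep(c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5"), each = 3),
  File2_Col2 = c(1, 5, 7, 2, 3, 5, 3, 4, 5, 1, 3, 6, 2, 4, 5)
)

# Named vector: chromosome name -> summed size of preceding chromosomes
offsets <- setNames(cumsum(File1$File1_Col2) - File1$File1_Col2,
                    File1$File1_Col1)

# Look up each SNP's chromosome offset by name and add it to the position;
# as.character() guards against the column being a factor
File2$newcol <- File2$File2_Col2 +
  unname(offsets[as.character(File2$File2_Col1)])
```

No explicit loop is needed, because indexing `offsets` by a character vector is already vectorised over all SNP rows.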