3.2 years ago
jeni ▴ 90
I have a dataframe with some genomic intervals and their corresponding coverage in several samples:
        sample1  sample2  sample3
1:1-3        30       NA       NA
1:1-4        NA       40       35
1:4-5        35       NA       NA
1:5-7        NA       50       50
1:6-7        60       NA       NA
I would like to obtain the same dataframe but for genomic positions:
      sample1  sample2  sample3
1:1        30       40       35
1:2        30       40       35
1:3        30       40       35
1:4        35       40       35
1:5        35       50       50
1:6        60       50       50
1:7        60       50       50
How could I get this?
The intervals can first be obtained with rownames(). Then use strsplit() to get the chromosome (first element) and the range (second and third elements). You can then put this into a data frame and use GRanges() directly to construct a GRanges object. The coverages could be stored as elementMetadata in the resulting GRanges object. I suggest you try that out. It is good practice for improving your skills.
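The splitting step described above could look roughly like this in base R. This is only a sketch: the object names cov, parts and intervals are made up for illustration, the data are copied from the example in the question, and it assumes that chromosome names contain neither ":" nor "-".

```r
# Example data copied from the question; row names encode "chr:start-end".
cov <- data.frame(
  sample1 = c(30, NA, 35, NA, 60),
  sample2 = c(NA, 40, NA, 50, NA),
  sample3 = c(NA, 35, NA, 50, NA),
  row.names = c("1:1-3", "1:1-4", "1:4-5", "1:5-7", "1:6-7")
)

# Split each row name on ":" or "-" into chromosome, start and end.
# Assumption: chromosome names themselves contain no ":" or "-".
parts <- strsplit(rownames(cov), "[:-]")

intervals <- data.frame(
  chr   = vapply(parts, `[`, character(1), 1),
  start = as.integer(vapply(parts, `[`, character(1), 2)),
  end   = as.integer(vapply(parts, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)

# Keep the per-sample coverage next to the coordinates.
intervals <- cbind(intervals, cov)
rownames(intervals) <- NULL
print(intervals)
```

If the GenomicRanges package is installed, a data frame in this chr/start/end layout can be passed to makeGRangesFromDataFrame() with keep.extra.columns = TRUE, which should keep the sample columns as metadata columns.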
Okay, thanks! I have already done that.
But now how can I get genomic positions from each interval, indicating the coverage value of each sample for each position?
Can you show what you have done?
Sure! I've converted my dataframe into a GRanges object (I first split the genomic coordinates into this format: chr start end):
Now, I have tried:
and I get this:
In this example I cannot obtain all the positions, but in my real df I can, because I have a lot of overlapping intervals. The problem now is that I don't know how to keep and adapt the metadata columns. What I would like to obtain is this:
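For the small example in the original question, the per-position table can be reproduced with a base R sketch like the one below. It is not the GRanges-based approach discussed in this thread, just one possible way to carry the sample columns along. The names intervals, long, per_pos and first_non_na are made up for illustration. It also assumes that, for a given sample, at most one interval covering a position has a non-NA value; if several do, the sketch simply keeps the first one, which may not be what you want.

```r
# Intervals from the question, already split into chr / start / end.
intervals <- data.frame(
  chr     = "1",
  start   = c(1L, 1L, 4L, 5L, 6L),
  end     = c(3L, 4L, 5L, 7L, 7L),
  sample1 = c(30, NA, 35, NA, 60),
  sample2 = c(NA, 40, NA, 50, NA),
  sample3 = c(NA, 35, NA, 50, NA),
  stringsAsFactors = FALSE
)
samples <- c("sample1", "sample2", "sample3")

# One row per (interval, position): repeat each interval once per base it covers.
widths <- intervals$end - intervals$start + 1L
idx <- rep(seq_len(nrow(intervals)), times = widths)
long <- intervals[idx, c("chr", samples)]
long$pos <- unlist(Map(seq, intervals$start, intervals$end))

# Collapse rows that share a position: first non-NA value per sample.
# Assumption: overlapping intervals never disagree within one sample.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_real_
}
per_pos <- aggregate(long[samples],
                     by = list(chr = long$chr, pos = long$pos),
                     FUN = first_non_na)

# Order by position and label rows as "chr:pos".
per_pos <- per_pos[order(per_pos$chr, per_pos$pos), ]
rownames(per_pos) <- paste(per_pos$chr, per_pos$pos, sep = ":")
per_pos <- per_pos[samples]
print(per_pos)
```

Expanding every interval to single bases can get memory-hungry for long intervals, so on real data it may be worth running this per chromosome or restricting it to the regions of interest.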
Related SO post: