Entering edit mode
                    6.3 years ago
        lizaveta
        
    
        •
    
    0
    I need to get first  (meaning with the most left coordinates possible) or the last n nucleotides positions (let' say, 20 nts but it can be up to the whole genomic region width) in a genomic ranges object grTarget of length 23 kb:
> grTarget
 GRanges object with 2 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25406855      *
  [2]    chr12 25408100-25423323      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
I expect to get:
1) the most left 20 nts:
 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
1) the most right 20 nts:
 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
Is there some function that can do this? I only can think of making a function that will take nucleotides in a loop until the length of the subsetted region reaches the desired length:
MostLeftNucleotides = function(grObject, nts_number) {
  for (i_finish in c(1:length(grObject))) {
    if (nts_number <= sum(width(grObject[1:i_finish]))) {
        break
    }
  }
  if (i_finish > 1) {
    grSubset = grObject[1:(i_finish-1)]
    rest_nts = nts_number - sum(width(grSubset))
    LastInterval = grObject[i_finish]
    end(LastInterval) = start(LastInterval) + rest_nts
    grSubset = union(grSubset, LastInterval)
  } else {
    grSubset = grObject[i_finish]
    end(grSubset) = start(grSubset) + nts_number
  }
  return(grSubset) 
}
Thank you @SMK! your answer, however, is not exactly what I was wondering about. I updated the post with some code I have at the moment, it should make my question clearer.
Alright,
flank+range. ;-)It is already closer :) However, if I need a bigger interval (that spans across the first interval and takes half of the second) - it does not work properly and output ranges goes beyond borders or
grTargetand contains 1 interval ( and it should contain 2).Can you show the problematic data as an example? :-(