Question: How do I get ranges for first or last n nucleotides in genomicranges
0
gravatar for lizaveta
6 months ago by
lizaveta0
lizaveta0 wrote:

I need to get first (meaning with the most left coordinates possible) or the last n nucleotides positions (let' say, 20 nts but it can be up to the whole genomic region width) in a genomic ranges object grTarget of length 23 kb:

> grTarget

 GRanges object with 2 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25406855      *
  [2]    chr12 25408100-25423323      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

I expect to get:

1) the most left 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

1) the most right 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

Is there some function that can do this? I only can think of making a function that will take nucleotides in a loop until the length of the subsetted region reaches the desired length:

MostLeftNucleotides = function(grObject, nts_number) {
  for (i_finish in c(1:length(grObject))) {
    if (nts_number <= sum(width(grObject[1:i_finish]))) {
        break
    }
  }
  if (i_finish > 1) {
    grSubset = grObject[1:(i_finish-1)]
    rest_nts = nts_number - sum(width(grSubset))
    LastInterval = grObject[i_finish]
    end(LastInterval) = start(LastInterval) + rest_nts

    grSubset = union(grSubset, LastInterval)
  } else {
    grSubset = grObject[i_finish]
    end(grSubset) = start(grSubset) + nts_number
  }
  return(grSubset) 
}
subsetting genomic ranges • 232 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by lizaveta0
0
gravatar for SMK
6 months ago by
SMK1.9k
SMK1.9k wrote:

It can be:

grTarget <-
  GRanges(seqnames = "chr12", ranges = IRanges(
    start = c(25398324, 25406855),
    end = c(25406855, 25423323)
  ))

grTarget_L20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(start = grTarget@ranges@start, width = 21)
  )

grTarget_R20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(
      start = grTarget@ranges@start + grTarget@ranges@width - 21,
      width = 21
    )
  )

Which returns:

> grTarget_L20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  [2]    chr12 25406855-25406875      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> grTarget_R20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25406835-25406855      *
  [2]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENTlink written 6 months ago by SMK1.9k

Thank you @SMK! your answer, however, is not exactly what I was wondering about. I updated the post with some code I have at the moment, it should make my question clearer.

ADD REPLYlink written 6 months ago by lizaveta0

Alright, flank + range. ;-)

> flank(range(grTarget), width = -21, start = TRUE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> flank(range(grTarget), width = -21, start = FALSE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD REPLYlink modified 6 months ago • written 6 months ago by SMK1.9k

It is already closer :) However, if I need a bigger interval (that spans across the first interval and takes half of the second) - it does not work properly and output ranges goes beyond borders or grTarget and contains 1 interval ( and it should contain 2).

ADD REPLYlink written 6 months ago by lizaveta0

Can you show the problematic data as an example? :-(

ADD REPLYlink written 6 months ago by SMK1.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1220 users visited in the last hour