Question: How do I get ranges for first or last n nucleotides in genomicranges
0
gravatar for lizaveta
10 months ago by
lizaveta0
lizaveta0 wrote:

I need to get first (meaning with the most left coordinates possible) or the last n nucleotides positions (let' say, 20 nts but it can be up to the whole genomic region width) in a genomic ranges object grTarget of length 23 kb:

> grTarget

 GRanges object with 2 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25406855      *
  [2]    chr12 25408100-25423323      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

I expect to get:

1) the most left 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

1) the most right 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

Is there some function that can do this? I only can think of making a function that will take nucleotides in a loop until the length of the subsetted region reaches the desired length:

MostLeftNucleotides = function(grObject, nts_number) {
  for (i_finish in c(1:length(grObject))) {
    if (nts_number <= sum(width(grObject[1:i_finish]))) {
        break
    }
  }
  if (i_finish > 1) {
    grSubset = grObject[1:(i_finish-1)]
    rest_nts = nts_number - sum(width(grSubset))
    LastInterval = grObject[i_finish]
    end(LastInterval) = start(LastInterval) + rest_nts

    grSubset = union(grSubset, LastInterval)
  } else {
    grSubset = grObject[i_finish]
    end(grSubset) = start(grSubset) + nts_number
  }
  return(grSubset) 
}
subsetting genomic ranges • 326 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by lizaveta0
0
gravatar for SMK
10 months ago by
SMK1.9k
SMK1.9k wrote:

It can be:

grTarget <-
  GRanges(seqnames = "chr12", ranges = IRanges(
    start = c(25398324, 25406855),
    end = c(25406855, 25423323)
  ))

grTarget_L20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(start = grTarget@ranges@start, width = 21)
  )

grTarget_R20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(
      start = grTarget@ranges@start + grTarget@ranges@width - 21,
      width = 21
    )
  )

Which returns:

> grTarget_L20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  [2]    chr12 25406855-25406875      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> grTarget_R20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25406835-25406855      *
  [2]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENTlink written 10 months ago by SMK1.9k

Thank you @SMK! your answer, however, is not exactly what I was wondering about. I updated the post with some code I have at the moment, it should make my question clearer.

ADD REPLYlink written 10 months ago by lizaveta0

Alright, flank + range. ;-)

> flank(range(grTarget), width = -21, start = TRUE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> flank(range(grTarget), width = -21, start = FALSE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD REPLYlink modified 10 months ago • written 10 months ago by SMK1.9k

It is already closer :) However, if I need a bigger interval (that spans across the first interval and takes half of the second) - it does not work properly and output ranges goes beyond borders or grTarget and contains 1 interval ( and it should contain 2).

ADD REPLYlink written 10 months ago by lizaveta0

Can you show the problematic data as an example? :-(

ADD REPLYlink written 10 months ago by SMK1.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2222 users visited in the last hour