Is there a way to remove certain chromosomes from a GRanges Object?
3 months ago
cthangav ▴ 10

Hello, I am trying to do the tutorial here:

But when I try to build the motif matrix from the Seurat object using:

motif.matrix <- CreateMotifMatrix(
  features = peaks.granges,
  pwm = pfm,
  genome = 'hg38',
  use.counts = FALSE
) %>% as.matrix

I get the following error message:

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': sequence GL000194.1 not found.

This is probably because GL000194.1 is an unlocalized sequence that I should remove. I'm new to R and GRanges objects, but the object looks like this:

> peaks.granges

GRanges object with 91822 ranges and 1 metadata column:
        seqnames          ranges strand |                   peak
           <Rle>       <IRanges>  <Rle> |            <character>
  [1] GL000194.1   101218-101619      * | GL000194.1:101218-10..
  [2] GL000194.1     28230-28519      * | GL000194.1:28230-28519
  [3] GL000194.1     56140-56191      * | GL000194.1:56140-56191
  [4] GL000195.1     24252-24277      * | GL000195.1:24252-24277
  [5] GL000195.1     30140-33355      * | GL000195.1:30140-33355
      ...        ...             ...    ... .                    ...
  [91818]       chrX 9981210-9982172      * |   chrX:9981210-9982172
  [91819]       chrX 9983431-9983799      * |   chrX:9983431-9983799
  [91820]       chrX 9986525-9987246      * |   chrX:9986525-9987246
  [91821]       chrX 9995697-9996334      * |   chrX:9995697-9996334
  [91822]       chrX 9997556-9997868      * |   chrX:9997556-9997868
 seqinfo: 31 sequences from an unspecified genome; no seqlengths

The chromosomes (seqnames) are:

> peaks.granges@seqinfo@seqnames
[1] "chr1"       "chr2"       "chr3"       "chr4"       "chr5"       "chr6"       "chr7"       "chr8"       "chr9"      
[10] "chr10"      "chr11"      "chr12"      "chr13"      "chr14"      "chr15"      "chr16"      "chr17"      "chr18"     
[19] "chr19"      "chr20"      "chr21"      "chr22"      "chrX"       "GL000194.1" "GL000195.1" "GL000205.2" "GL000219.1"
[28] "KI270713.1" "KI270726.1" "KI270727.1" "KI270734.1"

If I want to remove the unlocalized sequences (the ones that don't start with chr), what's the best way? Should I do:

peaks.granges <- window(peaks.granges, start=A,end=B) 

where A is the index of the first range on chr1 and B is the index of the last range on chrX?

Or is there a quick way to select ranges based on their chromosome (i.e. seqname)? I looked through the GRanges manual but couldn't find an example of this in particular.

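Try dropSeqlevels() or keepStandardChromosomes(). Both are defined in the GenomeInfoDb package, which GenomicRanges builds on. Pass pruning.mode = "coarse" so that the ranges on the dropped sequences are removed as well; with the default ("error") the call stops if any of those sequences are still in use.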
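
A minimal sketch of both suggestions, run on a small made-up GRanges rather than the real peaks.granges (the four toy ranges and the names peaks.std, peaks.chr and scaffolds are placeholders for illustration). It assumes GenomicRanges and GenomeInfoDb are installed, and that keepStandardChromosomes() can guess the naming style from the "chr" seqlevels when no species is given:

```r
library(GenomicRanges)
library(GenomeInfoDb)

# Toy stand-in for peaks.granges: two scaffold peaks and two chromosome peaks.
peaks.granges <- GRanges(
  seqnames = c("GL000194.1", "GL000195.1", "chr1", "chrX"),
  ranges   = IRanges(start = c(101218, 24252, 1000, 9981210),
                     end   = c(101619, 24277, 1500, 9982172))
)

# Option 1: keep only the standard chromosomes.
# pruning.mode = "coarse" also removes the ranges on the dropped sequences.
peaks.std <- keepStandardChromosomes(peaks.granges, pruning.mode = "coarse")

# Option 2: drop every seqlevel that does not start with "chr".
scaffolds <- grep("^chr", seqlevels(peaks.granges), value = TRUE, invert = TRUE)
peaks.chr <- dropSeqlevels(peaks.granges, scaffolds, pruning.mode = "coarse")

# Inspect what is left in each result.
seqlevels(peaks.std)
seqlevels(peaks.chr)
```

Either result can then be passed as the features argument in place of the original object. Option 2 follows the "doesn't start with chr" rule from the question literally, while option 1 relies on GenomeInfoDb's own list of standard chromosomes, so the two may differ on objects that contain, for example, chrM or chrUn sequences.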


