How to save and reload a GRanges object with metadata
0
0
Entering edit mode
4.5 years ago

Hi, I'm new to R and working through some existing code. I'm wanting to save a GRanges object to disk and then be able to reload it, preserving the metadata, so the object I restore to memory is the same as the one I saved to disk. The object I have in memory is as follows:

head(gr)

GRanges object with 6 ranges and 6 metadata columns:
                    seqnames      ranges strand |      TXID   TXSTART     TXEND
                       <Rle>   <IRanges>  <Rle> | <integer> <integer> <integer>
  ENST00000456328.2     chr1 11869-14409      + |         1     11869     14409
  ENST00000515242.2     chr1 11872-14412      + |         2     11872     14412
  ENST00000518655.2     chr1 11874-14409      + |         3     11874     14409
  ENST00000450305.2     chr1 12010-13670      + |         4     12010     13670
  ENST00000473358.1     chr1 29554-31097      + |         5     29554     31097
  ENST00000469289.1     chr1 30267-31109      + |         6     30267     31109
                               GENEID            TXNAME   TxWidth
                      <CharacterList>       <character> <integer>
  ENST00000456328.2 ENSG00000223972.4 ENST00000456328.2      1657
  ENST00000515242.2 ENSG00000223972.4 ENST00000515242.2      1653
  ENST00000518655.2 ENSG00000223972.4 ENST00000518655.2      1483
  ENST00000450305.2 ENSG00000223972.4 ENST00000450305.2       632
  ENST00000473358.1 ENSG00000243485.2 ENST00000473358.1       712
  ENST00000469289.1 ENSG00000243485.2 ENST00000469289.1       535
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

I save it to a txt file as follows:

df = as(gr, "data.frame")
write.table(df, filename, sep = "\t", row.names = TRUE, col.names = TRUE, quote = FALSE)

I then read it back in as follows:

df2 = read.table(filename, header = TRUE, sep = "\t", dec = ".")
gr2 = makeGRangesFromDataFrame(df2, keep.extra.columns=TRUE)
head(gr2)

resulting in:

GRanges object with 6 ranges and 6 metadata columns:
                    seqnames      ranges strand |      TXID   TXSTART     TXEND
                       <Rle>   <IRanges>  <Rle> | <integer> <integer> <integer>
  ENST00000456328.2     chr1 11869-14409      + |         1     11869     14409
  ENST00000515242.2     chr1 11872-14412      + |         2     11872     14412
  ENST00000518655.2     chr1 11874-14409      + |         3     11874     14409
  ENST00000450305.2     chr1 12010-13670      + |         4     12010     13670
  ENST00000473358.1     chr1 29554-31097      + |         5     29554     31097
  ENST00000469289.1     chr1 30267-31109      + |         6     30267     31109
                               GENEID            TXNAME   TxWidth
                             <factor>          <factor> <integer>
  ENST00000456328.2 ENSG00000223972.4 ENST00000456328.2      1657
  ENST00000515242.2 ENSG00000223972.4 ENST00000515242.2      1653
  ENST00000518655.2 ENSG00000223972.4 ENST00000518655.2      1483
  ENST00000450305.2 ENSG00000223972.4 ENST00000450305.2       632
  ENST00000473358.1 ENSG00000243485.2 ENST00000473358.1       712
  ENST00000469289.1 ENSG00000243485.2 ENST00000469289.1       535
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths

GENEID and TXNAME are now listed as 'factor'. Am I saving and loading correctly? If so how do I convert GENEID to 'CharacterList' and TXNAME to 'character' ?

For TXNAME it seems I can use: gr2$TXNAME = as.character(gr2$TXNAME)
But typeof(gr$GENEID) says 'S4' and typeof(gr2$GENEID) says 'integer' so I have no idea how to handle this.

If I have restored the same object to memory I would expect GENEID to be listed as 'CharacterList' and typeof to say 'S4'

Thank you

R • 2.8k views
ADD COMMENT
1
Entering edit mode

See ?saveRDS.

ADD REPLY
0
Entering edit mode

perfect! thank you, that's just what I need

ADD REPLY

Login before adding your answer.

Traffic: 1842 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6