Question

Help with KaryoploteR plotting

6.2 years ago

neethav ▴ 40

I am trying to follow the karyoploteR tutorial to plot genomic events (I have gene IDs and frequencies) on an hg38 track. I am following the gene expression example (https://bernatgel.github.io/karyoploter_tutorial//Examples/GeneExpression/GeneExpression.html), but I am stuck at the step where I join my data to the GRanges object as metadata columns. The hg38 TxDb does not have gene names, only a gene ID (I cannot tell which kind of ID it is), whereas TxDb.Dmelanogaster.UCSC.dm6.ensGene (used in the tutorial) has gene names in it. I am not familiar with GRanges, so I cannot figure out how to add my data to the hg38 GRanges object in order to proceed with the plotting. The hg38 transcripts do have transcript IDs.

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

hm.genes <- genes(txdb)

hm.genes

GRanges object with 24183 ranges and 1 metadata column:
            seqnames                 ranges strand |     gene_id
               <Rle>              <IRanges>  <Rle> | <character>
          1    chr19   [58345178, 58362751]      - |           1
         10     chr8   [18391245, 18401218]      + |          10
        100    chr20   [44619522, 44651742]      - |         100
       1000    chr18   [27950966, 28177446]      - |        1000
  100009613    chr11   [70072434, 70075433]      - |   100009613
        ...      ...                    ...    ... .         ...
       9991     chr9 [112217716, 112333667]      - |        9991
       9992    chr21 [ 34364024,  34371389]      + |        9992
       9993    chr22 [ 19036282,  19122454]      - |        9993
       9994     chr6 [ 89829894,  89874436]      + |        9994
       9997    chr22 [ 50523568,  50526439]      - |        9997

txgr <- transcripts(txdb)

txgr

GRanges object with 82960 ranges and 2 metadata columns:
                seqnames           ranges strand |     tx_id     tx_name
                   <Rle>        <IRanges>  <Rle> | <integer> <character>
      [1]           chr1 [ 11874,  14409]      + |         1  uc001aaa.3
      [2]           chr1 [ 11874,  14409]      + |         2  uc010nxq.1
      [3]           chr1 [ 11874,  14409]      + |         3  uc010nxr.1
      [4]           chr1 [ 69091,  70008]      + |         4  uc001aal.1
      [5]           chr1 [321084, 321115]      + |         5  uc001aaq.2
      ...            ...              ...    ... .       ...         ...
  [82956] chrUn_gl000237 [     1,   2686]      - |     82956  uc011mgu.1
  [82957] chrUn_gl000241 [ 20433,  36875]      - |     82957  uc011mgv.2
  [82958] chrUn_gl000243 [ 11501,  11530]      + |     82958  uc011mgw.1
  [82959] chrUn_gl000243 [ 13608,  13637]      + |     82959  uc022brq.1
  [82960] chrUn_gl000247 [  5787,   5816]      - |     82960  uc022brr.1

My data:

head(data)

     Gene Freq         ID
1    AATF    1 uc002hni.3
2  ACAD10    1 uc001tsq.4
3  ACADSB    1 uc001lhb.4
4    AGO3    1 uc001bzn.1
5   AJUBA    1 uc001why.5
6 ALOX15B    1 uc002gju.4

My goal is to plot the frequency of the genes on the chromosome. Could you please help with tweaking the code for hg38 to add my data to the GRanges object?

Thank you! Neetha

Visualization bernatgel • 1.9k views

updated 6.2 years ago by Bastien Hervé • written 6.2 years ago by neethav

score 1 · Answer 1 · 2018-04-13

I found a trick:

### Load the annotation package that provides the txdb object
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

### Read your frequency file, apply column names and (important) set the transcript ID (3rd column) as the row names. It has to be 3 columns like your 'data' variable
data <- read.table("frequency.csv", header = FALSE, sep = "\t", col.names = c("name", "freq", "ucsc"), row.names = 3)

### Load your txdb object
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
txgr <- transcripts(txdb)

### Set the txgr names from the 'tx_name' column
names(txgr) <- txgr$tx_name

### Merge your data frame into the GRanges object by matching row names to transcript names (this replaces the existing metadata columns; transcripts absent from 'data' get NA)
mcols(txgr) <- data[names(txgr), ]

### You can even reset the txgr names with the new 'name' column
names(txgr) <- txgr$name
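
If you would rather keep the existing tx_id and tx_name columns, a variation on the same idea is to look up each transcript with match() and add the new columns one by one. Below is a minimal sketch on a toy GRanges. The coordinates are made up for illustration, and the objects 'freq' and 'hits' are hypothetical names, not part of the original question. With real data you would use transcripts(txdb) in place of the toy object.

```r
library(GenomicRanges)

# Toy stand-in for transcripts(txdb). Coordinates are hypothetical;
# only the tx_name column matters for the join.
txgr <- GRanges(
  seqnames = c("chr1", "chr1", "chr17"),
  ranges   = IRanges(start = c(100, 500, 900), end = c(200, 600, 1000)),
  strand   = c("+", "+", "+"),
  tx_id    = 1:3,
  tx_name  = c("uc001aaa.3", "uc001aal.1", "uc002hni.3")
)

# Toy frequency table shaped like the 'data' shown in the question.
freq <- data.frame(
  Gene = c("AATF", "ACAD10"),
  Freq = c(1, 1),
  ID   = c("uc002hni.3", "uc001tsq.4"),
  stringsAsFactors = FALSE
)

# For each transcript, find its row in 'freq' (NA when there is none).
idx <- match(txgr$tx_name, freq$ID)

# Add the new metadata columns without touching tx_id / tx_name.
txgr$gene <- freq$Gene[idx]
txgr$freq <- freq$Freq[idx]

# Keep only the transcripts that actually have a frequency.
hits <- txgr[!is.na(txgr$freq)]
hits
```

The filtered object should then be usable the same way the tutorial uses its gene object, for example as the data argument of kpPoints() with y = hits$freq. I have not tested that part against your file, so treat it as a starting point.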