Question: How to deal with multiple Transcription Start Site in a gene?
0
gravatar for Poorya Parvizi
23 months ago by
Middle East Technical University
Poorya Parvizi20 wrote:

In order to measure the conservation (Phastcons score) of each gene, i am trying to look at -2000:2000 base pairs below and above from transcription start site (TSS), which will represent my promoter. But the problem is, there are more than one TSS for each gene. What should i do? which one should i choose? 

tss phastcons score promoter • 925 views
ADD COMMENTlink modified 23 months ago by Giovanni M Dall'Olio25k • written 23 months ago by Poorya Parvizi20
1
gravatar for Giovanni M Dall'Olio
23 months ago by
London, UK
Giovanni M Dall'Olio25k wrote:

If you want to avoid confusion you can simply use the gene coordinates from the Known Gene UCSC track:

> library(Homo.sapiens)
> library(phastCons100way.UCSC.hg19)
> mypromoters = promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene), 2000, -2000)
> mypromoters$cons100way = scores(phastCons100way.UCSC.hg19, mypromoters) # takes a while
> mypromoters
GRanges object with 23056 ranges and 2 metadata columns:
        seqnames                 ranges strand   |     gene_id cons100way
           <Rle>              <IRanges>  <Rle>   | <character>  <numeric>
      1    chr19 [ 58874015,  58876214]      -   |           1 0.10872727
     10     chr8 [ 18246755,  18248954]      +   |          10 0.36422727
    100    chr20 [ 43280177,  43282376]      -   |         100 0.04400000
   1000    chr18 [ 25757246,  25759445]      -   |        1000 0.01645455
  10000     chr1 [244006687, 244008886]      -   |       10000 0.03836364
    ...      ...                    ...    ... ...         ...        ...
   9991     chr9 [115095745, 115097944]      -   |        9991 0.25700000
   9992    chr21 [ 35734323,  35736522]      +   |        9992 0.29409091
   9993    chr22 [ 19109768,  19111967]      -   |        9993 0.01236364
   9994     chr6 [ 90537619,  90539818]      +   |        9994 0.07536364
   9997    chr22 [ 50964706,  50966905]      -   |        9997 0.15281818
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

 

ADD COMMENTlink written 23 months ago by Giovanni M Dall'Olio25k

Hi again The code below is not working for me, mypromoters$cons100way = scores(phastCons100way.UCSC.hg19, mypromoters) what should i do? another question is, instead of gene_id i put tx_name column. but can i convert these UCSC ids to ensmble gene ID. actually i convert it by getBM package of R, but it is not seems good.what do you think?

ADD REPLYlink written 21 months ago by Poorya Parvizi20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 945 users visited in the last hour