Question: How to get the phastCons score for protein-coding genes and lncRNAs?
4 months ago, newbie70 wrote:

Dear All,

I'm interested in checking the conservation of lncRNAs and protein-coding genes from GENCODE. My own data also contains some newly assembled lncRNAs that are not found in GENCODE. I would like to make a plot like the one below:

[Image not shown]

The above image is taken from Figure 1e of the paper "Recurrently deregulated lncRNAs in hepatocellular carcinoma".

Similar to the above image, I would like to compare the conservation of known lncRNAs, protein-coding genes and newly found lncRNAs.

How can I calculate the phastCons score for all the genes?
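Once a per-gene score table exists, one possible way to draw such a comparison is a cumulative distribution per gene class. This is only a sketch and not necessarily how the paper's figure was made. The values below are random placeholders (not real phastCons scores), and the class labels and column names are assumptions chosen for illustration.

```r
# Placeholder table: one row per gene, with a class label and a mean score.
# The scores are random numbers, used only so that the sketch runs.
set.seed(1)
scores <- data.frame(
  class = rep(c("protein_coding", "known_lncRNA", "novel_lncRNA"), each = 50),
  phastcons = runif(150),
  stringsAsFactors = FALSE
)

# One empirical cumulative distribution function per gene class
by_class <- split(scores$phastcons, scores$class)
cdfs <- lapply(by_class, ecdf)

# Draw the first curve, then overlay the remaining ones
plot(cdfs[[1]], xlim = c(0, 1), main = "", col = 1,
     do.points = FALSE, verticals = TRUE,
     xlab = "Mean phastCons score",
     ylab = "Cumulative fraction of genes")
for (i in seq_along(cdfs)[-1]) {
  lines(cdfs[[i]], col = i, do.points = FALSE, verticals = TRUE)
}
legend("bottomright", legend = names(cdfs), col = seq_along(cdfs), lty = 1)
```

A boxplot (`boxplot(phastcons ~ class, data = scores)`) would work on the same table if a simpler summary is enough.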

modified 4 months ago by gayachit200 • written 4 months ago by newbie70

Is this also by you: https://support.bioconductor.org/p/129140/ ?

written 4 months ago by Kevin Blighe63k

There is extensive documentation on that:

http://compgen.cshl.edu/phast/phastCons-HOWTO.html

In the paper you mentioned, they downloaded the scores directly. If your data is specific and customized, you would have to compute the scores yourself using the tool, which you can download from http://compgen.cshl.edu/phast/downloads.php

written 4 months ago by gayachit200

For the newly found lncRNAs, yes, I will do it myself using PHAST. But how do I get the phastCons scores for GENCODE protein-coding genes and known lncRNAs?

written 4 months ago by newbie70

https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons100way

At the bottom of that page there are links to download the phastCons scores. I think that's what they mean in the paper.
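As far as I know, the per-chromosome downloads are in fixedStep wiggle ("wigFix") format: a header line such as `fixedStep chrom=chr1 start=... step=1` followed by one score per line. A minimal base-R sketch of reading that format and averaging over an interval is below. The function names `parse_fixedstep` and `mean_score` are made up for this example, the toy lines are illustrative values rather than real scores, and `step=1` is assumed. Real chromosome files are very large, so this is meant to show the format, not to be fast.

```r
# Parse fixedStep wiggle lines into a data frame of per-base scores.
# Assumes step = 1 (one score per consecutive base) in every block.
parse_fixedstep <- function(lines) {
  is_header <- grepl("^fixedStep", lines)
  block <- cumsum(is_header)                  # block index for every line
  out <- lapply(split(lines, block), function(b) {
    header <- b[1]
    chrom <- sub(".*chrom=([^ ]+).*", "\\1", header)
    start <- as.integer(sub(".*start=([0-9]+).*", "\\1", header))
    scores <- as.numeric(b[-1])
    data.frame(chrom = rep(chrom, length(scores)),
               pos = start + seq_along(scores) - 1L,  # wiggle starts are 1-based
               score = scores,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, unname(out))
}

# Mean score over a closed, 1-based interval; NA if no base has a score.
mean_score <- function(scores, chrom, start, end) {
  hit <- scores$chrom == chrom & scores$pos >= start & scores$pos <= end
  if (!any(hit)) return(NA_real_)
  mean(scores$score[hit])
}

# Toy input with two blocks (illustrative values only)
toy <- c("fixedStep chrom=chr1 start=101 step=1",
         "0.1", "0.3", "0.5",
         "fixedStep chrom=chr1 start=201 step=1",
         "0.9", "0.7")
scores <- parse_fixedstep(toy)
region_mean <- mean_score(scores, "chr1", 102, 201)
```

For a real file the lines could come from `readLines(gzfile(path))`, where `path` is wherever the download was saved.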

written 4 months ago by gayachit200

I found the R package GenomicScores, which can extract phastCons scores.

In the GenomicScores tutorial I see that scores are extracted for individual chromosome positions.

I would like to know whether there is a way to get the phastCons score for all protein-coding genes and known lncRNAs from phastCons100way.UCSC.hg38.

library(GenomicRanges)
library(phastCons100way.UCSC.hg38)
gsco <- phastCons100way.UCSC.hg38
gscores(gsco, GRanges(seqnames="chr7", IRanges(start=117232380, width=1)))

GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |   default
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr7 117232380      * |       0.5
  -------
  seqinfo: 1 sequence from Genome Reference Consortium GRCh38 genome; no seqlengths

Yes, with the above example I am able to get the phastCons score for a specific chromosome position. But I'm interested in extracting the scores for all protein-coding genes and known lncRNAs.
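One way this might be done, as a sketch under assumptions: `gscores()` also accepts ranges wider than one base and, by default, summarises the scores over each range by their mean, so gene ranges read from a GENCODE GTF can be passed in directly. The function name `gencode_gene_phastcons`, the GTF file name in the usage comment, and the `gene_types` values are assumptions (older GENCODE releases use labels such as `lincRNA` instead of `lncRNA`). Note that this averages over the whole gene body, introns included. Selecting `type == "exon"` instead would give exon-level scores that then need to be combined per gene.

```r
# Sketch only. `gtf_path` is a local GENCODE GTF file and `gsco` is a GScores
# object such as phastCons100way.UCSC.hg38 (available after library()).
gencode_gene_phastcons <- function(gtf_path, gsco,
                                   gene_types = c("protein_coding", "lncRNA")) {
  # Read the annotation as a GRanges object
  gtf <- rtracklayer::import(gtf_path)

  # Keep gene-level records of the requested biotypes
  genes <- gtf[gtf$type == "gene" & gtf$gene_type %in% gene_types]

  # Drop scaffolds and patches, which may have no scores (an assumption)
  genes <- GenomeInfoDb::keepStandardChromosomes(genes, pruning.mode = "coarse")

  # One summarised score per range, returned in the "default" column
  scored <- GenomicScores::gscores(gsco, genes)

  data.frame(gene_id = genes$gene_id,
             gene_type = genes$gene_type,
             phastcons = scored$default,
             stringsAsFactors = FALSE)
}

# Usage (not run here; needs the Bioconductor packages and annotation data):
# library(phastCons100way.UCSC.hg38)
# res <- gencode_gene_phastcons("gencode.annotation.gtf",
#                               phastCons100way.UCSC.hg38)
# head(res)
```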

written 4 months ago by newbie70
4 months ago, gayachit200 (India) wrote:

Hi

I found a post similar to this. I guess it'll take some work but the method should give you what you need.

Obtaining phastCons conservation score for every gene in the Human genome

I also found the link for downloading phyloP scores: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP100way/

written 4 months ago by gayachit200

Thank you. I also posted my question on the Bioconductor support site and got a reply. Here it is: https://support.bioconductor.org/p/129140/#129223

written 4 months ago by newbie70