How to get phastCons scores for protein-coding genes and lncRNAs?
4.1 years ago
newbie ▴ 120

Dear All,

I'm interested in checking the conservation of lncRNAs and protein-coding genes from GENCODE. My own data also contain some newly assembled lncRNAs that are not found in GENCODE. I would like to make a plot like the one below:

[Image not preserved]

The image above is Figure 1e of the paper "Recurrently deregulated lncRNAs in hepatocellular carcinoma".

As in that figure, I want to compare the conservation of known lncRNAs, protein-coding genes and the newly found lncRNAs.

How can I calculate the phastCons score for all of these genes?

RNA-Seq phastcons genome conservation • 2.3k views

There is extensive documentation on this, starting with the phastCons HOWTO:

http://compgen.cshl.edu/phast/phastCons-HOWTO.html

In the paper you mentioned, the authors downloaded the scores directly. If your data are specific and customized, you would have to calculate the scores yourself with PHAST, which you can download from http://compgen.cshl.edu/phast/downloads.php


For the newly found lncRNAs, yes, I will do it myself using PHAST. But how do I get the phastCons scores for GENCODE protein-coding genes and known lncRNAs?


https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons100way

At the bottom of that page there are links to download the phastCons scores. I think that's what they mean in the paper.
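If the downloaded scores are in UCSC's fixedStep wiggle text format (an assumption; the same data is also offered as bigWig), a region's mean score can be computed with base R alone. The sketch below uses a tiny made-up input. The names `wig_lines`, `parse_fixed_step` and `region_mean` are hypothetical, and the parser assumes every header carries `chrom=`, `start=` and `step=` and that there are no `track` lines.

```r
# Toy fixedStep wiggle text. A real file would be read with something like
# readLines(gzfile("<downloaded file>")) and is far larger.
wig_lines <- c(
  "fixedStep chrom=chr1 start=101 step=1",
  "0.10", "0.20", "0.30",
  "fixedStep chrom=chr1 start=201 step=1",
  "0.90", "1.00"
)

# Turn fixedStep blocks into one row per scored position (1-based coordinates).
parse_fixed_step <- function(lines) {
  lines <- lines[nzchar(lines)]              # drop empty lines
  is_header <- grepl("^fixedStep", lines)
  block <- cumsum(is_header)                 # block index of every line
  headers <- lines[is_header]
  chrom <- sub(".*chrom=([^ ]+).*", "\\1", headers)
  start <- as.integer(sub(".*start=([0-9]+).*", "\\1", headers))
  step  <- as.integer(sub(".*step=([0-9]+).*", "\\1", headers))
  blk <- block[!is_header]                   # block of each value line
  offset <- ave(seq_along(blk), blk, FUN = seq_along) - 1L
  data.frame(
    chrom = chrom[blk],
    pos   = start[blk] + offset * step[blk],
    score = as.numeric(lines[!is_header])
  )
}

# Mean score over the scored positions inside [start, end] on one chromosome.
region_mean <- function(scores, chrom, start, end) {
  hit <- scores$chrom == chrom & scores$pos >= start & scores$pos <= end
  mean(scores$score[hit])
}

scores <- parse_fixed_step(wig_lines)
region_mean(scores, "chr1", 102, 201)  # averages the values at 102, 103 and 201
```

Positions without a score are simply absent from the table, so the mean is taken over scored bases only. For whole chromosomes a data frame with one row per base gets large, so this is meant to illustrate the format and not to be an efficient pipeline.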


I found an R package, GenomicScores, that can extract phastCons scores.

In the GenomicScores tutorial, the scores are only extracted for individual chromosome locations.

Is there a way to get the phastCons scores for all protein-coding genes and known lncRNAs from phastCons100way.UCSC.hg38?

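library(GenomicRanges)
library(GenomicScores)              # provides gscores()
library(phastCons100way.UCSC.hg38)  # annotation package holding the hg38 scores
gsco <- phastCons100way.UCSC.hg38
# Look up the score at a single position on chr7
gscores(gsco, GRanges(seqnames="chr7", IRanges(start=117232380, width=1)))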

GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |   default
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr7 117232380      * |       0.5
  -------
  seqinfo: 1 sequence from Genome Reference Consortium GRCh38 genome; no seqlengths

With the example above I can get the phastCons score for a specific chromosome location. But I am interested in extracting the scores for all protein-coding genes and known lncRNAs.
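One possible way to go from per-location scores to per-gene scores is to score each exon and then combine the exons of a gene with a width-weighted mean, so that long exons count proportionally more than short ones. The sketch below shows only that aggregation step in base R on a made-up table. The `exons` data frame and the `gene_score` function are hypothetical. In practice the `score` column might be the `default` column that `gscores()` returns for non-overlapping exon ranges taken from the GENCODE annotation; that `gscores()` gives one averaged value per wider range is an assumption to check in the GenomicScores documentation.

```r
# Hypothetical per-exon table: one row per non-overlapping exon, with the
# exon's mean phastCons score. NA marks an exon with no score available.
exons <- data.frame(
  gene_id   = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  gene_type = c("protein_coding", "protein_coding", "lncRNA", "lncRNA", "lncRNA"),
  width     = c(100, 300, 200, 50, 150),
  score     = c(0.9, 0.5, 0.2, NA, 0.4)
)

# Width-weighted mean of exon scores per gene, ignoring unscored exons.
gene_score <- function(exons) {
  ex  <- exons[!is.na(exons$score), ]
  num <- tapply(ex$score * ex$width, ex$gene_id, sum)  # sum of score * bases
  den <- tapply(ex$width, ex$gene_id, sum)             # number of scored bases
  out <- data.frame(gene_id = names(num), score = as.numeric(num / den))
  out$gene_type <- ex$gene_type[match(out$gene_id, ex$gene_id)]
  out
}

per_gene <- gene_score(exons)
per_gene

# Summarise each biotype, e.g. by its median gene score.
tapply(per_gene$score, per_gene$gene_type, median)
```

The resulting per-gene table, with one score and one biotype per gene, is the kind of input a plot comparing protein-coding genes, known lncRNAs and novel lncRNAs would need. Note that the biotype labels used for lncRNAs differ between GENCODE releases, so the filter on `gene_type` has to match the release in use.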

4.1 years ago
gayachit ▴ 200

Hi,

I found a similar post. I guess it will take some work, but the method should give you what you need:

Obtaining phastCons conservation score for every gene in the Human genome

I also found the download link for phyloP scores: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP100way/


Thank you. I also posted my question on the Bioconductor support site and got a reply. Here it is: https://support.bioconductor.org/p/129140/#129223
