Question: How to calculate the conservation score of a short RNA sequence?
Bjoux (China/Changsha/CSU), 3.2 years ago:

I want to calculate the conservation of a short RNA sequence (about 50 nt). I tried running phastCons but failed, because I do not know how to prepare its input files. I would rather use the precomputed phastCons scores downloaded from the UCSC conservation tracks, but I do not know the genomic location of my sequence. Does anyone know another way to calculate a conservation score?

Thanks

Bjou

Alex Reynolds (Seattle, WA USA), 3.2 years ago:

Map your sequence to a genomic position (or positions) with BLAT or a similar aligner. Then use BEDOPS bedmap to map those positions to phastCons or phyloP score data in BED format (the scores are available from the UCSC goldenPath download server).
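A minimal Python sketch of the first step, assuming the BLAT hit has been saved in the standard 21-column PSL format for a nucleotide alignment (where tStarts are 0-based coordinates on the target's forward strand): each aligned block becomes one BED interval, so for a spliced transcript only the aligned bases, not the introns between them, are scored later. The function name psl_blocks_to_bed and the block naming scheme are hypothetical, not part of BLAT or BEDOPS.

```python
def psl_blocks_to_bed(psl_line):
    """Split one nucleotide PSL alignment line into BED intervals, one per block.

    Assumes the standard 21-column PSL layout written by BLAT. For single-character
    strands ('+' or '-'), tStarts are 0-based positions on the target's forward
    strand, so they can be used directly as BED starts.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected at least 21 tab-separated PSL columns")
    strand = fields[8]
    if len(strand) != 1:
        raise ValueError("translated (two-character strand) PSL is not handled here")
    q_name = fields[9]
    t_name = fields[13]
    # blockSizes and tStarts are comma-separated lists with a trailing comma
    block_sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in fields[20].rstrip(",").split(",")]
    bed = []
    for i, (start, size) in enumerate(zip(t_starts, block_sizes), 1):
        # BED fields: chrom, start (0-based), end (exclusive), name, score, strand
        bed.append((t_name, start, start + size, f"{q_name}_block{i}", 0, strand))
    return bed
```

Writing these tuples out tab-separated and sorting them gives a regions file in the form bedmap expects as its reference input.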


Thanks. I tried the approach you suggested. My procedure so far:

1. Downloaded the phastCons7way data from UCSC. It contains hg38.phastCons7way.wigFix (17.8 GB), which holds the conservation score of each nucleotide.
2. My mRNA sequence is 1707 nt long. Mapping it to hg38 with BLAT gives:

   ACTIONS          QUERY    SCORE  START  END   QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
   browser details  YourSeq  1704   1      1707  1707   100.0%    1     -       41847189  42035925  188737

3. Use bedmap to map this position to the phastCons scores. However, I do not know how to use bedmap to extract the corresponding scores from the phastCons7way.wigFix file.
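In the BLAT result above, the span (188,737 bp) is far larger than the 1707 aligned bases, so the alignment is split into blocks (most likely exons), and averaging phastCons over the whole chr1:41847189-42035925 span would mostly score intronic bases. For a quick look without converting the whole file, a minimal sketch that streams a fixedStep wiggle file (the layout used by UCSC .wigFix files) and keeps only the scores inside one 1-based, inclusive interval could look like this. It assumes span=1 and fixedStep headers only (no variableStep); the function name wigfix_scores is hypothetical. phastCons scores are per genomic position, so the '-' strand of the hit does not change which scores apply.

```python
def wigfix_scores(lines, chrom, start, end):
    """Collect per-base scores from a fixedStep wiggle stream for chrom:start-end.

    start and end are 1-based and inclusive (as in BLAT's web output); the
    'start=' values in fixedStep headers are also 1-based. Assumes span=1,
    i.e. one value per base. Returns a dict {position: score}.
    """
    scores = {}
    cur_chrom, pos, step = None, None, 1
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr1 start=10918 step=1"
            params = dict(kv.split("=", 1) for kv in line.split()[1:])
            cur_chrom = params["chrom"]
            pos = int(params["start"])
            step = int(params.get("step", "1"))
            continue
        if line.startswith(("track", "#")):
            continue  # header or comment lines carry no values
        if pos is None:
            raise ValueError("data value found before any fixedStep header")
        if cur_chrom == chrom and start <= pos <= end:
            scores[pos] = float(line)
        pos += step
    return scores
```

Usage would be along the lines of wigfix_scores(open("hg38.phastCons7way.wigFix"), "chr1", 41847189, 42035925). Note that UCSC wiggle files name chromosomes chr1, not 1 as in the BLAT table. The function scans the entire 17.8 GB file once per call, so for many regions the wig2bed/bedmap route is more practical.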

(Reply by Bjoux, 3.2 years ago)

Convert the WIG-formatted file to BED format with wig2bed (also in the BEDOPS kit). If BLAT gives you data in PSL format, you can use psl2bed to convert that to BED as well. Then map the regions to the signal. In the simplest case (which may or may not work well for phastCons signal):

$ bedmap --echo --echo-map-score regions.bed signal.bed > answer.bed
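To see what an averaging run (along the lines of bedmap --echo --mean regions.bed signal.bed) computes, as opposed to the raw score list that --echo-map-score prints, here is a minimal Python sketch. It assumes the signal is a list of (chrom, start, end, score) tuples in 0-based, half-open BED coordinates, for example per-base records obtained from wig2bed, and that signal elements on one chromosome do not overlap each other. The function name map_mean is hypothetical and is not BEDOPS code.

```python
import bisect
from collections import defaultdict


def map_mean(regions, signal):
    """Average the scores of signal elements overlapping each BED region.

    regions: list of (chrom, start, end), 0-based half-open.
    signal:  list of (chrom, start, end, score), 0-based half-open, assumed
             non-overlapping within a chromosome (true for per-base data).
    Returns a list of (chrom, start, end, mean), with mean None when no
    signal element overlaps the region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, score in signal:
        by_chrom[chrom].append((start, end, score))
    index = {}
    for chrom, elems in by_chrom.items():
        elems.sort()  # non-overlapping, so ends come out sorted too
        index[chrom] = ([e[0] for e in elems], [e[1] for e in elems], [e[2] for e in elems])

    results = []
    for chrom, r_start, r_end in regions:
        starts, ends, scores = index.get(chrom, ([], [], []))
        lo = bisect.bisect_right(ends, r_start)  # first element ending after r_start
        hi = bisect.bisect_left(starts, r_end)   # first element starting at or after r_end
        hits = scores[lo:hi]
        mean = sum(hits) / len(hits) if hits else None
        results.append((chrom, r_start, r_end, mean))
    return results
```

Feeding it one region per aligned block gives per-exon means; these can be combined, weighted by block length, if a single score for the whole transcript is wanted.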

The bedmap documentation is probably a good place to start: it explains the tool and walks through various scenarios for mapping genomic intervals to score data.

(Reply by Alex Reynolds, 3.2 years ago)