Question

How are the BED score values (the 5th column) in BED files obtained?

0

7.3 years ago

fusion.slope ▴ 250

Hello,

I am wondering if any of you know how the BED score values (the 5th column of BED files) are obtained. The UCSC documentation says:

"score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray). This table shows the Genome Browser's translation of BED score values into shades of gray.."

But how is it obtained?

Thanks in advance for any help.

ChIP-Seq BED • 9.7k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 7.3 years ago by fusion.slope ▴ 250

0

The score is typically stored in the fifth column, with the strand in the sixth column. The score can really be any numerical value associated with the interval.
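As a rough illustration of that column layout, here is a minimal Python sketch that pulls the score and strand out of a tab-separated BED line. The `BedRecord` type, the `parse_bed_line` helper, and the sample line are made up for this example; the score is kept as a plain number because its meaning depends on where the file came from.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: Optional[str]
    score: Optional[float]   # 5th column; meaning depends on the file's source
    strand: Optional[str]    # 6th column


def parse_bed_line(line: str) -> BedRecord:
    """Parse one tab-separated BED line (hypothetical helper for illustration)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    name = fields[3] if len(fields) > 3 else None
    # Some files use "." as a placeholder when no score is available.
    score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else None
    strand = fields[5] if len(fields) > 5 else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name, score, strand)


# Invented example line, not taken from any real dataset.
record = parse_bed_line("chr1\t1000\t1500\tpeak_1\t742\t+")
print(record.score, record.strand)
```

Files with only three or four columns simply come back with `score` and `strand` set to `None`.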

ADD REPLY • link 7.3 years ago by Alex Reynolds 35k

0

Oh yes, the 5th column, thanks for the correction. But I do not see the point of this value if we do not know how it is computed... am I wrong? I need to use it to evaluate the quality of the peaks in my scoring function, but at this point I do not know how informative it is.

ADD REPLY • link 7.3 years ago by fusion.slope ▴ 250

1

It depends on the table you are pulling your BED data from. Each table should have an informational or schema page associated with it on the Genome Browser, which describes how the score field is calculated.

ADD REPLY • link 7.3 years ago by Alex Reynolds 35k

1

From ENCODE methods for peak calling of these cell lines:

https://www.encodeproject.org/experiments/ENCSR000AKB/

file "Description excerpt: Track description for UCSC Genome Browser composite track…"

SCORE: "Regions of statistically significant signal enrichment. The score associated with each enriched interval is the mean signal value across the interval. (Note that a broad region with moderate enrichment may deviate from the background more significantly than a short region with high signal.)"
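To make "mean signal value across the interval" concrete, here is a small Python sketch. It is only an assumption about what such a calculation could look like, not ENCODE's actual pipeline: the signal is represented as bedGraph-like `(start, end, value)` segments, bases not covered by any segment count as zero, and `mean_signal` is a hypothetical helper name.

```python
def mean_signal(interval_start, interval_end, segments):
    """Length-weighted mean of a piecewise-constant signal over an interval.

    segments: iterable of (start, end, value) tuples, half-open like BED.
    Assumption: bases not covered by any segment contribute a signal of 0.
    """
    length = interval_end - interval_start
    if length <= 0:
        raise ValueError("interval must have positive length")
    total = 0.0
    for seg_start, seg_end, value in segments:
        # Number of bases shared by the segment and the interval.
        overlap = min(interval_end, seg_end) - max(interval_start, seg_start)
        if overlap > 0:
            total += overlap * value
    return total / length


# Invented signal values, for illustration only.
signal = [(100, 150, 2.0), (150, 200, 6.0)]
print(mean_signal(100, 200, signal))  # (50*2 + 50*6) / 100 = 4.0
print(mean_signal(120, 160, signal))  # (30*2 + 10*6) / 40 = 3.0
```

A mean like this says how high the signal is on average, not how significant the enrichment is, which is the point of the note in the description about broad versus short regions.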

Is this the score in the BED file? I am not sure.

ADD REPLY • link 7.3 years ago by fusion.slope ▴ 250

0

I am working with the ENCODE CTCF files for the GM12878 dataset (a lymphoblastoid cell line). I tried to find out how the table was obtained, but I did not find anything. If you have any insight into the ENCODE ChIP-seq datasets, it would help me a lot. Thanks in advance, Alex.

ADD REPLY • link 7.3 years ago by fusion.slope ▴ 250

0

I have edited "6th column" to "5th column" following Alex's correction.

ADD REPLY • link 7.3 years ago by fusion.slope ▴ 250

score 1 · Answer 1 · 2017-01-24

1

7.3 years ago

Devon Ryan 104k

The score can mean anything and be computed in any way. For ChIP-seq it's usually some sort of peak score that's computed in a peak caller-specific manner.

ADD COMMENT • link 7.3 years ago by Devon Ryan 104k

0

So there is no technical foundation for this score? Then why is it used?

ADD REPLY • link 7.3 years ago by fusion.slope ▴ 250

1

There's no predefined definition of the score, but the score in a given file from a given source will have a definition.

ADD REPLY • link 7.3 years ago by Devon Ryan 104k

0

Hi, this is not clear to me. If the file is a BAM file, is the score the number of reads in the interval?

ADD REPLY • link 3.3 years ago by shinken123 ▴ 150

0

There's no possible answer to this; it can be anything.

ADD REPLY • link 3.3 years ago by Devon Ryan 104k

score 0 · Answer 2 · 2017-09-19

I found this in the TopHat manual, in the output section on junctions.bed: "A UCSC BED track of junctions reported by TopHat. Each junction consists of two connected BED blocks, where each block is as long as the maximal overhang of any read spanning the junction. The score is the number of alignments spanning the junction." So, as I understand it, if you use TopHat, the score you see on each junction is the number of reads spanning that junction.
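For what it's worth, here is a minimal Python sketch of reading the score from such a file as a support count. It assumes the usual 12-column BED layout with two blocks per junction, as in the manual's description. The `read_junctions` helper, the `min_support` parameter, and the two example lines are made up for illustration and are not real TopHat output.

```python
def read_junctions(lines, min_support=1):
    """Return (chrom, intron_start, intron_end, strand, support) tuples.

    Assumes 12-column BED with exactly two blocks per junction; the 5th
    column is read as the number of alignments spanning the junction.
    """
    junctions = []
    for line in lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        chrom_start = int(f[1])
        support = int(f[4])
        block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
        # The junction itself is the gap between the two blocks.
        intron_start = chrom_start + block_sizes[0]
        intron_end = chrom_start + block_starts[1]
        if support >= min_support:
            junctions.append((f[0], intron_start, intron_end, f[5], support))
    return junctions


# Invented example lines, for illustration only.
example = [
    'track name=junctions description="example"',
    "chr1\t14829\t14970\tJUNC00000001\t5\t-\t14829\t14970\t255,0,0\t2\t70,50\t0,91",
    "chr1\t20000\t20300\tJUNC00000002\t1\t+\t20000\t20300\t255,0,0\t2\t40,60\t0,240",
]
for junction in read_junctions(example, min_support=3):
    print(junction)
```

With `min_support=3`, only the first junction (score 5) is kept, so the score acts as a simple filter on how well each junction is supported.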