What Does Thickstart (Col 7) Or Thickend (Col 8) Mean In A Bed File?
8.0 years ago
Jordan ★ 1.2k

Hi,

I downloaded a list of RefSeq genes in BED format from the UCSC Table Browser. According to the UCSC description of the BED format, thickStart and thickEnd mean:

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).


This has left me a bit confused. To explain my confusion, look at the following sample BED lines from UCSC.

chrI    7741935    8394405    NR_070240    0    +    8394405    8394405    0    8    18,12,13,9,11,11,8,18,    0,209004,270977,272247,461655,519425,544710,652452,
chrI    8378298    8390022    NM_001129046    0    -    8378298    8390022    0    8    123,103,110,116,65,69,124,113,    0,832,1401,2025,9723,9836,10481,11611,


So, what do columns 2 (start) and 3 (end) mean? And how are they different from columns 7 (thickStart) and 8 (thickEnd)? They seem to differ in most cases! I thought columns 2 and 3 were the start and end positions of the gene, but the definitions of thickStart and thickEnd have confused me.

Here is the link to bed file description given by UCSC.

bed • 4.1k views
8.0 years ago

Things are clearer when you query the underlying table with mysql:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A   -e 'select * from ce10.refGene where name="NR_070240"\G'
*************************** 1. row ***************************
bin: 1
name: NR_070240
chrom: chrI
strand: +
txStart: 7741935
txEnd: 8394405
cdsStart: 8394405
cdsEnd: 8394405
exonCount: 8
exonStarts: 7741935,7950939,8012912,8014182,8203590,8261360,8286645,8394387,
exonEnds: 7741953,7950951,8012925,8014191,8203601,8261371,8286653,8394405,
score: 0
name2: Y43F8B.27
cdsStartStat: unk
cdsEndStat: unk
exonFrames: -1,-1,-1,-1,-1,-1,-1,-1,
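
The refGene row above lines up with the first BED line in the question. Here is a minimal Python sketch of that mapping, assuming the usual BED12 convention that blockStarts (column 12) are offsets relative to chromStart and blockSizes (column 11) are exon lengths. The function name `refgene_to_blocks` is made up for illustration and is not part of any UCSC tool.

```python
def refgene_to_blocks(tx_start, exon_starts, exon_ends):
    """Turn absolute refGene exon coordinates into BED12 blockSizes/blockStarts.

    exon_starts and exon_ends are the comma-terminated strings stored in
    the refGene table.
    """
    starts = [int(x) for x in exon_starts.strip(",").split(",")]
    ends = [int(x) for x in exon_ends.strip(",").split(",")]
    # blockSizes: length of each exon
    block_sizes = [end - start for start, end in zip(starts, ends)]
    # blockStarts: exon start relative to txStart (BED column 2)
    block_starts = [start - tx_start for start in starts]
    return block_sizes, block_starts


# Values copied from the NR_070240 row above.
sizes, offsets = refgene_to_blocks(
    7741935,
    "7741935,7950939,8012912,8014182,8203590,8261360,8286645,8394387,",
    "7741953,7950951,8012925,8014191,8203601,8261371,8286653,8394405,",
)
print(sizes)
print(offsets)
```

The two printed lists should match columns 11 and 12 of the NR_070240 line in the question, while txStart/txEnd correspond to columns 2 and 3 and cdsStart/cdsEnd to columns 7 and 8.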


It is surprising that none of the coordinates from my example appear in the output. The genome I used is ce10. Perhaps that's the reason?


Oops, updated for ce10...


I didn't know about \G, thanks. You still have '-D hg19'. It is also worth noting that when cdsStart == cdsEnd, the transcript is non-coding.

8.0 years ago
Ido Tamir 5.2k

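thickStart and thickEnd are the left and right boundaries of the coding sequence (CDS). Columns 2 and 3 are the left and right boundaries of the transcript. "Left" and "right" refer to reference coordinates regardless of strand, so on the minus strand thickStart lies at the stop-codon end. In the UCSC Genome Browser the CDS is drawn "thicker" than the UTRs.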
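
To make the distinction concrete, here is a small Python sketch that reads the two BED lines from the question and reports the transcript span (columns 2 and 3) next to the thick span (columns 7 and 8). It relies on the convention mentioned in the comments above that thickStart == thickEnd marks a non-coding transcript. The helper name `describe_bed12` and the returned dictionary keys are invented for this example.

```python
def describe_bed12(line):
    """Summarise the transcript and CDS (thick) spans of one BED12 line."""
    fields = line.split()  # split on any whitespace (tabs or spaces)
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    thick_start, thick_end = int(fields[6]), int(fields[7])
    return {
        "name": fields[3],
        "strand": fields[5],
        "transcript": (chrom_start, chrom_end),  # columns 2 and 3
        "thick": (thick_start, thick_end),       # columns 7 and 8
        # Assumed convention: an empty thick region means no CDS.
        "coding": thick_start != thick_end,
    }


# The two sample lines from the question.
nr = ("chrI 7741935 8394405 NR_070240 0 + 8394405 8394405 0 8 "
      "18,12,13,9,11,11,8,18, "
      "0,209004,270977,272247,461655,519425,544710,652452,")
nm = ("chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 "
      "123,103,110,116,65,69,124,113, "
      "0,832,1401,2025,9723,9836,10481,11611,")

for record in (describe_bed12(nr), describe_bed12(nm)):
    print(record["name"], record["transcript"], record["thick"],
          "coding" if record["coding"] else "non-coding")
```

For NR_070240 the thick region is empty, so the whole transcript would be drawn thin. For NM_001129046 the thick region equals the transcript span, meaning this record annotates no UTR on either side.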