Question: What does thickStart (col 7) or thickEnd (col 8) mean in a BED file?
Jordan (Pittsburgh) wrote, 7.5 years ago:

Hi,

I downloaded a list of RefSeq genes from the UCSC Table Browser in BED format. According to the BED format description given by UCSC, thickStart and thickEnd mean:

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).

This has left me a bit confused. To explain why, look at the following sample lines from a UCSC BED file:

chrI    7741935    8394405    NR_070240    0    +    8394405    8394405    0    8    18,12,13,9,11,11,8,18,    0,209004,270977,272247,461655,519425,544710,652452,
chrI    8378298    8390022    NM_001129046    0    -    8378298    8390022    0    8    123,103,110,116,65,69,124,113,    0,832,1401,2025,9723,9836,10481,11611,

So, what do columns 2 (start) and 3 (end) mean? And how are they different from columns 7 (thickStart) and 8 (thickEnd)? They seem to be different in most cases! I thought columns 2 and 3 meant the starting and ending positions of the genes, but the definitions of thickStart and thickEnd have confused me.

Here is the link to the BED format description given by UCSC.
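To make the column positions concrete, here is a minimal sketch that parses one of the BED12 lines above into named fields. The field names follow the UCSC BED format description; the class `Bed12` and the helpers `parse_bed12` and `parse_int_list` are hypothetical names for illustration, not part of any UCSC tool. It assumes the standard 12-column layout shown in the sample.

```python
from typing import List, NamedTuple


class Bed12(NamedTuple):
    """The 12 BED columns, named as in the UCSC format description."""
    chrom: str
    chromStart: int    # col 2: start of the whole item (0-based)
    chromEnd: int      # col 3: end of the whole item (exclusive)
    name: str
    score: int
    strand: str
    thickStart: int    # col 7: where thick drawing starts (the CDS, for genes)
    thickEnd: int      # col 8: where thick drawing ends
    itemRgb: str
    blockCount: int
    blockSizes: List[int]
    blockStarts: List[int]  # offsets relative to chromStart


def parse_int_list(field: str) -> List[int]:
    # UCSC writes comma-separated lists with a trailing comma, e.g. "18,12,"
    return [int(x) for x in field.split(",") if x]


def parse_bed12(line: str) -> Bed12:
    f = line.split()  # files are tab-separated; split() also tolerates spaces
    return Bed12(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                 int(f[6]), int(f[7]), f[8], int(f[9]),
                 parse_int_list(f[10]), parse_int_list(f[11]))


line = ("chrI 7741935 8394405 NR_070240 0 + 8394405 8394405 0 8 "
        "18,12,13,9,11,11,8,18, 0,209004,270977,272247,461655,519425,544710,652452,")
rec = parse_bed12(line)
print(rec.chromStart, rec.chromEnd)   # 7741935 8394405
print(rec.thickStart, rec.thickEnd)   # 8394405 8394405
```

Naming the fields this way makes it explicit that columns 2-3 and columns 7-8 are two different intervals on the same record.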

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.5 years ago:

Things are clearer if you query the refGene table directly with MySQL:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A   -e 'select * from ce10.refGene where name="NR_070240"\G'
*************************** 1. row ***************************
         bin: 1
        name: NR_070240
       chrom: chrI
      strand: +
     txStart: 7741935
       txEnd: 8394405
    cdsStart: 8394405
      cdsEnd: 8394405
   exonCount: 8
  exonStarts: 7741935,7950939,8012912,8014182,8203590,8261360,8286645,8394387,
    exonEnds: 7741953,7950951,8012925,8014191,8203601,8261371,8286653,8394405,
       score: 0
       name2: Y43F8B.27
cdsStartStat: unk
  cdsEndStat: unk
  exonFrames: -1,-1,-1,-1,-1,-1,-1,-1,
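If you compare this row with the question's NR_070240 line, the correspondence becomes clear: chromStart/chromEnd (cols 2-3) equal txStart/txEnd, and thickStart/thickEnd (cols 7-8) equal cdsStart/cdsEnd. The BED blockStarts, however, are offsets from chromStart rather than absolute positions. The following sketch (`blocks_to_exons` is a hypothetical helper) rebuilds refGene's absolute exonStarts/exonEnds from the BED blocks, using only numbers copied from this thread:

```python
def blocks_to_exons(chrom_start, block_sizes, block_starts):
    """BED12 blockStarts are offsets from chromStart; refGene stores absolute exon bounds."""
    starts = [chrom_start + off for off in block_starts]
    ends = [s + size for s, size in zip(starts, block_sizes)]
    return starts, ends


# Columns 2, 11 and 12 of the NR_070240 BED line in the question
chrom_start = 7741935
block_sizes = [18, 12, 13, 9, 11, 11, 8, 18]
block_starts = [0, 209004, 270977, 272247, 461655, 519425, 544710, 652452]

exon_starts, exon_ends = blocks_to_exons(chrom_start, block_sizes, block_starts)

# exonStarts / exonEnds from the refGene row above
ref_starts = [7741935, 7950939, 8012912, 8014182, 8203590, 8261360, 8286645, 8394387]
ref_ends = [7741953, 7950951, 8012925, 8014191, 8203601, 8261371, 8286653, 8394405]
print(exon_starts == ref_starts, exon_ends == ref_ends)  # True True
```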

Jordan replied:

It is surprising that none of the coordinates from my example appear in the output. The genome I used is ce10. Perhaps that's the reason?

Pierre Lindenbaum replied:

Oops, updated for ce10.

brentp replied:

I didn't know about \G, thanks. You still have '-D hg19'. It is also worth noting that when cdsStart == cdsEnd, the gene is non-coding.
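brentp's rule carries over to BED files, where cdsStart/cdsEnd become columns 7 and 8. Here is a minimal sketch (the function `is_noncoding` is hypothetical) applying it to the two lines from the question. The result also agrees with the RefSeq accession prefixes: NR_ marks a non-coding RNA and NM_ an mRNA.

```python
def is_noncoding(bed_line: str) -> bool:
    """True when the thick region is empty, i.e. thickStart (col 7) == thickEnd (col 8)."""
    fields = bed_line.split()
    thick_start, thick_end = int(fields[6]), int(fields[7])
    return thick_start == thick_end


lines = [
    "chrI 7741935 8394405 NR_070240 0 + 8394405 8394405 0 8 "
    "18,12,13,9,11,11,8,18, 0,209004,270977,272247,461655,519425,544710,652452,",
    "chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 "
    "123,103,110,116,65,69,124,113, 0,832,1401,2025,9723,9836,10481,11611,",
]
for line in lines:
    name = line.split()[3]
    print(name, "non-coding" if is_noncoding(line) else "coding")
# NR_070240 non-coding
# NM_001129046 coding
```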
Ido Tamir (Austria) wrote, 7.5 years ago:

thickStart and thickEnd are the left and right boundaries of the coding sequence (CDS). Columns 2 and 3 are the left and right boundaries of the whole transcript. In the UCSC Genome Browser the CDS is drawn thicker than the UTRs, which is where the column names come from.
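This answer can be checked directly. The sketch below (`exon_parts` and `bases` are hypothetical helpers) splits each exon block at thickStart/thickEnd into "thick" (CDS) and "thin" (UTR or non-coding) pieces, roughly mirroring how the browser draws the transcript. A thin piece left of thickStart is 5' UTR on a + strand transcript but 3' UTR on a - strand one. In the question's two lines there are no UTRs at all: NR_070240 is entirely thin because its thick region is empty, and NM_001129046's thick region spans the whole transcript.

```python
def exon_parts(bed_line: str):
    """Split each BED12 exon block into thick (CDS) and thin (UTR / non-coding) pieces."""
    f = bed_line.split()
    chrom_start, thick_start, thick_end = int(f[1]), int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    parts = []
    for off, size in zip(offsets, sizes):
        s, e = chrom_start + off, chrom_start + off + size  # absolute exon bounds
        cs, ce = max(s, thick_start), min(e, thick_end)     # overlap with [thickStart, thickEnd)
        if cs >= ce:                       # exon lies entirely outside the thick region
            parts.append((s, e, "thin"))
            continue
        if s < cs:
            parts.append((s, cs, "thin"))  # piece left of thickStart
        parts.append((cs, ce, "thick"))    # coding piece
        if ce < e:
            parts.append((ce, e, "thin"))  # piece right of thickEnd
    return parts


def bases(parts, kind):
    """Total exonic bases of the given kind ("thick" or "thin")."""
    return sum(e - s for s, e, k in parts if k == kind)


nr = ("chrI 7741935 8394405 NR_070240 0 + 8394405 8394405 0 8 "
      "18,12,13,9,11,11,8,18, 0,209004,270977,272247,461655,519425,544710,652452,")
nm = ("chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 "
      "123,103,110,116,65,69,124,113, 0,832,1401,2025,9723,9836,10481,11611,")
for line in (nr, nm):
    p = exon_parts(line)
    print(line.split()[3], "thick:", bases(p, "thick"), "thin:", bases(p, "thin"))
# NR_070240 thick: 0 thin: 100
# NM_001129046 thick: 823 thin: 0
```

Counting bases per exon block, rather than subtracting thickStart from chromStart, keeps introns out of the UTR and CDS totals.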
