Question: promoter position estimation
4.2 years ago, user230613 (Europe) wrote:

Hi there,
I know this is probably a very newbie question, but I just want to be sure. I want to extract the variants falling within the promoter region of a given gene. I've already done the variant calling analysis. I have a gff3 file with the gene position:

##gff-version   3
chr14   ensembl gene    13644042        13743053        .    -    etc...etc

I have two questions. How far is the promoter from a gene, on average? I was planning to get all the variants within ~3000 bp upstream of the start; what do you think? The other question: if the gene is on the negative strand, does 3000 bp upstream mean that the start position should be 3000 bp earlier? I mean, the promoter gff3 would look like this:

##gff-version   3
chr14   ensembl gene    13641042        13740053        .    -    etc...etc

 

promoter gff3 • 2.6k views
modified 4.2 years ago by Chirag Nepal • written 4.2 years ago by user230613

Why don't you look at this other post, which has an example of how to get promoter coordinates using the R data packages: "A: Whole genome coordinates of promoters/gene regulatory elements". I would use the promoters() function with its default parameters, i.e. 2000 bp upstream of the TSS and 200 bp downstream.

written 4.2 years ago by Giovanni M Dall'Olio

4.2 years ago, Chirag Nepal (Copenhagen) wrote:
  1. There is no fixed definition of the promoter region; it varies between studies, typically 500, 1000, or 2000 bases around (both upstream and downstream of) the TSS. You can definitely try a few thresholds.

  2. For genes on the negative strand, column 5 (the end coordinate) is the TSS. You can define the promoter region as [column5-1000, column5+1000] for the -ve strand, and use [column4-1000, column4+1000] for genes on the +ve strand.
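
To make point 2 concrete, here is a minimal Python sketch of the strand-aware window. It assumes a tab-separated GFF3 line; the function name `promoter_window`, the `flank` parameter, and the `ID=example` attribute column are placeholders of mine, not something from the original file.

```python
def promoter_window(gff_line, flank=1000):
    """Return (seqid, start, end, strand) for a +/- flank bp window around the TSS.

    GFF3 coordinates are 1-based and inclusive, with start <= end on both strands,
    so the TSS is column 4 on the + strand and column 5 on the - strand.
    """
    fields = gff_line.rstrip("\n").split("\t")
    seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    if strand == "+":
        tss = start
    elif strand == "-":
        tss = end
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at 1 so the window never runs off the beginning of the sequence.
    return seqid, max(1, tss - flank), tss + flank, strand


# Gene coordinates from the question; the last two columns are placeholders.
line = "chr14\tensembl\tgene\t13644042\t13743053\t.\t-\t.\tID=example"
print(promoter_window(line))
```

For the gene in the question this gives 13742053 to 13744053, i.e. the window sits around column 5 because the gene is on the - strand. The upper bound is not clamped to the chromosome length, which the sketch does not know.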

modified 9 weeks ago by RamRS • written 4.2 years ago by Chirag Nepal

Thank you Chirag for the answer.

  1. Are you sure that promoter regions also extend downstream, into the gene? I'm not sure about that; I thought they were only upstream on the DNA (towards the 5' end for the + strand).

  2. If I extract the coordinates as col5-1000 and col5+1000, I would also extract part of the gene itself, which is not what I want. I want only the approximate promoter region.
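
For illustration, an upstream-only window that excludes the gene body could be sketched in Python as below. This assumes 1-based inclusive GFF3 coordinates; the function names and the variant positions in the example are made up, and 3000 bp is just the threshold I mentioned in the question.

```python
def upstream_window(start, end, strand, upstream=3000):
    """Return (win_start, win_end) covering `upstream` bp before the gene, gene excluded.

    On the + strand, upstream means smaller coordinates (before column 4);
    on the - strand, upstream means larger coordinates (after column 5).
    """
    if strand == "+":
        # If the gene starts at position 1 the window is empty (win_end < win_start).
        return max(1, start - upstream), start - 1
    if strand == "-":
        return end + 1, end + upstream
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def variants_in_window(positions, window):
    """Keep the variant positions that fall inside the inclusive window."""
    win_start, win_end = window
    return [pos for pos in positions if win_start <= pos <= win_end]


# Gene from the question (- strand); the variant positions are hypothetical.
window = upstream_window(13644042, 13743053, "-")
print(window)
print(variants_in_window([13640000, 13743000, 13745000], window))
```

Note that for this - strand gene the window lies after column 5, not before column 4, so subtracting 3000 from both coordinates would not give the region upstream of the gene.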

modified 9 weeks ago by RamRS • written 4.2 years ago by user230613

First, you cannot be sure that the annotated TSS is accurate at the single-nucleotide level. The promoter region lies around (both upstream and downstream of) the annotated TSS. A gene can have multiple TSSs within its promoter region (see the CAGE-seq studies in human, mouse, and zebrafish), so if you focus only on the upstream side you might miss something in some genes. TFs and motifs such as the TATA box bind in the upstream region, but some known core promoter motifs, such as the DPE (downstream promoter element), lie downstream of the TSS.

modified 9 weeks ago by RamRS • written 4.2 years ago by Chirag Nepal

I have gene coordinates, not TSS coordinates, so the promoter region will always be upstream of the gene start position.

written 4.2 years ago by user230613