promoter position estimation
7.3 years ago
user230613 ▴ 350

Hi there,

I know this is probably a very newbie question, but I just want to be sure. I want to extract the variants falling within the promoter region of a given gene. I've already done the variant calling. I have a GFF3 file with the gene position:

##gff-version   3
chr14   ensembl gene    13644042        13743053        .    -    etc...etc


I have two questions. How far is the promoter from a gene, on average? I was planning to get all the variants within ~3000 bp upstream of the start; what do you think? The other question: if the gene is on the negative strand, does 3000 bp upstream mean that the start position should be 3000 bp earlier? I mean, would the promoter GFF3 look like this:

##gff-version   3
chr14   ensembl gene    13641042        13740053        .    -    etc...etc


Why don't you look at this other post, which has an example of how to get promoter coordinates using the R data packages: A: Whole genome coordinates of promoters/gene regulatory elements. I would use the promoters() function with the default parameters, i.e. 2000 upstream of the TSS and 200 downstream.

7.3 years ago
Chirag Nepal ★ 2.3k
1. There is no fixed definition of the promoter region; it varies between studies, in the range of 500, 1000, or 2000 bases around (both upstream and downstream of) the TSS. You can definitely try a few thresholds.

2. For genes on the negative strand, column 5 is the TSS. You can define the promoter region as [column5-1000, column5+1000] for the -ve strand. Use [column4-1000, column4+1000] for genes on the +ve strand.
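The strand rule above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the function name `tss_window` and the `ID=example` attribute are made up for the example, and it assumes a tab-delimited GFF3 line with 1-based, inclusive coordinates.

```python
def tss_window(gff_line, flank=1000):
    """Return (seqid, start, end, strand) for a window centred on the TSS.

    Assumes a tab-delimited GFF3 feature line (hypothetical helper).
    """
    fields = gff_line.rstrip("\n").split("\t")
    seqid = fields[0]
    start, end = int(fields[3]), int(fields[4])  # GFF3 columns 4 and 5
    strand = fields[6]                           # GFF3 column 7

    # On the - strand the gene is read from column 5 towards column 4,
    # so the TSS is column 5; on the + strand it is column 4.
    tss = end if strand == "-" else start

    # Clamp at 1 so the window never runs off the start of the sequence.
    return seqid, max(1, tss - flank), tss + flank, strand


# Gene coordinates from the question; the attribute column is a placeholder.
line = "chr14\tensembl\tgene\t13644042\t13743053\t.\t-\t.\tID=example"
print(tss_window(line))  # ('chr14', 13742053, 13744053, '-')
```

Changing `flank` to 500 or 2000 makes it easy to try the other thresholds mentioned in point 1.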


Thank you Chirag for the answer.

1. Are you sure that promoter regions are also downstream of the gene start? I'm not sure about that; I guess they are only upstream on the DNA (towards the 5' region for the + strand).

2. If I extract the coordinates as col5-1000 to col5+1000, I would also extract part of the gene itself, and that is not the point; I want only the ~promoter region.
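For the upstream-only case, a minimal sketch might look like the following. The helper names `upstream_flank` and `in_region` and the test position are hypothetical, and 1-based inclusive coordinates are assumed. Note that for a - strand gene, "upstream" means coordinates larger than column 5, not smaller than column 4.

```python
def upstream_flank(start, end, strand, length=3000):
    """Return (flank_start, flank_end) for the region upstream of a gene.

    Hypothetical helper; the gene body itself is excluded.
    """
    if strand == "-":
        # - strand: transcription starts at `end`, so upstream lies beyond it.
        return end + 1, end + length
    # + strand: upstream lies before `start`; clamp at position 1.
    return max(1, start - length), start - 1


def in_region(pos, region):
    """True if a 1-based variant position falls inside the inclusive region."""
    return region[0] <= pos <= region[1]


# Gene coordinates from the question (chr14, - strand).
region = upstream_flank(13644042, 13743053, "-")
print(region)                       # (13743054, 13746053)
print(in_region(13745000, region))  # True for this made-up variant position
```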


First, you cannot be sure that the annotated TSS is accurate at the single-nucleotide level. The promoter region lies around (upstream and downstream of) the annotated TSS. A gene can have multiple TSSs in its promoter region (see CAGE-seq studies in human/mouse/zebrafish), so if you focus only on the upstream side, you might miss something in some genes. Many TF binding sites and the TATA box lie upstream of the TSS, but some known core promoter motifs, such as the DPE (downstream promoter element), lie downstream of the TSS.


I have gene coordinates, not a TSS, so the promoter region will always be upstream of the gene start position.