I know this is probably a very newbie question, but I just want to be sure. I want to extract the variants falling within the promoter region of a given gene. I've already done the variant calling. I have a GFF3 file with the gene position:
##gff-version 3
chr14 ensembl gene 13644042 13743053 . - etc...etc
I have two questions. First, on average, how far from a gene is its promoter? I was planning to take all the variants within ~3000 bp upstream of the start; what do you think? Second, if the gene is on the negative strand, does 3000 bp upstream mean the start position should be 3000 bp before? I mean, would the promoter GFF3 look like this:
##gff-version 3
chr14 ensembl gene 13641042 13740053 . - etc...etc
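On the minus-strand question: in GFF3, start is always less than or equal to end regardless of strand, so for a minus-strand gene the TSS is the end coordinate and "upstream" means higher coordinates. Subtracting 3000 from both coordinates, as in the example above, mostly selects the gene body instead. Below is a minimal Python sketch of the strand-aware arithmetic. It assumes 1-based inclusive GFF3 coordinates and the same upstream/downstream convention as GenomicRanges' promoters(). `promoter_window` is a hypothetical helper, not part of any library.

```python
def promoter_window(start: int, end: int, strand: str,
                    upstream: int = 2000, downstream: int = 200) -> tuple[int, int]:
    """Return the promoter window (lo, hi) of a GFF3 feature.

    Coordinates are 1-based and inclusive, as in GFF3, where start <= end on
    both strands. The TSS is `start` on '+' and `end` on '-'. The defaults
    mirror the 2000 bp upstream / 200 bp downstream convention of promoters().
    """
    if strand == "+":
        lo, hi = start - upstream, start + downstream - 1
    elif strand == "-":
        # Upstream of a minus-strand gene lies at *higher* coordinates.
        lo, hi = end - downstream + 1, end + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(lo, 1), hi  # clamp at the first base of the chromosome


# The gene from the question: chr14:13644042-13743053, minus strand,
# taking 3000 bp upstream of the TSS and nothing downstream.
print(promoter_window(13644042, 13743053, "-", upstream=3000, downstream=0))
# (13743054, 13746053)
```

So for this gene the 3000 bp upstream window is 13743054-13746053, which lies just past the gene's end coordinate and not below its start.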
Why don't you look at this other post, which has an example of how to get promoter coordinates using the R data packages: "A: Whole genome coordinates of promoters/gene regulatory elements". I would use the promoters() function with its default parameters, i.e. 2000 bp upstream of the TSS and 200 bp downstream.
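Once the promoter intervals exist, the remaining step is to keep the called variants whose position falls inside them. For example, the intervals could be the output of promoters() written to a BED file with rtracklayer's export(). Below is a minimal Python sketch. It assumes an uncompressed VCF and a BED file whose first three columns are chrom/start/end. `read_bed` and `variants_in_regions` are hypothetical helper names. On real data, dedicated tools such as `bedtools intersect` or `bcftools view -R` are the usual choice.

```python
import io
from typing import Iterator, TextIO

Regions = dict[str, list[tuple[int, int]]]


def read_bed(handle: TextIO) -> Regions:
    """Map chromosome -> list of (start, end) BED intervals (0-based, half-open)."""
    regions: Regions = {}
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def variants_in_regions(vcf: TextIO, regions: Regions) -> Iterator[str]:
    """Yield VCF data lines whose POS lies inside any of the regions.

    VCF POS is 1-based, so a BED interval (s, e) covers positions s+1..e.
    Only POS is tested: a deletion starting just outside a region is missed.
    """
    for line in vcf:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if any(s < pos <= e for s, e in regions.get(chrom, [])):
            yield line.rstrip("\n")


# Usage: 3000 bp upstream of the minus-strand gene in the question,
# i.e. 1-based 13743054-13746053, which is BED 13743053-13746053.
bed = io.StringIO("chr14\t13743053\t13746053\n")
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr14\t13743053\t.\tA\tG\t50\tPASS\t.\n"  # the gene's TSS (end coord): excluded
    "chr14\t13745000\t.\tC\tT\t50\tPASS\t.\n"  # inside the window: kept
    "chr14\t13746054\t.\tG\tA\t50\tPASS\t.\n"  # just past the window: excluded
    "chr1\t13745000\t.\tT\tC\t50\tPASS\t.\n"   # other chromosome: excluded
)
for record in variants_in_regions(vcf, read_bed(bed)):
    print(record)
```

The only subtle step is the coordinate shift. BED starts are 0-based with exclusive ends, while VCF POS is 1-based, hence the `s < pos <= e` test.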