I have a list of several thousand genes for which I would like to get the -1000:1000 promoter regions, using the start of the 5' UTR as the reference point that separates upstream from downstream. I've been using the getSequence function from the biomaRt package in R, and can easily retrieve the upstream sequence together with the 5' UTR using this code:
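library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# 1000 bp upstream of the coding start, so the result includes the 5' UTR
Hs <- getSequence(id = human, type = "ensembl_gene_id", seqType = "coding_gene_flank", upstream = 1000, mart = ensembl)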
where "human" is a list of Ensembl gene IDs. I can also get the upstream sequence without the 5' UTR by using seqType = "gene_flank". What I can't seem to do is capture the 1000 bp downstream of the start of the 5' UTR, running into the coding sequence. getSequence seems to offer only downstream regions that flank the gene. I've tried combining the coding_gene_flank and exon seqTypes, but the combined length of these sequences differs from gene to gene because the 5' UTR lengths differ.
Moreover, even if I could get the sequences I want, I can't seem to download the upstream and downstream parts together as a single sequence.
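To make the target concrete, the window I am after can be written as simple coordinate arithmetic. The following is only a sketch under stated assumptions: promoter_window is a hypothetical helper of my own, not a biomaRt function. It assumes the position of the 5' UTR start (the transcription start site, TSS) and the strand are already known for each gene, that coordinates are 1-based and inclusive, and that the TSS base itself counts as the first downstream base. The coordinates in the example are made up.

```r
# Hypothetical helper (not part of biomaRt): compute the genomic interval
# covering `upstream` bp before the TSS and `downstream` bp starting at it.
# Assumes 1-based inclusive coordinates and strand coded as 1 / -1.
promoter_window <- function(tss, strand, upstream = 1000, downstream = 1000) {
  stopifnot(length(tss) == length(strand), all(strand %in% c(1, -1)))
  plus <- strand == 1
  # Forward strand: upstream lies at lower coordinates.
  # Reverse strand: upstream lies at higher coordinates.
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  data.frame(start = start, end = end, strand = strand)
}

# Made-up TSS positions, for illustration only.
windows <- promoter_window(tss = c(50000, 80000), strand = c(1, -1))
print(windows)
```

Every window produced this way is exactly upstream + downstream bp long regardless of the 5' UTR length, which is the property I lose when I paste together coding_gene_flank and exon sequences. For reverse-strand genes the fetched sequence would still need to be reverse-complemented.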
Is there some variation on my current code that I'm missing that would get me the -1000:1000 slices? Or should I simply move on to another program?