I am trying to help my PI with a project. He gave me a list of 2700 hgnc_ids and wants me to obtain three sequences for each of the 2700 genes: the 1000 bases upstream of the TSS, the first exon, and the 1000 bases downstream of the first exon (the intron).
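To make the three regions concrete, here is a rough Python sketch of the coordinates I think I need for one gene. It is only an illustration under some assumptions: coordinates are 1-based and inclusive, strand is given as +1 or -1, and the TSS is taken to be the start of the first exon. The names `Region` and `regions_for_gene` are made up for this example and do not come from any package.

```python
from typing import Dict, NamedTuple


class Region(NamedTuple):
    start: int  # 1-based, inclusive, genomic (forward-strand) coordinate
    end: int    # 1-based, inclusive, genomic (forward-strand) coordinate


def regions_for_gene(exon1_start: int, exon1_end: int, strand: int,
                     flank: int = 1000) -> Dict[str, Region]:
    """Return the upstream flank, first exon, and downstream flank.

    exon1_start <= exon1_end are the genomic coordinates of the first exon.
    Assumption: the TSS is the 5' end of the first exon, so it sits at
    exon1_start on the + strand and at exon1_end on the - strand.
    """
    if exon1_start > exon1_end:
        raise ValueError("exon1_start must be <= exon1_end")

    lower = Region(max(1, exon1_start - flank), exon1_start - 1)
    higher = Region(exon1_end + 1, exon1_end + flank)  # not clamped to chromosome length

    if strand == 1:
        upstream, downstream = lower, higher
    elif strand == -1:
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        upstream, downstream = higher, lower
    else:
        raise ValueError("strand must be +1 or -1")

    return {
        "upstream": upstream,
        "first_exon": Region(exon1_start, exon1_end),
        "downstream": downstream,
    }


if __name__ == "__main__":
    # Hypothetical coordinates, not a real gene.
    print(regions_for_gene(5000, 5200, strand=-1))
```

For minus-strand genes the extracted sequences would also need to be reverse-complemented, and the downstream window is a fixed 1000 bases, so it could run past the end of the intron if the intron is shorter than that.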
I tried Ensembl through the biomaRt package in R/Bioconductor. However, I am only able to obtain the 1000-base upstream flanking sequences and the exons; I could not find a way to retrieve intron sequences.
I also tried Biostrings and BSgenome, but it seems I can only query one gene at a time, and it did not work when I passed all 2700 genes at once.
Does anybody know what I could do?