Simple question - I need to create a GTF file to use in HTSeq-count that contains gene regions plus 3kb upstream. (Background: I'm doing a MeDIP-seq experiment and want to look for differential methylation in genic and 3kb promoter regions using a count-based method like edgeR/DESeq.)
I was planning on making one myself from the UCSC hg19 refFlat table. The refFlat table has gene coordinates, but I need to extend this 3kb upstream to capture promoter regions.
Column 3 contains the strand (+/-) and columns 4-5 contain the transcription start (txStart) and end (txEnd) positions.
If I want to capture 3kb upstream of the TSS, I was planning on adding 3000 to txStart, but that is only for genes on the + strand, correct? If I want 3kb upstream of the TSS for genes on the - strand, should I add this 3000 to txEnd?
E.g. the DENND1B gene is on the - strand at chr1:197,473,879-197,744,623. However, looking at it in the browser, it's transcribed "right to left", so I presume I would want to add 3000 to the txEnd number, 197,744,623, even though this is really where transcription starts.
Am I thinking about this correctly?
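
For reference, here is a minimal Python sketch of the strand-aware extension. It is one possible approach, not a definitive one. It assumes the standard tab-separated refFlat layout (geneName, name, chrom, strand, txStart, txEnd, ...), which matches the column numbers above if you count from 0. The function names `extend_upstream` and `refflat_to_gtf_line`, the "refFlat" source label, and the choice of "exon" as the feature type (so that HTSeq-count's default settings pick the lines up) are all assumptions made for illustration. Note that on the + strand "upstream" lies at lower coordinates, so the sketch subtracts 3000 from txStart there; on the - strand it adds 3000 to txEnd. It does not clamp the extended end to the chromosome length.

```python
def extend_upstream(strand, tx_start, tx_end, upstream=3000):
    """Return (start, end) of the transcript extended `upstream` bp past its TSS."""
    if strand == "+":
        # TSS is txStart; upstream means lower coordinates. Do not go below 0.
        return max(0, tx_start - upstream), tx_end
    if strand == "-":
        # TSS is txEnd; upstream means higher coordinates.
        return tx_start, tx_end + upstream
    raise ValueError(f"unexpected strand: {strand!r}")


def refflat_to_gtf_line(line, upstream=3000):
    """Convert one refFlat line into one GTF line covering gene body + promoter."""
    fields = line.rstrip("\n").split("\t")
    gene, transcript, chrom, strand = fields[0], fields[1], fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])
    start, end = extend_upstream(strand, tx_start, tx_end, upstream)
    # refFlat starts are 0-based (half-open intervals); GTF is 1-based inclusive,
    # so only the start needs shifting by one.
    attributes = f'gene_id "{gene}"; transcript_id "{transcript}";'
    return "\t".join(
        [chrom, "refFlat", "exon", str(start + 1), str(end), ".", strand, ".", attributes]
    )
```

With the DENND1B numbers from above, `extend_upstream("-", 197473879, 197744623)` leaves the start alone and moves the end to 197,747,623.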