I am really new to CAGE-seq and the data published in FANTOM5: http://fantom.gsc.riken.jp/5/data/ Basically, my main goal is to find the promoter region of a transcript from a gene. I downloaded a BAM file from FANTOM (picked at random) and extracted the FASTQ reads from it using bedtools. Now I want to learn from scratch how to call peaks from CAGE data. My questions are:
Can you suggest the best workflow for peak calling? I think I will use bwa to align the reads to hg38 and then run peak-calling software on the resulting BAM. Is this correct?
How do I interpret the peak results? Can a peak be interpreted as a promoter region, or only as a TSS? After reading explanations of CAGE-seq and checking the CAGE peaks, I think a peak represents a TSS rather than a promoter region. Is this the case?
I know I can derive a promoter region from a TSS using some method, but can I just use the Ensembl database to get the TSS for each transcript rather than manually extracting peaks from CAGE?
Is there any difference in TSS/promoter region between cell lines? What causes the difference?
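To make the third question concrete, here is a minimal sketch of what I mean by deriving a promoter region from a TSS. The function name `promoter_window` and the window sizes (1000 bp upstream, 200 bp downstream) are my own assumptions for illustration, not values taken from FANTOM5 or Ensembl. Coordinates are assumed to be 0-based, and the result is a half-open BED-style interval.

```python
def promoter_window(chrom, tss, strand, upstream=1000, downstream=200):
    """Return a strand-aware promoter interval around a single TSS.

    tss is a 0-based position; the returned (start, end) is half-open,
    as in BED. The default window sizes are arbitrary illustrative choices.
    """
    if strand == "+":
        # Upstream means smaller coordinates on the forward strand.
        start = tss - upstream
        end = tss + downstream + 1
    elif strand == "-":
        # Upstream means larger coordinates on the reverse strand.
        start = tss - downstream
        end = tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at the chromosome start so the interval never goes negative.
    return chrom, max(0, start), end, strand


if __name__ == "__main__":
    print(promoter_window("chr1", 5000, "+"))
    print(promoter_window("chr1", 5000, "-"))
```

The same function would work whether the TSS comes from a CAGE peak summit or from an Ensembl transcript annotation, which is why I am asking whether the two sources are interchangeable.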
Thank you very much.