I am new to CAGE-seq and to the data published by FANTOM5. My main goal is to find the promoter region of a transcript of a gene. I downloaded a BAM file from FANTOM (chosen at random) and extracted the FASTQ reads from it using bedtools. Now I want to learn from scratch how to call peaks from CAGE data. My questions are:

  1. Can you suggest a good workflow for peak calling? I plan to use bwa to align the reads to hg38 and then run peak-calling software on the resulting BAM file. Is this correct?

  2. How do I interpret the peak results? Does a peak represent a promoter region or only a TSS? After reading an explanation of CAGE-seq and checking the CAGE peaks, I think a peak represents a TSS rather than a promoter region. Is this the case?

  3. I know I can derive a promoter region from a TSS using some method, but can I just use the Ensembl database to get the TSS of each transcript rather than manually extracting peaks from CAGE?

  4. Do TSSs/promoter regions differ between cell lines? What causes the difference?
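
To make question 3 concrete, here is a minimal sketch of what I mean by deriving a promoter region from a TSS. The window sizes (1000 bp upstream, 100 bp downstream) are arbitrary values I picked for illustration, not a standard. The function name `promoter_window` and the 0-based, half-open (BED-style) coordinates are also my own assumptions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED style)
    end: int     # 0-based, exclusive
    strand: str


def promoter_window(chrom: str, tss: int, strand: str,
                    upstream: int = 1000, downstream: int = 100) -> Interval:
    """Return a strand-aware window around a TSS.

    `tss` is the 0-based position of the TSS base. The default window
    sizes are illustrative assumptions, not a community standard.
    """
    if strand == "+":
        # Upstream means lower coordinates on the forward strand.
        start = tss - upstream
        end = tss + downstream + 1
    elif strand == "-":
        # Upstream means higher coordinates on the reverse strand.
        start = tss - downstream
        end = tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at the chromosome start; the chromosome end is not checked here.
    return Interval(chrom, max(0, start), end, strand)


if __name__ == "__main__":
    print(promoter_window("chr1", 5000, "+"))
    print(promoter_window("chr1", 5000, "-"))
```

The TSS position passed in could come either from a CAGE peak summit or from an Ensembl transcript annotation, which is exactly the choice I am asking about.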

Thank you very much.

