Question

Peak Calling on a single gene?

15 months ago

Maycon • 0

I have a dataset of ChIP-seq runs from a human cancer cell line, covering treatment and control groups (two replicates each) for histone methylation and acetylation marks (8 experiments in total). I want to run peak calling on this data, but I really only care about quantifying the signal at a single gene (human CXCL1) across the different marks and the treatment and control groups. Does it make sense to map the reads (using BWA) to just the gene sequence before peak calling, or should I at least map them to the entire chromosome? Do I lose any information by mapping them to the short single-gene sequence rather than to the entire chromosome or the entire genome? Computationally it would be cheaper to map them to the gene sequence instead of an entire genome or chromosome, so if that is OK, I would prefer it.

chip peak illumina calling chip-seq • 458 views

Written 15 months ago by Maycon • updated 15 months ago by ATpoint

Computationally it would be better to map them to the gene sequence instead of an entire genome/chromosome

Using a reduced reference is not advisable when the data came from the whole genome. Aligners try their best to place every read, so reads that did not originate from this region will likely be aligned to it anyway and can distort your results.

Reply by GenoMax (141k), 15 months ago

score 2 · Accepted Answer · 2023-01-26

15 months ago

ATpoint 82k

Peak callers like MACS need, if I recall correctly, a genome-wide distribution of reads to build their background models properly, so that the p-values are reliable. I would run the standard genome-wide analysis and then filter post hoc for the region you want after peak calling.
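The post-hoc filtering step could look roughly like the sketch below. It assumes peaks were called genome-wide and written in the standard narrowPeak format (BED6+4; in MACS2 output, column 7 is the fold change at the peak summit and column 9 is -log10 q-value). The region tuple is a hypothetical placeholder, not the real CXCL1 coordinates: substitute the locus from your own annotation for the assembly you aligned to. `bedtools intersect -u` does the same job on the command line.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Peak:
    chrom: str
    start: int          # 0-based start (BED convention)
    end: int            # exclusive end (BED convention)
    name: str
    signal: float       # narrowPeak column 7 (signalValue)
    neg_log10_q: float  # narrowPeak column 9 (-log10 q-value)


def parse_narrowpeak(lines: Iterable[str]) -> List[Peak]:
    """Parse narrowPeak lines, skipping blank, comment, track and browser lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), f[3], float(f[6]), float(f[8])))
    return peaks


def peaks_in_region(peaks: List[Peak], chrom: str, start: int, end: int,
                    flank: int = 0) -> List[Peak]:
    """Keep peaks overlapping [start - flank, end + flank) on the given chromosome."""
    lo, hi = max(0, start - flank), end + flank
    return [p for p in peaks if p.chrom == chrom and p.start < hi and p.end > lo]


# Hypothetical region: placeholder coordinates, NOT the real CXCL1 locus.
GENE_REGION = ("chr4", 1000, 3000)

# Toy narrowPeak lines standing in for a genome-wide MACS2 output file.
example = [
    "chr4\t500\t1200\tpeak_1\t50\t.\t4.2\t8.1\t6.0\t300",
    "chr4\t5000\t5600\tpeak_2\t40\t.\t3.1\t6.5\t4.8\t250",
    "chr1\t1500\t2000\tpeak_3\t60\t.\t5.0\t9.0\t7.2\t200",
]

peaks = parse_narrowpeak(example)
hits = peaks_in_region(peaks, *GENE_REGION)
hits_with_flank = peaks_in_region(peaks, *GENE_REGION, flank=2500)
print([p.name for p in hits])
```

Keeping the genome-wide calls and only subsetting at the end means the background model and any library-size normalization still reflect all reads, which is the point made above. The optional `flank` lets you include nearby promoter or enhancer peaks if the marks you study sit just outside the gene body.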