ChIP-Seq data analysis

I went through a paper on NGS data of M. tuberculosis. In their methods section, they describe the ChIP-seq analysis as follows: "The single-ended sequence reads generated from ChIP-seq experiments were aligned to the reference genome using Bowtie allowing up to 3 mismatches and up to 10 hits per read. Since the samples were sequenced using different protocols resulting in varied read lengths (38–50 nt) all the raw datasets were trimmed to 38 bases to enable unbiased comparison of experiments. Bowtie results were converted into SAM/BAM format using samtools. A custom perl script was then used to obtain the per-base coverage normalized to the total number of mapped reads for each dataset. The script also shifted (by 80 bp) and merged the read counts (RCs) for forward and reverse strands to generate wig files containing single ChIP-seq profiles that were visualized. To compute the RC for each feature, the number of reads mapping to all positions in the feature were summed up and normalized to feature length." Can someone please explain the 80 bp shift, the reasoning behind it, and how to achieve it with standard commands? Thank you.

It sounds like they use the 80 bp shift to move the reads towards the center of the sequenced fragments, probably to give cleaner peaks.

The idea is that with single-end reads you only have the sequence from one end of each fragment that was pulled down with the target protein. Since you are sequencing many fragments, the reads are derived essentially at random from either end and as a result map to either the forward or the reverse strand. The fragment itself lies between these reads, so, depending on the ChIP, it can make sense to center the reads over the middle of the fragment. The 80 bp is presumably half of the estimated fragment size (~160 bp): forward-strand reads are shifted 80 bp to the right and reverse-strand reads 80 bp to the left, so both end up over the fragment center.

With single-end data the shift is usually just an estimate (for example, half the fragment size inferred from a cross-correlation analysis); with paired-end data you know both ends of each fragment and can place the reads at the actual middle. Here's a picture of the idea: in red are single-end reads from, say, the left end of the fragments, and in blue are reads from the other end. By shifting both towards the middle you create a single, more uniform peak.

[Figure: forward-strand (red) and reverse-strand (blue) read coverage flanking the binding site; shifting both towards each other merges them into one peak over the fragment centers.]

If you don't want to do this in Perl or another programming language, you could look into deepTools bamCoverage or MACS2 to generate coverage tracks (bigWig/bedGraph) from a BAM file with the reads shifted or extended; see the command sketch at the end of this answer.

The image comes from a search for cross-correlation plots, which use the same strand-shift idea for a different purpose.
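
As a concrete example of that route, here is a minimal command-line sketch. The file names, the ~160 bp fragment-length estimate and the genome size are placeholders (not values from the paper), so adjust them to your own data:

```
# deepTools bamCoverage: extend each single-end read to the estimated fragment
# length (~160 bp, which moves the read center ~80 bp towards the fragment center)
# and write a read-depth-normalized bigWig at 1 bp resolution.
bamCoverage --bam chip.sorted.bam \
            --outFileName chip_coverage.bw \
            --binSize 1 \
            --extendReads 160 \
            --normalizeUsing CPM

# MACS2: skip the fragment-size model and extend reads to 160 bp instead, which
# achieves the same centering during peak calling; -g is the effective genome
# size (here roughly that of M. tuberculosis), -B also writes bedGraph pileups.
macs2 callpeak -t chip.sorted.bam -c input.sorted.bam \
               --nomodel --extsize 160 -g 4.4e6 -n chip -B
```

Neither command reproduces the paper's custom per-base normalization exactly, but both implement the same shift-to-fragment-center idea.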

Hard to tell exactly what they mean without the full text (or the actual script). My guess is that they may be referring to a sliding-window algorithm; you can find implementations of it in other threads. But to calculate coverage there are already well-known tools, so there is no need to re-invent the wheel (like some of the solutions presented in those threads).
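
For example, raw per-base coverage can be computed directly from a sorted BAM with standard tools (file names below are placeholders; these report unshifted, unnormalized depth):

```
# bedtools genomecov: -d reports the depth at every single position,
# -bga writes a bedGraph that includes zero-coverage intervals.
bedtools genomecov -ibam chip.sorted.bam -d > chip.per_base_depth.txt
bedtools genomecov -ibam chip.sorted.bam -bga > chip.bedgraph

# samtools depth: per-base depth; -a also reports positions with zero coverage.
samtools depth -a chip.sorted.bam > chip.depth.txt
```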
