Question

How to View Peaks from ChIP-seq Data generated using old reference genome (hg18)

14 months ago

DareDevil ★ 4.5k

I downloaded ChIP-seq data from GSE25769. How do I use this data to view the peaks? The data analysis was performed using the old reference genome hg18. What do the values in columns 2, 5, and 7 represent?

Below is the output of running zcat GSM632892_HCT405_realign.txt.gz | head:

#RUN_TIME Wed Oct  8 13:51:55 2008
#SOFTWARE_VERSION @(#) $Id: qualityFilter.pl,v 1.8 2007/11/26 14:42:26 tc Exp $
#FILTER_CRITERION ((CHASTITY>=0.6))
GGAATGGAATGGAATGGAATGGAACAACCCGAATGG 15906 1 ref_chr4:49347516 F GGAATGGAATGGAATGGAATGGATCAACCCGAGTGG 15906
GAACTTGATTTAAAATAATGTTGTATGTAGTATTTA 18000 1 ref_chr4:133533796 F GAACTTGATTTAAAATAATGTTGTATGTAGTATTTA 14859
GGACTAAGAATTGGGAGTACCCAGGACATCCAATTA 18000 8
GTCTTAGGCACAGTAATCAAGGAACCTAAGACCGAG 18000 1 ref_chr1:84351102 F GTCTTAGGCACAGTAATCAAGGAACCTAAGACCGAG 14859
GCAAAGACAAAAATCTTTCTAAGATTGGCCAAAATG 18000 1 ref_chr4:23417003 F GCAAAGACAAAAATCTTTCTAAGATTGGCCAAAATG 14859
GAAGTGCAGTGGTGGGATCTTGGCTCACTGCAAACT 18000 9
GGAAGGAGAGAAGAGATTGTAATAGAAATTAACAAT 18000 1 ref_chr17:64738595 R ATTGTTAATTTCTATTACAATCTCTTCTCTCCTTCC 14859

Also posted on SE Bioinformatics.

R Peak ChIP-seq • 1.4k views

updated 22 hours ago by Kevin Blighe 89k • written 14 months ago by DareDevil ★ 4.5k

Honestly, download the FASTQ files and process them yourself. Nothing is gained by using legacy genomes and formats that are not standard today.

14 months ago by ATpoint 90k

The raw FASTQ files for this dataset are not available.

14 months ago by DareDevil ★ 4.5k

Then intrinsically the entire analysis and conclusions are not reproducible. I personally would never touch such a dataset.

14 months ago by ATpoint 90k

I agree with you

1 day ago by Kevin Blighe 89k

That having been said, from years of experience with ChIP-seq I can confidently say that it is a noisy assay that requires appropriate controls and replication. Without raw data, and with just these tables at hand whose origin is unclear, I think you are just building on uncertainty. Not worth it imo. Rather, check whether other datasets are available.

14 months ago by ATpoint 90k

score 0 · Answer 1 · 2025-11-07

Hi DareDevil, let's try to close out this question for those who land here via Google searches.

The file you downloaded from GSE25769 (GSM632892) is an old, deprecated alignment output from Illumina's GERALD/CASAVA pipeline, containing space- or tab-separated read alignments after quality filtering (chastity >=0.6). It lists raw reads with their best mappings, but only includes position details for uniquely mapped reads (where column 3 equals 1). Columns 2, 5, and 7 specifically represent the following: column 2 is the best alignment score for the read; column 5 is the strand orientation (F for forward, R for reverse); and column 7 is the score of the next-best alignment (present only for unique mappings).
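To make the column layout concrete, here is a minimal Python sketch that splits one line of the file into named fields. The names `RealignRecord` and `parse_realign_line` are made up for illustration, and the field meanings simply follow the description above; the code also assumes that reads without a unique hit carry only the first three columns, as in the `head` output in the question.

```python
from typing import NamedTuple, Optional


class RealignRecord(NamedTuple):
    """One read from the realign file (field names are illustrative)."""
    read_seq: str                          # column 1: read sequence
    best_score: int                        # column 2: best alignment score
    n_best_hits: int                       # column 3: hits at the best score
    chrom: Optional[str] = None            # column 4, part before ':'
    pos: Optional[int] = None              # column 4, part after ':'
    strand: Optional[str] = None           # column 5: 'F' or 'R'
    target_seq: Optional[str] = None       # column 6: matched reference sequence
    next_best_score: Optional[int] = None  # column 7: next-best alignment score


def parse_realign_line(line: str) -> Optional[RealignRecord]:
    """Parse one line; return None for '#' header lines and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split()
    rec = RealignRecord(fields[0], int(fields[1]), int(fields[2]))
    if len(fields) < 7:
        # No position reported (e.g. column 3 is 8 or 9 in the sample above).
        return rec
    chrom, pos = fields[3].rsplit(":", 1)
    if chrom.startswith("ref_"):
        # Assumption: drop the 'ref_' prefix to get UCSC-style names (chr4).
        chrom = chrom[len("ref_"):]
    return rec._replace(
        chrom=chrom,
        pos=int(pos),
        strand=fields[4],
        target_seq=fields[5],
        next_best_score=int(fields[6]),
    )


if __name__ == "__main__":
    example = ("GGAAGGAGAGAAGAGATTGTAATAGAAATTAACAAT 18000 1 "
               "ref_chr17:64738595 R ATTGTTAATTTCTATTACAATCTCTTCTCTCCTTCC 14859")
    print(parse_realign_line(example))
```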

This file holds aligned reads rather than called peaks, so you cannot view peaks directly from it. To visualise enrichment (peaks), first convert the unique mappings to a standard format like BED: extract the chromosome and position from column 4, set the start and end coordinates from the read length (typically 36 bp for this era; assuming the reported position is the 1-based leftmost base of the alignment, the 0-based BED start is position - 1 and the end is position + 35), and take the strand from column 5 (F becomes +, R becomes -). Then load the BED into IGV (or the UCSC Genome Browser) with hg18 selected as the reference assembly; download hg18 from UCSC if needed. For proper peak calling, run a tool like MACS2 on the BED file in single-end mode (MACS2 takes an effective genome size, e.g. -g hs for human, rather than an assembly name) to generate a peak BED file and, optionally, a bedGraph signal track that can be converted to bigWig for overlay.
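The conversion to BED could be sketched in Python roughly as follows. This is not a tested pipeline: the function names are invented for illustration, and the code assumes that column 4 gives the 1-based leftmost coordinate of the alignment on the forward strand (for R reads as well), that column 3 equal to 1 marks a unique hit, and that the `ref_` prefix should be stripped so chromosome names match hg18 (`chr4`, not `ref_chr4`).

```python
import gzip
from typing import Optional


def realign_line_to_bed(line: str) -> Optional[str]:
    """Return a 6-column BED line for a uniquely mapped read, else None."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split()
    # Keep only reads with a reported position and exactly one best hit.
    if len(fields) < 7 or fields[2] != "1":
        return None
    chrom, pos = fields[3].rsplit(":", 1)
    if chrom.startswith("ref_"):
        chrom = chrom[len("ref_"):]
    start = int(pos) - 1            # assumed 1-based position -> 0-based start
    end = start + len(fields[0])    # read length taken from the read itself
    strand = "+" if fields[4] == "F" else "-"
    # BED name is a placeholder "." and the BED score is fixed at 0.
    return "\t".join([chrom, str(start), str(end), ".", "0", strand])


def realign_to_bed(in_path: str, out_path: str) -> int:
    """Convert a gzipped realign file to BED; return the number of reads written."""
    written = 0
    with gzip.open(in_path, "rt") as fin, open(out_path, "w") as fout:
        for line in fin:
            bed = realign_line_to_bed(line)
            if bed is not None:
                fout.write(bed + "\n")
                written += 1
    return written


if __name__ == "__main__":
    n = realign_to_bed("GSM632892_HCT405_realign.txt.gz", "GSM632892_HCT405.hg18.bed")
    print(f"wrote {n} uniquely mapped reads")
```

The resulting BED can be coordinate-sorted and opened in IGV with hg18 selected, or passed to something like `macs2 callpeak -t GSM632892_HCT405.hg18.bed -f BED -g hs -n HCT405`, where the output name `HCT405` is arbitrary. A matched input/control sample, if one exists in the series, would be converted the same way and supplied with `-c`.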

Given the dataset's age and lack of raw FASTQ, results may be noisy; consider newer cohesin ChIP-seq sets like GSE117876 for cleaner analysis.

Kevin