How to View Peaks from ChIP-seq Data Generated Using an Old Reference Genome (hg18)
14 months ago
DareDevil ★ 4.5k

I downloaded ChIP-seq data from GEO (GSE25769). How do I use this data to view the peaks? The data analysis was performed using the old reference genome hg18. What do the values in columns 2, 5 and 7 represent?

Below is the output of running zcat GSM632892_HCT405_realign.txt.gz | head

#RUN_TIME Wed Oct  8 13:51:55 2008
#SOFTWARE_VERSION @(#) $Id: qualityFilter.pl,v 1.8 2007/11/26 14:42:26 tc Exp $
#FILTER_CRITERION ((CHASTITY>=0.6))
GGAATGGAATGGAATGGAATGGAACAACCCGAATGG 15906 1 ref_chr4:49347516 F GGAATGGAATGGAATGGAATGGATCAACCCGAGTGG 15906
GAACTTGATTTAAAATAATGTTGTATGTAGTATTTA 18000 1 ref_chr4:133533796 F GAACTTGATTTAAAATAATGTTGTATGTAGTATTTA 14859
GGACTAAGAATTGGGAGTACCCAGGACATCCAATTA 18000 8
GTCTTAGGCACAGTAATCAAGGAACCTAAGACCGAG 18000 1 ref_chr1:84351102 F GTCTTAGGCACAGTAATCAAGGAACCTAAGACCGAG 14859
GCAAAGACAAAAATCTTTCTAAGATTGGCCAAAATG 18000 1 ref_chr4:23417003 F GCAAAGACAAAAATCTTTCTAAGATTGGCCAAAATG 14859
GAAGTGCAGTGGTGGGATCTTGGCTCACTGCAAACT 18000 9
GGAAGGAGAGAAGAGATTGTAATAGAAATTAACAAT 18000 1 ref_chr17:64738595 R ATTGTTAATTTCTATTACAATCTCTTCTCTCCTTCC 14859

Also posted on SE Bioinformatics.

Honestly, download the FASTQ files and process them yourself. Nothing is gained by using legacy genomes and formats that are no longer standard.

The raw FASTQ files for this data are not available.

Then the entire analysis and its conclusions are intrinsically not reproducible. I personally would never touch such a dataset.

I agree with you

That having been said, from years of experience with ChIP-seq I can confidently say that it is a noisy assay that requires appropriate controls and replication. Without raw data, and with just these tables at hand (it is unclear how they were created), I think you are just building on uncertainty. Not worth it, in my opinion. Rather, check whether other datasets are available.

22 hours ago

Hi DareDevil, let's try to close out this question for others who land here via Google searches.

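The file you downloaded from GSE25769 (GSM632892) is an old, deprecated alignment output from Illumina's GERALD/CASAVA pipeline. It contains space- or tab-separated read alignments after quality filtering (chastity >= 0.6, per the #FILTER_CRITERION header line) and lists raw reads with their best mappings, but it only includes position details for uniquely mapped reads (where column 3 equals 1). For reads with a larger value in column 3, presumably the number of alignments found, columns 4 to 7 are absent. Columns 2, 5 and 7 represent the following: column 2 is the best alignment score for the read (in the sample lines, reads identical to the reference in column 6 score 18000); column 5 is the strand orientation (F for forward, R for reverse); and column 7 is the score of the next-best alignment (present only for unique mappings). Column 4 gives the chromosome and position (e.g. ref_chr4:49347516), and column 6 appears to be the reference sequence at that position on the forward strand: it differs from the read at mismatches and is the reverse complement of the read for R hits.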
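To make the column layout concrete, here is a minimal Python sketch that splits one line of the file into named fields. The field names (read_seq, best_score, n_hits and so on) are hypothetical labels chosen for illustration, not official Illumina names; the meaning of each column follows the description above, and the code assumes whitespace-separated fields and the ref_chrN:position form seen in the sample.

```python
from typing import Optional


def parse_realign_line(line: str) -> Optional[dict]:
    """Split one line of a *_realign.txt file into named fields.

    Field names are illustrative labels, not official Illumina names.
    Returns None for '#' header lines and blank lines.
    """
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    record = {
        "read_seq": fields[0],         # column 1: read sequence
        "best_score": int(fields[1]),  # column 2: best alignment score
        "n_hits": int(fields[2]),      # column 3: 1 = uniquely mapped
    }
    # Columns 4-7 are only present for uniquely mapped reads.
    if record["n_hits"] == 1 and len(fields) >= 7:
        location = fields[3]           # column 4, e.g. "ref_chr4:49347516"
        if location.startswith("ref_"):
            location = location[len("ref_"):]
        chrom, pos = location.rsplit(":", 1)
        record.update(
            chrom=chrom,
            pos=int(pos),
            strand=fields[4],                # column 5: "F" or "R"
            ref_seq=fields[5],               # column 6: reference sequence (assumed)
            next_best_score=int(fields[6]),  # column 7: next-best alignment score
        )
    return record
```

Multi-mapped lines such as the one with 8 in column 3 come back with only the first three fields, which makes it easy to count or discard them before any downstream step.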

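This file holds aligned reads rather than called peaks, so you cannot view peaks directly from it. To visualise enrichment (peaks), first convert the unique mappings to a standard format like BED: take the chromosome and position from column 4, set start = position - 1 (BED is 0-based, assuming the file's positions are 1-based) and end = position + 35 for 36-bp reads (typical for this era, and the length of the reads shown), and take the strand from column 5 (F becomes +, R becomes -). Then load the BED into IGV (or the UCSC Genome Browser) using the hg18 reference assembly; download hg18 from UCSC if needed. For proper peak calling, run a peak caller such as MACS2 on the BED file. MACS2 takes a genome size rather than an assembly name, so for example: macs2 callpeak -t reads.bed -f BED -g hs -n GSM632892, adding -c with a matching input/control BED if one is available. This produces a narrowPeak file of peak coordinates (still in hg18) for overlay, and the -B option additionally writes bedGraph pileups that can be converted to bigWig.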
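A minimal Python sketch of the BED conversion described above. It assumes that column 4 holds a 1-based leftmost position on the forward strand for both F and R reads, and it takes the read length from the sequence in column 1 instead of hard-coding 36. realign_to_bed and convert_file are hypothetical helper names, not part of any existing tool; multi-mapped reads (column 3 not equal to 1) are skipped.

```python
import gzip
from typing import Iterable, Iterator

# Column 5 strand codes mapped to BED strand symbols.
STRAND = {"F": "+", "R": "-"}


def realign_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED6 lines for uniquely mapped reads in a *_realign.txt file.

    Assumes column 4 is a 1-based leftmost position (converted to BED's
    0-based, half-open start) and that the read length equals the length
    of the sequence in column 1.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header lines such as #RUN_TIME
        fields = line.split()
        # Skip blank lines, truncated lines and multi-mapped reads.
        if len(fields) < 5 or fields[2] != "1":
            continue
        location = fields[3]
        if location.startswith("ref_"):
            location = location[len("ref_"):]
        chrom, pos = location.rsplit(":", 1)
        start = int(pos) - 1         # 1-based -> 0-based (assumption)
        end = start + len(fields[0])  # half-open end
        yield f"{chrom}\t{start}\t{end}\t.\t0\t{STRAND[fields[4]]}"


def convert_file(in_path: str, out_path: str) -> int:
    """Convert a gzipped *_realign.txt file to BED; return reads written."""
    written = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for bed_line in realign_to_bed(src):
            dst.write(bed_line + "\n")
            written += 1
    return written
```

For example, convert_file("GSM632892_HCT405_realign.txt.gz", "reads.bed") writes one BED6 line per unique read; reads.bed can then be loaded into IGV against hg18 or passed to MACS2 with -f BED. If the positions turn out to be 0-based, drop the "- 1".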

Given the dataset's age and the lack of raw FASTQ files, results may be noisy; consider newer cohesin ChIP-seq sets like GSE117876 for a cleaner analysis.

Kevin
