Question

How Can I Get "Peak Shape"/"Per Base Genome Coverage" Using a Simple Bed File Containing Peak Locations?

12.1 years ago

Atom Smasher ▴ 20

Hello,

I have a file in Excel (.xls) format that looks as follows:

Chromosome    Start    End    Summit    Height    ChIP_sequences    IgG_sequences    Fold_change    p-value
1    830821    831068    830941    9    10    0    5.32    0.0130582
1    1300370    1300918    1300784    15    32    3    3.99    0.000663352
1    1638630    1638824    1638786    9    11    1    2.9    0.0432279
1    1645497    1645776    1645634    8    11    1    2.9    0.0432279
1    1704190    1704536    1704401    9    15    1    3.87    0.0112948

Each region is a "peak" found by the peak-finding algorithm. I would like to get the per-base genome coverage, or the peak shape, of the above regions.

This Excel file is all I have. Since I do not have any alignment data, how can I find the per-base genome coverage or peak shape?

Can I use statistics such as "Height" and "ChIP_sequences" in any way to get the peak shape?

Thanks.

genome coverage alignment peak-calling • 3.2k views

updated 12.1 years ago by Sukhi Singh 11k • written 12.1 years ago by Atom Smasher ▴ 20

Your best bet is to go back to whoever gave you these data and get the raw alignment data. Without that, I think you are stuck.

12.1 years ago by Sean Davis 26k

score 0 · Answer 1 · 2012-04-11

You probably need a BED file of reads: one read per line, with the chromosome, the start and end positions, and optional strand info. You can always convert your peak file to a BED file with cut -f1,2,3 peaks.xls > peaks.bed if there is no header; otherwise strip the header first, e.g. sed -e '1,23d' peaks.xls | cut -f1,2,3 > peaks.bed, adjusting the line range to the length of the header. Both commands work only if peaks.xls is tab-delimited text rather than a binary Excel workbook.

The problem with this BED file is that it holds peak regions whose fold enrichment has already been calculated against the control or the local noise of the sample itself, so a pileup or per-base coverage computed from it won't work: you don't have the read information needed to get the coverage within the peaks. If you get the reads in the future, you can use the bedtools utility to get per-base coverage: genomeCoverageBed -i file.bed -g my.genome -d > sample.cov.

Height is just the height of the peak summit, ChIP_sequences is the number of reads in the sample, and IgG_sequences is the number of reads in your mock control in that specific region. I don't know how these could be used to get per-base coverage.

Also, I am not sure what you mean by peak shape; maybe you want a wig/bigWig file to visualize your peaks in the UCSC or IGV browser. There is another good tool called pyicos, with which you can convert your BED file to a bedpk file and get a peak area for each peak.
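As a minimal sketch of the two steps mentioned in this answer, the snippet below uses only standard Unix tools. Everything in it is an assumption made for illustration: the file names peaks.tsv, reads.bed and reads.cov are hypothetical, the peak table is assumed to be tab-delimited text with exactly one header line, and the two "reads" are made up. The awk step only imitates the idea of per-base depth on covered positions; it is not a replacement for bedtools, which also reports zero-coverage positions across the genome. The peak coordinates are copied unchanged, so whether they are 0-based (as BED expects) depends on the peak caller.

```shell
# Step 1: peak table -> 3-column BED.
# Illustrative input: header plus two rows taken from the question (first five columns only).
printf 'Chromosome\tStart\tEnd\tSummit\tHeight\n'  > peaks.tsv
printf '1\t830821\t831068\t830941\t9\n'           >> peaks.tsv
printf '1\t1300370\t1300918\t1300784\t15\n'       >> peaks.tsv

# Skip the single header line, then keep chromosome, start and end.
tail -n +2 peaks.tsv | cut -f1,2,3 > peaks.bed

# Step 2: per-base depth from a BED file of reads (hypothetical reads, not real data).
printf 'chr1\t10\t14\n'  > reads.bed
printf 'chr1\t12\t16\n' >> reads.bed

# BED intervals are 0-based and half-open, so a read [start, end) covers
# the 1-based positions start+1 .. end. Count how many reads cover each one.
awk 'BEGIN { FS = OFS = "\t" }
     { for (p = $2 + 1; p <= $3; p++) depth[$1 OFS p]++ }
     END { for (k in depth) print k, depth[k] }' reads.bed |
  sort -k1,1 -k2,2n > reads.cov

cat peaks.bed
cat reads.cov
```

The second step needs a file of reads, which is exactly what the peak table cannot provide; running it on peaks.bed would only report a depth of 1 across each peak.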

Cheers