How Can I Get "Peak Shape"/"Per Base Genome Coverage" Using Simple Bed File Containing Peak Locations ?
12.1 years ago
Atom Smasher ▴ 20

Hello,

I have a file in Excel (.xls) format that looks as follows:

Chromosome    Start    End    Summit    Height    ChIP_sequences    IgG_sequences    Fold_change    p-value
1    830821    831068    830941    9    10    0    5.32    0.0130582
1    1300370    1300918    1300784    15    32    3    3.99    0.000663352
1    1638630    1638824    1638786    9    11    1    2.9    0.0432279
1    1645497    1645776    1645634    8    11    1    2.9    0.0432279
1    1704190    1704536    1704401    9    15    1    3.87    0.0112948

Each region is a "peak" found by the peak-finding algorithm. I wish to find the per-base genome coverage, or the peak shape, of the above regions.

This Excel file is all I have. Since I do not have any alignment data, how can I find the per-base genome coverage or peak shape?

Can I use statistics such as "Height" and "ChIP_sequences" in any way to get the peak shape?

Thanks.

genome coverage alignment peak-calling • 3.2k views

Your best bet is to go back to whoever gave you these data and get the raw alignment data. Without that, I think you are stuck.

12.1 years ago

To compute coverage you need a BED file of reads: one aligned read per line, with chromosome, start, end, and optionally strand. You can always convert your peak file to BED, assuming the .xls is really tab-delimited text: cut -f1,2,3 peaks.xls > peaks.bed if there is no header, otherwise sed -e '1,23d' peaks.xls | cut -f1,2,3 > peaks.bed, replacing 23 with however many header lines your file has.

The problem with this BED file is that it holds peak intervals, not reads. The fold enrichment against the control (or the local noise of the sample itself) has already been calculated, so pileup or per-base coverage won't work: there is no read information to compute coverage from. If you get the reads in future, you can use the bedtools utility to get per-base coverage: genomeCoverageBed -i reads.bed -g my.genome -d > sample.cov, where reads.bed is the read-level BED file and my.genome lists chromosome names and sizes.

Height is just the height of the peak at its summit, ChIP_sequences is the number of reads in the sample, and IgG_sequences is the number of reads in your mock control in that specific region. I don't know how these could be used to get per-base coverage.

Also, I am not sure what you mean by peak shape; maybe you want a wig/bigWig file to visualize your peaks in the UCSC or IGV browser. There is another good tool called pyicos, which can convert your BED file to a bedpk file and give a peak area for each peak.
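For illustration only, here is a rough sketch of the two steps mentioned in this answer, written with plain awk so it runs without bedtools. It assumes the peak table is tab-delimited text despite the .xls extension (a real Excel workbook would have to be exported to text first). The file names and the two toy reads are made up for the example, since the question has no read data. The second step only mimics the idea behind per-base coverage: it prints covered positions only, whereas genomeCoverageBed -d reports every position in the genome file.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
tmp=$(mktemp -d)

# Toy peak table (hypothetical file name), using the first rows from the question.
printf 'Chromosome\tStart\tEnd\tSummit\tHeight\n' > "$tmp/peaks.xls"
printf '1\t830821\t831068\t830941\t9\n' >> "$tmp/peaks.xls"
printf '1\t1300370\t1300918\t1300784\t15\n' >> "$tmp/peaks.xls"

# Step 1: keep chromosome, start, end. Lines whose second column is not an
# integer are dropped, which skips header lines however many there are.
# Coordinates are copied unchanged (no 0-based/1-based adjustment is attempted).
awk 'BEGIN { FS = OFS = "\t" } $2 ~ /^[0-9]+$/ { print $1, $2, $3 }' \
    "$tmp/peaks.xls" > "$tmp/peaks.bed"

# Toy read intervals, invented for illustration (assumption: BED, 0-based, half-open).
printf '1\t100\t105\n1\t103\t108\n' > "$tmp/reads.bed"

# Step 2: per-base depth = number of read intervals covering each position.
# A BED interval [start, end) covers 1-based positions start+1 .. end.
awk 'BEGIN { FS = OFS = "\t" }
     { for (p = $2 + 1; p <= $3; p++) depth[$1 OFS p]++ }
     END { for (k in depth) print k, depth[k] }' "$tmp/reads.bed" |
    sort -k1,1 -k2,2n > "$tmp/reads.cov"

cat "$tmp/peaks.bed" "$tmp/reads.cov"
```

Running step 2 on peaks.bed instead of reads.bed would give a depth of 1 across every non-overlapping peak, which is why the peak intervals alone cannot recover the shape.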

Cheers
