bedtools genomecov issue with calculating coverage across called peaks?
1
0
Entering edit mode
2.1 years ago

Hello everyone,

I feel as though I'm having a conceptual issue when it comes to processing CHiP-seq. I have a list of bed files that I want to convert into bedGraph and then into bigwig for visualization (as recommended in this post: converting Bed to Wig).

I have downloaded bed files from this GEO dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117864

These peaks were called using HOMER's peakFinder and the data had to be wrangled to fit the proper format

chr1    237654  237866  23.5    +
chr1    714102  714314  51  +
chr1    793377  793589  90.5    +
chr1    805182  805394  211.5   +

Where columns are chrom, chromStart, chromEnd, score and strand respectively. The score used is the peakFinder score normalized to input.

When I go to convert these bed files into bedgraph using genomcov with the following code:

fetchChromSizes hg19 > mychrom.sizes
bedtools genomecov -i input.bed -g mychrom.sizes -bg > output.bedGraph

I get the following coverage calculated:

chr1    237654  237866  1
chr1    714102  714314  1
chr1    793377  793589  1
chr1    805182  805394  1

What exactly happened here, I thought that genomecov was supposed to summarize feature coverage, which would be in this case be the findPeaks score from HOMER (https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html), however it's seemingly all 1s.

Did I do something incorrect on my behalf? Why did I get this output from genomecov? All I want to do is convert bed into bedgraph and then into bigwig.

Thanks for your time.

genomecov chip-seq bed bedtools • 789 views
ADD COMMENT
0
Entering edit mode

Why do you feel you need to turn the peaks into a wig file? (Big)Wigs are usually generated to have a more efficient way of representing the genome-wide coverage based on millions of reads. Those short-reads are usually stored in BAM files (millions and millions of rows, one per read, i.e. the size of the BAM file will depend on the number of reads in your experiment), which can become unwieldy. To have a more efficient representation, we can chunk up the genome in equal-size bins and just calculated how many reads per bin we have (this is what the bedTools command above would do). This will, in most cases, generate a smaller file, although the number of rows in the final file now depends on the bin size, i.e. if you have 1bp bins, the number of rows will correspond to the number of bp in the genome, if you have 100 bp bins, it will be [genome size/100] and so on.

Have a look at the schema here the illustrates the relationships between the different file types.

ADD REPLY
0
Entering edit mode
2.1 years ago
ATpoint 82k

Think about it. The peak file contains the peaks so it's coordinate which is one interval per peak, hence of course your bedGraph that you build on this file is just that peak with coverage of 1 because as said before, the peak file is one interval per peak. For a coverage track you have to use the alignments, that would typically be thebam files that you into that command.

ADD COMMENT

Login before adding your answer.

Traffic: 2467 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6