I'm doing an exercise that asks for two files:
- Input 1:
A target file (.bed format) that contains multiple regions from chr7:40000000-50000000 of the human reference genome GRCh37 (hg19)
- Input 2:
A RefSeq exon list file (.bed format) for all human coding genes (hg19 positions)
The final goal is:
For all genes located in chr7:40000000-50000000, get the summary statistics of the target file coverage. (For each gene, get the fraction of exonic bases that are covered by the target file.)
I believe the target file in this exercise is meant to be something like the chromosome sizes of hg19 (hg19.chr.sizes), and refseq_exon_list the list of exons from the RefSeq database. Both can be downloaded with tools like the Table Browser. Is that correct?
I'm not sure which files I should download to perform this task. Once the files are downloaded, I believe I need to restrict refseq_exon_list to the requested region and then compute coverage with something like bedtools. Something along these lines:
bedtools coverage -a hg19.chr.sizes -b refseq_exon_list
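To make the statistic I am after concrete, here is a minimal pure-Python sketch of the per-gene calculation. It is only an illustration under several assumptions that are not part of the exercise text: the target is treated as a BED file of intervals (not a chromosome sizes file), column 4 of the exon BED is assumed to hold a gene identifier, exons that only partly overlap the region are kept whole rather than clipped, and the names `merge`, `overlap_bases`, `gene_coverage` and the toy rows are all made up for this example.

```python
from collections import defaultdict

# Region named in the exercise (BED-style, half-open coordinates assumed).
REGION = ("chr7", 40_000_000, 50_000_000)


def merge(intervals):
    """Merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bases(a, b):
    """Count bases shared by two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def gene_coverage(exon_rows, target_rows, region=REGION):
    """Return {gene: fraction of exonic bases covered by the target}.

    exon_rows: (chrom, start, end, gene) tuples (gene in column 4 is an assumption).
    target_rows: (chrom, start, end) tuples.
    """
    chrom, r_start, r_end = region
    targets = merge((s, e) for c, s, e in target_rows if c == chrom)
    exons = defaultdict(list)
    for c, s, e, gene in exon_rows:
        # Keep exons that overlap the requested region at all.
        if c == chrom and s < r_end and e > r_start:
            exons[gene].append((s, e))
    result = {}
    for gene, ivs in exons.items():
        merged = merge(ivs)  # avoid double-counting exons shared by transcripts
        total = sum(e - s for s, e in merged)
        if total:
            result[gene] = overlap_bases(merged, targets) / total
    return result


# Toy rows, invented purely for illustration.
toy_exons = [
    ("chr7", 40_000_100, 40_000_200, "GENE_A"),
    ("chr7", 40_000_150, 40_000_300, "GENE_A"),
    ("chr7", 41_000_000, 41_000_100, "GENE_B"),
    ("chr1", 100, 200, "GENE_C"),
]
toy_targets = [
    ("chr7", 40_000_000, 40_000_250),
    ("chr7", 41_000_050, 41_000_075),
]
print(gene_coverage(toy_exons, toy_targets))
```

With the toy rows, GENE_A has 200 merged exonic bases of which 150 fall inside a target interval, and GENE_B has 100 of which 25 do, so the fractions are 0.75 and 0.25; GENE_C is dropped because it is not on chr7. I would expect a bedtools-based approach to need to produce the same kind of numbers.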
Am I right here? Any input is appreciated, thanks.