Question: Calculate gene % covered at depth X
0
gravatar for ellieuk
10 weeks ago by
ellieuk10
ellieuk10 wrote:

Hi,

I'd like to calculate the percentage of given genes covered by an aligned BAM at a given depth i.e. 10x.

Output expected:

Gene_name      percentage_covered_at_10x
Gene1               95%
Gene2               80%
Gene3               99%

I tried: bedtools coverage -abam input.bam -b genes.bed to just give me coverage at any depth which provides me with the fraction of non-zero depth mapped reads for each coding exon (last column). Output below:

chr1    26696403    26697540    ARID1A  0   +   431 1137    1137    1.0000000
chr1    26729650    26729863    ARID1A  1   +   212 213 213 1.0000000
chr1    26731151    26731604    ARID1A  2   +   528 453 453 1.0000000
chr1    26732675    26732792    ARID1A  3   +   70  117 117 1.0000000
chr1    26767789    26767999    ARID1A  10  +   128 210 210 1.0000000
chr1    26771118    26771326    ARID1A  11  +   117 208 208 1.0000000
chr1    26760855    26761096    ARID1A  4   +   152 241 241 1.0000000
chr1    26761383    26761473    ARID1A  5   +   56  90  90  1.0000000
chr1    26762151    26762319    ARID1A  6   +   96  168 168 1.0000000
chr1    26762972    26763285    ARID1A  7   +   253 313 313 1.0000000
chr1    26766220    26766366    ARID1A  8   +   131 146 146 1.0000000
chr1    26766456    26766566    ARID1A  9   +   194 110 110 1.0000000
chr1    26772499    26772632    ARID1A  12  +   187 133 133 1.0000000
chr1    26772811    26772987    ARID1A  13  +   98  176 176 1.0000000
chr1    26773345    26773496    ARID1A  14  +   105 151 151 1.0000000
chr1    26773579    26773717    ARID1A  15  +   242 138 138 1.0000000
chr1    26773801    26773898    ARID1A  16  +   116 97  97  1.0000000
chr1    26774328    26775220    ARID1A  17  +   817 892 892 1.0000000
chr1    26775576    26775707    ARID1A  18  +   92  131 131 1.0000000
chr1    26779022    26780756    ARID1A  19  +   1394    1734    1734    1.0000000
chr3    185204415   185204757   EHHADH  1   -   206 342 342 1.0000000
chr3    185192225   185193487   EHHADH  0   -   1192    1262    1262    1.0000000
chr3    185218135   185218240   EHHADH  2   -   154 105 105 1.0000000
chr3    185229431   185229543   EHHADH  3   -   32  112 112 1.0000000
chr3    185235289   185235462   EHHADH  4   -   92  173 173 1.0000000
chr3    185248413   185248517   EHHADH  5   -   65  104 104 1.0000000
chr3    185253948   185254022   EHHADH  6   -   76  74  74  1.0000000
chr7    66629064    66629208    KCTD7   0   +   85  144 144 1.0000000
chr7    66633274    66633444    KCTD7   1   +   97  170 170 1.0000000
chr7    66638252    66638431    KCTD7   2   +   199 179 179 1.0000000
chr7    66638855    66639232    KCTD7   3   +   268 377 377 1.0000000

However, I want to be a bit smarter and factor in the following:

  1. I'd like the output to be per (coding regions) of the gene as a whole (NB: my input .bed file already has coding exon start and stop positions for each gene as shown by multiple lines)
  2. Only include coverage when depth >10 or any other desired number

If anyone has any ideas, I'd be very grateful! Please bear in mind I'm a clinician not an informatician! E

alignment bedtools genome • 139 views
ADD COMMENTlink modified 10 weeks ago by zx87549.4k • written 10 weeks ago by ellieuk10
  1. Only include coverage when depth >10 or any other desired number

This can be done by piping the output of bedtools into another tool, e.g. awk:

bedtools coverage -abam input.bam -b genes.bed | awk '$NF >= 10 {print $0}'
  • the awk command as it stands above will read in the output of bedtools coverage, check the last column and if the value in the last column is equal to or greater than 10, it will print all fields of that particular line
  • $NF means "last column" (the NFth column ($), where NF = "number of fields" or "total number of 'columns' in your file")
  • $0 means = print all fields

That being said, I'm not sure if the last column is really what you're interested in, but I haven't used the coverage tools in a while, I would just caution you to double-check your assumptions about the results.

ADD REPLYlink written 10 weeks ago by Friederike6.0k

Thanks for the response. That doesn’t work because I need to only count a gene as covered when the read depth is 10x at any position across that gene.

ADD REPLYlink written 10 weeks ago by ellieuk10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1364 users visited in the last hour