Question: How to calculate read depth for a specific gene list?
0
gravatar for Lila M
3 months ago by
Lila M 470
UK
Lila M 470 wrote:

Hi everybody, I'm stuck in the middle of something. I have a selected gene list (bed file) like this:

chr    start   end    strand    gene        geneLength
chr1    185217  195411  -   ENSG00000279457 10194
chr1    725885  778626  -   ENSG00000228327 52741
chr1    725885  778626  -   ENSG00000228327 52741
chr1    916865  921016  -   ENSG00000223764 4151
chr1    916865  921016  -   ENSG00000223764 4151
chr1    916865  921016  -   ENSG00000223764 4151
chr1    944204  959309  -   ENSG00000188976 15105

I get them after different downstream analysis, and I need to calculate the read depth for them (only for those). So far, I've tried this:

genomeCoverageBed -ibam bam -g gene.list > coverage

But the output doesn't show the coverage for the specific gene.

and I also tried this:

bedtools coverage -a gene.list -b bam > coverage

but it gave me this error

***** WARNING: bam has inconsistent naming convention for record:
GL000008.2  12857   12916   SRR6497147.21984058 0   -

Killed

Any help? Thanks!!!

coverage interesct readdepth • 147 views
ADD COMMENTlink written 3 months ago by Lila M 470

What are the chromosome identifiers in your bam?

ADD REPLYlink written 3 months ago by WouterDeCoster39k

Hi, the chromosomes appeared as "chr1"

SRR6497149.3455795  16  chr1    10533   1   65M *   0   0AAGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGT  EGGGGFGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGDGGGGGGGGFGGFGGGEBBBBB   AS:i:-10    XS:i:-10    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0G26C37    YT:Z:UU

however, there are additional chr that not start with "chr", as the one reported in the error. Is there any way in which I can remove those chr that are not the "typical"? Thanks!

ADD REPLYlink modified 3 months ago • written 3 months ago by Lila M 470
1

Hey Lila, for the first command, genomeCoverageBed, you don't have the correct input data. The file passed with -g needs to be of this format:

chr1    249250621
chr2    243199373
...
chr18_gl000207_random   4262

This is merely the first 2 columns of the output produced by samtools faidx.

What exactly are you aiming to do? Note that, in my pipeline (here https://github.com/kevinblighe/ClinicalGradeDNAseq ), I output various coverage statistics via:

number of reads off target

bedtools intersect -v -bed -abam "${BAMfile}" -b "${BEDfile}" | wc -l > ReadsOffTarget.txt

output depth of coverage for all regions in the BAM file

[Sequential positions at the same read depth are merged into a single region]

bedtools genomecov -ibam "${BAMfile}" -bga -split > CoverageTotal.bedgraph

output the per base read depth for each region in the BED file

bedtools coverage -a "${BEDfile}" -b "${BAMfile}" -d > PerBaseDepthBED.bedgraph

output the mean depth of coverage for each region in the BED file

bedtools coverage -a "${BEDfile}" -b "${BAMfile}" -mean > MeanCoverageBED.bedgraph
ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe42k
1

Hi Kevin, Thank you for such an extensive response! What I basically did , was to transform my bam file into a bed one, so then I was able to get the coverage by running

bedtools coverage -a file.bed -b file.bedGraph -counts > coverage_by_counts
ADD REPLYlink written 3 months ago by Lila M 470
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1564 users visited in the last hour