Question: bedtools genomecov max function not working correctly
0
gravatar for RBright21
4 days ago by
RBright2110
RBright2110 wrote:

When using bedtools genomecov max function I get values reported for some chromosomes/reference sequences which are above the max depth set. This is not consistent between files and always occurs on the last base or two of the reference genome. The number of chromosomes affected varies between bam files.

For example using code:

bedtools genomecov -ibam In.bam -bga -max 1 > out.txt

for this particular file I got all 734562 lines reported as either 0 or 1 (covered vs uncovered) with the exception of 7 lines:

NC_011203.1 9997    9998    1
NC_011203.1 9998    9999    1
NC_011203.1 9999    10000   1
MH558113.1  35932   35933   7
AC_000008.1 35937   35938   8
NC_001405.1 35936   35937   8
AC_000017.1 36000   36001   9
HQ003817.1  35817   35818   9
HQ413315.1  35758   35759   9
MH121097.1  35750   35752   92

I have looked at the bam on IGV and there are 92 aligned reads at bases 35750 to 35752 for MH121097.1and the reference genome is 35752bp in length.

Does anyone know how I can fix this or where I can get some help? Should I report as an issue on GitHub (I was wary of doing so incase It was something I was doing wrong)?

For context: my bam file contains reads aligned against 104 different reference sequences using bowtie2 all. I am trying to find a way to write a script which could easily tell me which of these reference genomes are completely covered by my reads to look for mixed infections without having to do it manually. I think the way I am trying to do it is quite convoluted though and I may be missing a simpler solution. My steps are:

  1. bedtools genomecov -ibam In.bam -bga -max 1 > out.txt
  2. awk -F"\t" '!seen[$1, $4]++' out.txt > awk.txt awk to keep lines which are unique in both the chromosome name and the depth - so keep chromosome name with coverage =0 and chromosome name with coverage =1 meaning I am left with one instance of covered and one instance of an area uncovered
  3. awk '{print $1}' awk.txt > awk2.txt remove all columns except the chromosome name
  4. uniq -u awk2.txt > uniq.txt use uniq to identify unique chromosome names (those which only have an entry with a 1 representing entirely covered genomes)

This works very well when I don't get these odd max depths which are over the defined max value of 1 but if anyone can think of a simpler way I can achieve this then I would also be very grateful for your help

Thanks

genomecov assembly bedtools • 84 views
ADD COMMENTlink modified 4 days ago by ATpoint45k • written 4 days ago by RBright2110
0
gravatar for ATpoint
4 days ago by
ATpoint45k
ATpoint45k wrote:
-max        Combine all positions with a depth >= max into
            a single bin in the histogram. Irrelevant
            for -d and -bedGraph
            - (INTEGER)

Irrelevant for the bedGraoh setting, only relevant for the histogram settings => does not apply for your -bga setting.

ADD COMMENTlink written 4 days ago by ATpoint45k

Thank you - I hadn't noticed that bit. It does work though - completely on some of my bam files with the same parameters set and across the majority of the rest of the files with the exception of the last base or two though? Surely if it didn't work for those options then my files would be reporting their actual coverage which is significantly higher than 1. Am I missing something?

ADD REPLYlink written 4 days ago by RBright2110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1499 users visited in the last hour
_