Read depth maxing out at about 8000
2
1
Entering edit mode
3.4 years ago
ben83 ▴ 50

Intro: Yeast chromosome 12 contains a really long region of rDNA. There are about 150-200 or greater copies of a 9.1kb repeat, so about 1.5M bases in total, which is the majority of the chromosome. The published yeast reference genome has only 2 copies of the repeat.

My situation: I just did whole genome sequencing of some yeast clones. I have on about 70X coverage in most of the genome. If you figure that there are 200 copies of the rDNA repeat on chromosome 12, but only 2 copies represented in the reference sequence, then in this region I should see a read depth of roughly 70*100 = 7000. And in fact, this is pretty much what I see, here's a plot of read depth in that area. To generate this plot I mapped reads with bwa and then used samtools depth and then plotted in R.

enter image description here

My question: The plot looks obviously artificially cut off at a read depth of ~8000--why? I didn't specify any limits in creating the plot. So where is the cutoff happening? Is it something samtools depth is doing? Something bwa did? Or something about how the reads are generated during Illumina sequencing?

Last thing to mentions: the highest value is not exactly 8000, there are a few positions with depths of 8002, just to make things weirder. And I've shown just one example, but it happens on to all clones I sequenced.

Thanks for any help!

sequencing genome read-depth • 2.2k views
ADD COMMENT
3
Entering edit mode
3.4 years ago
ATpoint 81k
Usage: samtools depth [options] in1.bam [in2.bam [...]]
Options:
  (...)

   -d/-m <int>         maximum coverage depth [8000]. If 0, depth is set to the maximum
                       integer value, effectively removing any depth limit.
  (...)
ADD COMMENT
0
Entering edit mode

Aha!

ADD REPLY
0
Entering edit mode

-d 0 worked! Thanks!

enter image description here

ADD REPLY
1
Entering edit mode
3.4 years ago

it's not an issue with your data, it's the tools. Counting depth costs memory, since the reads have to be kept around long enough to see how many add up on each location. So there's a max depth, capped to 8K by samtools depth. There's a config flag to set the max to other thigns, but it was broken when I last used samtools. It might work for you or you might need something other than samtools depth. maybe RSubread or htslib can help.

ADD COMMENT
0
Entering edit mode

Luckily, it appears to not be broken! Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1980 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6