Question: Strange Depth of Coverage distribution
melaniep wrote:

Hi,

After Googling for a while without finding any hint, I would like to ask the experts here for their opinion. I have a very strange result for the distribution of the depth of coverage (see picture). It looks as if there were two curves. Some samples show this more than others (some are not shown). In total I have 20 samples; they are bee samples and were sequenced on an Illumina HiSeq 3000 (whole genome). I have worked with bee sequence data before and never saw this behaviour; normally the distribution is more or less smooth. Has anybody come across something similar, or does anyone have an idea why the distribution is this irregular? I cannot think of anything that would explain it.

coverage distribution

modified 12 weeks ago • written 12 weeks ago by melaniep

How to add images to a Biostars post

written 12 weeks ago by genomax

thanks for the hint ;-)

written 12 weeks ago by melaniep

Were these libraries prepared in two batches/with different methods?

written 12 weeks ago by genomax

No, there should be just one library prep for each sample (kit: NEBNext Ultra II), but I can double-check with the sequencing facility.

written 12 weeks ago by melaniep

This looks like a histogram plotting issue where it is binning the depth values in a skewed way.

written 12 weeks ago by Damian Kao

I also checked the numbers directly in the output of GATK's DepthOfCoverage, and it's already like this there, so it should not be related to the plotting itself.

written 12 weeks ago by melaniep

Even if multiple samples were mixed together in the depth-of-coverage reporting, it should not show this distribution. A multi-sample mix should show a smooth histogram with potentially multiple peaks. Maybe try plotting the depth-of-coverage info from samtools stats instead and see if you get the same thing. Perhaps it has something to do with GATK.
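
One way to take plot binning out of the picture is to count positions per exact integer depth before plotting anything. The following is only a minimal sketch, assuming tab-separated input with chromosome, position and depth columns (the layout `samtools depth` prints for a single BAM); the function name `depth_histogram` and the example rows are made up for illustration.

```python
from collections import Counter


def depth_histogram(lines):
    """Count how many positions have each exact depth value (no binning)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed column layout: chromosome, 1-based position, depth.
        _chrom, _pos, depth = line.split("\t")[:3]
        counts[int(depth)] += 1
    return counts


# Made-up rows, only to show the expected input shape.
example_rows = [
    "chr1\t1\t10",
    "chr1\t2\t10",
    "chr1\t3\t11",
    "chr1\t4\t20",
]

hist = depth_histogram(example_rows)
for depth in sorted(hist):
    print(depth, hist[depth])
```

If the two-curve pattern is still visible in these raw per-depth counts, the plotting step is not the cause.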

modified 12 weeks ago • written 12 weeks ago by Damian Kao

Hi, I recalculated the depth with samtools, and it was the same. But today it occurred to me why the coverage could look like this: I have overlapping paired-end sequencing reads! That would make sense, no?
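
To illustrate the idea, here is a toy sketch of per-base depth over a single fragment sequenced from both ends. It assumes the depth tool counts both mates where they overlap, which depends on the tool and its settings; the function name `fragment_depth` and the lengths used are hypothetical.

```python
def fragment_depth(fragment_len, read_len):
    """Per-base depth over one fragment sequenced from both ends.

    Assumption: overlapping mates are both counted, so the overlap gets depth 2.
    """
    depth = [0] * fragment_len
    # Forward read covers the first read_len bases of the fragment.
    for i in range(min(read_len, fragment_len)):
        depth[i] += 1
    # Reverse read covers the last read_len bases of the fragment.
    for i in range(max(0, fragment_len - read_len), fragment_len):
        depth[i] += 1
    return depth


# Hypothetical fragment shorter than twice the read length: the mates overlap.
short_fragment = fragment_depth(fragment_len=150, read_len=100)
print(short_fragment.count(1), short_fragment.count(2))

# Hypothetical fragment longer than twice the read length: no overlap.
long_fragment = fragment_depth(fragment_len=300, read_len=100)
print(long_fragment.count(2))
```

Under that assumption, positions inside mate overlaps receive twice the contribution of positions outside them, which could plausibly split the depth distribution into two populations.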

written 11 weeks ago by melaniep

It should be easy enough to check that theory. I recommend using bbmerge.sh from the BBMap suite to merge the reads. Do the merging on the raw (non-trimmed) data.

written 11 weeks ago by genomax

Yeah, that could be it. You can check this by looking at the insert size distribution of your PE reads and seeing whether it is smaller than 2 * average read length. The 9th column (TLEN) of your .sam/.bam records should be the insert size.
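
This check can be sketched in a few lines that read SAM text (for example the output of `samtools view`). It is only a rough sketch under stated assumptions: each record's own SEQ length stands in for the average read length, both mates are assumed to be equally long, and each pair is counted once via the mate with positive TLEN. The names `fraction_overlapping` and `make_record` and the example records are made up.

```python
def fraction_overlapping(sam_lines):
    """Fraction of pairs whose insert size (TLEN) is below 2 * read length."""
    total = 0
    overlapping = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tlen = int(fields[8])  # 9th column: template length / insert size
        seq = fields[9]        # 10th column: read sequence
        if tlen <= 0 or seq == "*":
            continue  # count each pair once; skip records without usable data
        total += 1
        if tlen < 2 * len(seq):
            overlapping += 1
    return overlapping / total if total else 0.0


def make_record(name, flag, pos, mate_pos, tlen, read_len=10):
    """Build a made-up SAM record for the example below."""
    return "\t".join([name, str(flag), "chr1", str(pos), "60", f"{read_len}M",
                      "=", str(mate_pos), str(tlen), "A" * read_len, "I" * read_len])


example_sam = [
    "@HD\tVN:1.6",
    make_record("r1", 99, 100, 105, 15),    # insert 15 < 20: mates overlap
    make_record("r1", 147, 105, 100, -15),
    make_record("r2", 99, 200, 230, 40),    # insert 40 >= 20: no overlap
    make_record("r2", 147, 230, 200, -40),
]

overlap_fraction = fraction_overlapping(example_sam)
print(overlap_fraction)
```

A large fraction on real data would support the overlapping-reads explanation; a fraction near zero would argue against it.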

modified 11 weeks ago • written 11 weeks ago by Damian Kao