Question: Strange Depth of Coverage distribution
Asked 11 months ago by melaniep (Switzerland):

Hi,

After googling for a while without finding any hint, I would like to ask the experts here for their opinion. I have a very strange result for the distribution of the depth of coverage (see picture): it looks as if there were two curves. Some samples are affected more than others, and some are not shown. In total I have 20 samples. They are bee samples sequenced on an Illumina HiSeq 3000 (whole genome). I have worked with bee sequence data before and never saw this behaviour; normally the distribution was more or less smooth. Has anyone come across something similar, or does anyone have an idea why the distribution is this irregular? I cannot think of anything that would explain it.

[Image: coverage distribution]

Tags: sequencing, genome

How to add images to a Biostars post

written 11 months ago by genomax

Thanks for the hint ;-)

written 11 months ago by melaniep

Were these libraries prepared in two batches/with different methods?

written 11 months ago by genomax

No, there should be just one library prep for each sample (NEBNext Ultra II kit), but I can double-check with the sequencing facility.

written 11 months ago by melaniep

This looks like a histogram plotting issue where it is binning the depth values in a skewed way.

written 11 months ago by Damian Kao
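Here is a minimal sketch of how binning alone can make a histogram jagged. It assumes the depths are integers and the plotting tool uses fixed-width bins starting at 0. The bin width of 1.5 and the flat toy data are hypothetical. When the bin width is not a whole number, neighbouring bins alternately capture two integer depths and one. Even a perfectly flat distribution then comes out as a comb.

```python
import math
from collections import Counter

def bin_integer_depths(depths, bin_width):
    """Count integer depth values in fixed-width bins [0, w), [w, 2w), ..."""
    counts = Counter(math.floor(d / bin_width) for d in depths)
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]

# Hypothetical toy data: every depth from 0 to 29 seen exactly 100 times,
# i.e. a perfectly flat distribution with no structure at all.
flat_depths = [d for d in range(30) for _ in range(100)]

print(bin_integer_depths(flat_depths, 1.0))  # one bin per depth: all 100
print(bin_integer_depths(flat_depths, 1.5))  # bins alternate 200, 100, 200, ...
```

If the pattern disappears with a bin width of exactly 1 (one bin per integer depth), the plotting is the cause. Otherwise the pattern is in the depth values themselves.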

I also checked the numbers directly in the output of GATK's DepthOfCoverage, and the pattern is already there, so it should not be related to the plotting itself.

written 11 months ago by melaniep

Even if multiple samples were mixed together in the depth-of-coverage report, you should not see this distribution: a mix of samples should give a smooth histogram, possibly with multiple peaks. Maybe try plotting the depth-of-coverage information from samtools stats and see if you get the same thing. Perhaps it is something to do with GATK.

written 11 months ago by Damian Kao
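To compare tools on the same numbers, a per-position depth file can be turned into a histogram directly. The sketch below assumes the tab-separated output of `samtools depth -a sample.bam`: chromosome, position, then one depth column per BAM. The function names and the even/odd ratio are illustrative additions, not part of samtools. A ratio far from 1 means that positions with even depth heavily outnumber positions with odd depth (or the reverse). That imbalance is what two interleaved curves in a depth histogram look like.

```python
from collections import Counter

def depth_histogram(lines, sample_col=0):
    """Count positions per depth from `samtools depth` output lines
    (tab-separated: chrom, pos, depth of BAM 1[, depth of BAM 2, ...])."""
    hist = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 + sample_col:
            continue  # skip blank or malformed lines
        hist[int(fields[2 + sample_col])] += 1
    return hist

def even_odd_ratio(hist, min_depth=1):
    """Positions at even depth divided by positions at odd depth (depth >= min_depth).
    Roughly 1 for a smooth, broad distribution; well above 1 if even depths dominate."""
    even = sum(n for d, n in hist.items() if d >= min_depth and d % 2 == 0)
    odd = sum(n for d, n in hist.items() if d >= min_depth and d % 2 == 1)
    return even / odd if odd else float("inf")

# Hypothetical four-position example in the assumed samtools depth format.
example = ["chr1\t1\t4\n", "chr1\t2\t4\n", "chr1\t3\t5\n", "chr1\t4\t6\n"]
hist = depth_histogram(example)
for depth in sorted(hist):
    print(depth, hist[depth], sep="\t")
print("even/odd ratio:", even_odd_ratio(hist))
```

For real data, the lines could come from `sys.stdin`, with `samtools depth -a sample.bam` piped into a script built from these functions.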

Hi, I recalculated the depth with samtools and got the same result. But today it occurred to me why the coverage could look like this: I have overlapping paired-end reads! That would make sense, no?

written 10 months ago by melaniep
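Here is why overlapping mates could produce two curves. Suppose a tool counts both reads of a pair where they overlap. Then a single fragment adds 2 to the depth at those bases. The depth at a base can be written as (reads counted once) + 2 × (fragments counted twice). Only the first term decides whether the depth is odd or even. So when inserts are short enough that most bases are covered by both mates of the same fragment, even depths dominate. The histogram then splits into an "even" curve and an "odd" curve.

The toy simulation below is a sketch under stated assumptions: uniform fragment placement, normally distributed insert sizes, 150 bp reads, and hypothetical parameter values. Whether GATK or samtools actually count overlapping mates twice depends on the tool and its settings, which this does not check.

```python
import random

def simulate_read_depth(genome_len, n_fragments, read_len, mean_insert, sd_insert, seed=1):
    """Toy model (hypothetical parameters): fragments are placed uniformly at
    random, read_len bases are sequenced from each end, and every read is
    counted, so bases where the two mates overlap get +2 from one fragment."""
    rng = random.Random(seed)
    depth = [0] * genome_len
    for _ in range(n_fragments):
        # Insert size drawn from a normal distribution, floored at read_len // 2.
        insert = max(read_len // 2, int(rng.gauss(mean_insert, sd_insert)))
        start = rng.randrange(0, genome_len - insert)
        end = start + insert  # exclusive
        # Read 1 covers the start of the fragment, read 2 the end; both are
        # clipped to the fragment (as after adapter trimming).
        for pos in range(start, min(start + read_len, end)):
            depth[pos] += 1
        for pos in range(max(end - read_len, start), end):
            depth[pos] += 1
    return depth

def even_fraction(depth):
    """Fraction of covered positions (depth >= 1) whose depth is even."""
    covered = [d for d in depth if d > 0]
    return sum(1 for d in covered if d % 2 == 0) / len(covered)

# Short inserts (mostly shorter than the 150 bp reads): mates overlap almost
# completely, so most positions gain reads two at a time.
short = simulate_read_depth(40_000, 4_000, 150, mean_insert=120, sd_insert=20)
# Long inserts (about 500 bp): mates essentially never overlap.
long_ = simulate_read_depth(40_000, 4_000, 150, mean_insert=500, sd_insert=50)

print(f"short inserts: {even_fraction(short):.2f} of covered positions have even depth")
print(f"long inserts:  {even_fraction(long_):.2f} of covered positions have even depth")
```

With long inserts, about half of the covered positions have even depth, as for any smooth distribution. With short, overlapping inserts, the even fraction rises well above one half. The size of the effect depends on how many reads at a base are counted only once.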

Should be easy enough to check that theory. I recommend using bbmerge.sh from the BBMap suite to merge the reads. Do the merging on the raw (untrimmed) data.

written 10 months ago by genomax

Yeah, that could be it. You can check this by looking at the insert size distribution of your PE reads: see whether the inserts are shorter than 2 × the average read length. The 9th column of your .sam/.bam (TLEN, the observed template length) holds the insert size.

written 10 months ago by Damian Kao
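Here is a sketch of the insert-size check described above, assuming SAM text as produced by `samtools view sample.bam`. It looks only at read 1 of properly paired primary alignments, so each pair is counted once. It takes |TLEN| from column 9 and calls a pair overlapping when that value is smaller than twice the length of read 1. This assumes both mates have the same length. The function name and the toy records are hypothetical.

```python
def overlap_stats(sam_lines):
    """Return (pairs checked, pairs that overlap, list of |TLEN|) from SAM text.
    Only read 1 of properly paired, primary alignments is used, so each pair
    is counted once; a pair is called overlapping when |TLEN| < 2 * len(SEQ)."""
    n_pairs, n_overlap, tlens = 0, 0, []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag, tlen, seq = int(f[1]), abs(int(f[8])), f[9]
        # Require 0x1 (paired), 0x2 (proper pair) and 0x40 (first in pair);
        # drop 0x100 (secondary) and 0x800 (supplementary) alignments.
        if not (flag & 0x1 and flag & 0x2 and flag & 0x40) or flag & 0x900:
            continue
        if tlen == 0:
            continue  # TLEN unavailable
        n_pairs += 1
        tlens.append(tlen)
        if tlen < 2 * len(seq):  # assumes both mates have the same length
            n_overlap += 1
    return n_pairs, n_overlap, tlens

# Hypothetical records: pair r1 has a 200 bp insert with 150 bp reads (overlap),
# pair r2 a 550 bp insert (no overlap); r1's mate 2 is skipped by the flag filter.
seq, qual = "A" * 150, "I" * 150
example = [
    "@HD\tVN:1.6\n",
    f"r1\t99\tchr1\t100\t60\t150M\t=\t150\t200\t{seq}\t{qual}\n",
    f"r1\t147\tchr1\t150\t60\t150M\t=\t100\t-200\t{seq}\t{qual}\n",
    f"r2\t83\tchr1\t1400\t60\t150M\t=\t1000\t-550\t{seq}\t{qual}\n",
]
n_pairs, n_overlap, tlens = overlap_stats(example)
print(f"{n_overlap} of {n_pairs} pairs overlap; |TLEN| values: {tlens}")
```

The returned list of |TLEN| values can be plotted as an insert size histogram. A large fraction of inserts below 2 × the read length would support the overlap explanation.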