Question: BreakDancer bam2cfg.pl: coefficient of variation is too large, not excluding outliers?
William wrote (5.9 years ago):

When running BreakDancer I get the following error:

breakDancer_1.4.1/breakdancer/perl/ -C input_dedup_realigned.bam
Coefficient of variation 4.61403040060899 in library lib_ACI is larger than the cutoff 1, poor quality data, excluding from further analysis.

When I increase the coefficient of variation cutoff to 5, I get the following insert size numbers / cutoffs.

breakDancer_1.4.1/breakdancer/perl/ -C -v 5 input_dedup_realigned.bam
lower:0.00      upper:275514842.99      mean:3215555.46 std:14836670.63 SWnormality:-74.09

These are of course not very useful.

On the exact same BAM, our in-house tool gives the following mean and cutoffs:

insertCutoff: 42      mean: 113     deletionCutoff: 253

Delly from EMBL also gives the following median and insert size cutoff:

Median: 154    insert size cutoff: 279

Why do I get these strange insert size cutoffs with BreakDancer's bam2cfg.pl? Doesn't it exclude outliers in the data?
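To illustrate how a handful of huge insert sizes can push the coefficient of variation past the cutoff, here is a quick synthetic check. All numbers are invented for illustration, not taken from the BAM above:

```shell
# Synthetic library: 10,000 "normal" inserts around 150 bp plus 20 huge
# outliers mimicking chimeric / mis-mapped pairs.
awk 'BEGIN {
    srand(42)
    for (i = 0; i < 10000; i++) {
        s = 0; for (j = 0; j < 12; j++) s += rand()   # crude normal(150, 40)
        print 150 + 40 * (s - 6)
    }
    for (i = 0; i < 20; i++) print int(1e6 + rand() * 4.9e7)
}' > inserts.txt

# CV = stddev / mean
cv() { awk '{ n++; s += $1; ss += $1 * $1 }
            END { m = s / n; printf "%.2f\n", sqrt(ss / n - m * m) / m }' "$1"; }

echo "CV with outliers:      $(cv inserts.txt)"
awk '$1 <= 10000' inserts.txt > filtered.txt    # same idea as a 10 kb cutoff
echo "CV after 10 kb cutoff: $(cv filtered.txt)"
```

A few pairs at tens of megabases dominate both the mean and the standard deviation, so the CV blows past 1 even though 99.8% of the library is well-behaved; a simple length cutoff restores it to roughly std/mean of the real insert distribution.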

Tags: breakdancer
john.joseph.farrell wrote (5.5 years ago):

I had some issues too when running with the default settings. Two changes helped.

First, I increased the minimum mapping quality to -q 40.

The other issue is that the software reads from the beginning of the BAM file. Since the file is sorted by position, the script ends up sampling reads mapped to the start of the chromosome (the telomere), which typically does not have great mapping quality. So modifying the samtools command within the module to start at a better location, away from the chromosome end, cleaned up the problem. Together, these two changes filter out the paired-end reads with very large insert sizes caused by faulty mapping.
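The idea can be sketched as a modified samtools call for bam2cfg.pl to stream from. The BAM name comes from the question above, but the chromosome and coordinates are placeholders I picked for illustration, not values from the original post:

```shell
# Placeholders: pick a well-behaved region away from the chromosome ends.
BAM="input_dedup_realigned.bam"
REGION="chr1:2000000-12000000"

# bam2cfg.pl normally streams from the start of the position-sorted BAM;
# restricting to a mid-chromosome region and raising the mapping-quality
# floor (-q 40) keeps poorly mapped telomeric pairs out of the estimate.
CMD="samtools view -q 40 $BAM $REGION"
echo "$CMD"
```

Note that region queries require a coordinate-sorted, indexed BAM (samtools index).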

William wrote (5.9 years ago):

Just below this line in bam2cfg.pl:

next unless(($t->{flag}==18 || $t->{flag}==20) && $t->{dist}>=0);

I added a line to exclude all insert sizes longer than 10,000 bp from the insert size cutoff calculation. These outliers are based either on really long fragments or on mapping artifacts, because we have short (50 x 35 bp) SOLiD PE reads. I also increased the number of pairs to read from 10,000 to 1,000,000.

next if ($t->{dist} > 10000);   # skip outlier insert sizes when computing cutoffs

I now get the following numbers from the script, which look more acceptable:

num:489203 lower:0.00       upper:340.33    mean:119.93    std:47.65       SWnormality:minus infinity

ernfrid replied (5.9 years ago):

This seems fine to me, but if I were you I would be concerned about the (apparently) large number of reads with large insert sizes in your BAM.


Powered by Biostar version 2.3.0