Question: BreakDancer bam2cfg.pl, coefficient of variation is too large, not excluding outliers?
William wrote:

When running breakdancer bam2cfg.pl I get the following error.

breakDancer_1.4.1/breakdancer/perl/bam2cfg.pl -C input_dedup_realigned.bam
Coefficient of variation 4.61403040060899 in library lib_ACI is larger than the cutoff 1, poor quality data, excluding from further analysis.

When I increase the coefficient of variation cutoff to 5, I get the following insert size numbers / cutoffs.

breakDancer_1.4.1/breakdancer/perl/bam2cfg.pl -C -v 5 input_dedup_realigned.bam
lower:0.00      upper:275514842.99      mean:3215555.46 std:14836670.63 SWnormality:-74.09

These are of course not very useful.

On the exact same BAM, our in-house tool gives the following mean and cutoffs:

insertCutoff: 42      mean: 113     deletionCutoff: 253

Delly from EMBL also gives the following median and insert size cutoff:

Median: 154    insert size cutoff: 279

Why do I get these strange insert size cutoffs with BreakDancer's bam2cfg.pl? Doesn't it exclude outliers in the data?
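My understanding is that the cutoff bam2cfg.pl complains about is simply the coefficient of variation of the sampled insert sizes (CV = std / mean; indeed 14836670.63 / 3215555.46 ≈ 4.61, matching the error message), so even a handful of pairs with multi-megabase mapping distances would be enough to blow it up. A quick stand-alone illustration with made-up numbers (not taken from my BAM):

use strict;
use warnings;
use List::Util qw(sum);

# Made-up insert sizes: ~10,000 pairs around 110-160 bp, plus a few
# multi-megabase "inserts" of the kind produced by mis-mapped mates.
my @inserts = ((map { 110 + int(rand(50)) } 1 .. 9_995), (50_000_000) x 5);

my $mean = sum(@inserts) / @inserts;
my $std  = sqrt(sum(map { ($_ - $mean) ** 2 } @inserts) / @inserts);

printf "mean=%.1f std=%.1f CV=%.2f\n", $mean, $std, $std / $mean;
# Just 5 outliers out of 10,000 pairs push the CV far above the
# default cutoff of 1, which looks like what bam2cfg.pl reports here.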

john.joseph.farrell wrote:

I also had some issues running bam2cfg.pl with the default settings. Two changes helped.

First, I increased the minimum mapping quality with -q 40.

The other issue is that the script reads from the beginning of the BAM file. Since the file is sorted by position, it ends up sampling reads mapped to the start of the chromosome (the telomere), which typically does not have great mapping quality. Modifying the samtools command within the module to start at a better location, away from the chromosome end, cleaned up the problem (a sketch of the kind of change is below). Together these two changes filter out the paired-end reads with very large insert sizes caused by faulty mapping.
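If it helps, this is the sort of edit I mean. My copy of bam2cfg.pl reads the BAM through a samtools view pipe, but the exact open() line and variable names may differ in your version; the region string below is only a hypothetical example, so pick coordinates well inside a chromosome of your own reference:

my $bam_file = 'input_dedup_realigned.bam';   # your BAM, as in the question

# Original style of call (reads from the start of the sorted BAM):
#   open(my $bam_fh, '-|', "samtools view $bam_file") or die $!;
#
# Modified to start well away from the chromosome end; the region
# "1:2000000-50000000" is just an illustrative choice:
open(my $bam_fh, '-|', "samtools view $bam_file 1:2000000-50000000")
    or die "samtools view failed: $!";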

William wrote:

Just below this line in bam2cfg.pl:

next unless(($t->{flag}==18 || $t->{flag}==20) && $t->{dist}>=0);

I added a line to exclude all insert sizes longer than 10,000 bp from the insert size cutoff calculation. These outliers come either from genuinely long fragments or from mapping artifacts, because we have short (50 x 35 bp) SOLiD PE reads. I also increased the number of pairs to read from 10,000 to 1,000,000.

if ($t->{dist} > 10000) { next; }
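For context, the patched section in my copy then reads roughly like this (comments added here for clarity; the surrounding code may differ between bam2cfg.pl versions, and the 10,000 → 1,000,000 change to the number of sampled pairs is a separate hard-coded value not shown):

# existing filter in bam2cfg.pl: keep only pairs the script considers
# usable for insert size estimation (its internal flag codes 18/20,
# non-negative mapping distance)
next unless(($t->{flag}==18 || $t->{flag}==20) && $t->{dist}>=0);

# added line: drop pairs whose apparent insert size exceeds 10 kb, so
# long fragments / mapping artifacts don't inflate mean, std and CV
if ($t->{dist} > 10000) { next; }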

I now get the following numbers from the bam2cfg.pl script which look more acceptable:

num:489203 lower:0.00       upper:340.33    mean:119.93    std:47.65       SWnormality:minus infinity

ernfrid replied:

This seems fine to me, but if I were you I would be concerned about the (apparently) large number of reads with large insert sizes in your BAM.