Breakdancer bam2cfg.pl: coefficient of variation is too large, not excluding outliers?
10.7 years ago
William ★ 5.3k

When running breakdancer's bam2cfg.pl I get the following error:

breakDancer_1.4.1/breakdancer/perl/bam2cfg.pl -C input_dedup_realigned.bam
Coefficient of variation 4.61403040060899 in library lib_ACI is larger than the cutoff 1, poor quality data, excluding from further analysis.

When I increase the coefficient of variation cutoff to 5, I get the following insert size numbers / cutoffs:

breakDancer_1.4.1/breakdancer/perl/bam2cfg.pl -C -v 5 input_dedup_realigned.bam
lower:0.00      upper:275514842.99      mean:3215555.46 std:14836670.63 SWnormality:-74.09

These are of course not very useful.

On the exact same BAM, our in-house tool gives the following mean and cutoffs:

insertCutoff: 42      mean: 113     deletionCutoff: 253

Delly from EMBL gives the following median and insert size cutoff:

Median: 154    insert size cutoff: 279

Why do I get these strange insert size cutoffs from breakdancer's bam2cfg.pl? Doesn't it exclude outliers in the data?
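
For comparison, here is a rough way to get a robust estimate straight from the BAM, independent of any of the tools above. This is only a sketch: it assumes samtools is on the PATH, counts each read pair once via its SAM flags, and uses the median and MAD instead of the mean and standard deviation, so a handful of huge, mis-mapped inserts cannot inflate the result.

    #!/usr/bin/env perl
    # Rough, independent look at the insert size distribution (a sketch only;
    # assumes samtools is on the PATH and paired-end data).
    use strict;
    use warnings;

    my $bam = shift @ARGV or die "Usage: $0 <bam>\n";

    # -f 0x41: paired and first in pair (count each pair once);
    # -F 0x10C: skip unmapped reads, reads with an unmapped mate, and secondary alignments.
    open my $sam, '-|', "samtools view -f 0x41 -F 0x10C $bam"
        or die "could not run samtools: $!";

    my @isize;
    while (<$sam>) {
        my @f = split /\t/;
        next unless $f[6] eq '=';            # both mates on the same chromosome
        my $tlen = abs($f[8]);               # column 9 = TLEN
        push @isize, $tlen if $tlen > 0;
        last if @isize >= 1_000_000;         # a million observations is plenty
    }
    close $sam;
    die "no usable pairs found\n" unless @isize;

    # Median and MAD are not inflated by a few huge, mis-mapped inserts;
    # 1.4826 * MAD approximates the standard deviation for roughly normal data.
    @isize = sort { $a <=> $b } @isize;
    my $median = $isize[ int(@isize / 2) ];
    my @absdev = sort { $a <=> $b } map { abs($_ - $median) } @isize;
    my $robust_sd = 1.4826 * $absdev[ int(@absdev / 2) ];

    printf "n:%d median:%d robust_sd:%.2f upper cutoff (median + 3 sd): %.2f\n",
        scalar(@isize), $median, $robust_sd, $median + 3 * $robust_sd;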

breakdancer
10.2 years ago

I also had issues running bam2cfg.pl with the default settings. Two changes helped.

First, I increased the minimum mapping quality to -q 40.

The other issue is that the script reads from the beginning of the BAM file. Since the file is sorted by coordinate, it ends up sampling the reads mapped to the start of the first chromosome (the telomere), which typically do not have great mapping quality. Modifying the samtools command inside the script so that it starts at a better location, away from the chromosome end, cleaned up the problem; a sketch of the idea follows. Together, these two changes filter out the paired-end reads with very large insert sizes caused by faulty mapping.
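
A minimal sketch of what I mean, written as a standalone script rather than the literal bam2cfg.pl code: the region below is an arbitrary placeholder, the real change is to append a region to the samtools view call that bam2cfg.pl itself opens as a pipe, and a region query requires the BAM to be indexed with samtools index.

    #!/usr/bin/env perl
    # Sketch of the idea only (not the literal bam2cfg.pl code): sample insert sizes
    # from a region well away from the chromosome end instead of streaming the
    # coordinate-sorted BAM from its start.
    use strict;
    use warnings;

    my $bam    = shift @ARGV or die "Usage: $0 <indexed bam>\n";
    my $region = "1:2000000-12000000";   # hypothetical region; pick one with good mapping

    # A region query needs a BAM index (samtools index); -q 40 mirrors the higher
    # minimum mapping quality mentioned above.
    open my $sam, '-|', "samtools view -q 40 $bam $region"
        or die "cannot open pipe to samtools: $!";

    my @isize;
    while (<$sam>) {
        my $tlen = abs((split /\t/)[8]);   # column 9 = TLEN
        push @isize, $tlen if $tlen > 0;
    }
    close $sam;
    die "no paired reads in $region\n" unless @isize;

    my $mean = 0;
    $mean += $_ for @isize;
    $mean /= @isize;
    my $var = 0;
    $var += ($_ - $mean) ** 2 for @isize;
    my $std = sqrt($var / @isize);
    printf "region %s: n=%d mean=%.2f std=%.2f\n", $region, scalar(@isize), $mean, $std;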

10.6 years ago
William ★ 5.3k

Just below this line in bam2cfg.pl:

next unless(($t->{flag}==18 || $t->{flag}==20) && $t->{dist}>=0);

I added a line to exclude all insert sizes longer than 10,000 bp from the insert size cutoff calculation. These outliers come either from really long fragments or from mapping artifacts, since we have short (50 x 35 bp) SOLiD PE reads. I also increased the number of pairs to read from 10,000 to 1,000,000.

if ($t->{dist} > 10000) { next; }   # skip apparent insert sizes above 10,000 bp when estimating the cutoffs

I now get the following numbers from bam2cfg.pl, which look more acceptable:

num:489203 lower:0.00       upper:340.33    mean:119.93    std:47.65       SWnormality:minus infinity

This seems fine to me, but if I were you I would be concerned about the (apparently) large number of reads with large insert sizes in your BAM.
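
If you want to put a number on that, a quick count of how many same-chromosome pairs have an apparent insert size above 10 kb is easy to script. Again only a sketch, assuming samtools is on the PATH:

    #!/usr/bin/env perl
    # How common are the very large apparent insert sizes? (a rough sketch)
    use strict;
    use warnings;

    my $bam = shift @ARGV or die "Usage: $0 <bam>\n";

    # -f 0x41: paired and first in pair (count each pair once);
    # -F 0x10C: skip unmapped reads, unmapped mates and secondary alignments.
    open my $sam, '-|', "samtools view -f 0x41 -F 0x10C $bam"
        or die "could not run samtools: $!";

    my ($pairs, $big) = (0, 0);
    while (<$sam>) {
        my @f = split /\t/;
        next unless $f[6] eq '=' && $f[8] != 0;   # same chromosome, TLEN set
        $pairs++;
        $big++ if abs($f[8]) > 10_000;
    }
    close $sam;

    printf "%d of %d same-chromosome pairs (%.3f%%) have |TLEN| > 10 kb\n",
        $big, $pairs, $pairs ? 100 * $big / $pairs : 0;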

