Understanding flags from output of bam2cfg.pl - BreakDancer
8.8 years ago

I followed the instructions on the BreakDancer site. After creating the .cfg file, I examined it and saw that the percentage of flag 32 is much higher than recommended. The instructions say:

Manually view the insert size and flag distribution results in BRC6.cfg to see if there are any data quality issues. Usually std/mean should be < 0.2, or 0.3 at most. The flag 32 (x%) represents the percent of chimeric inserts; this number (x%) should usually be smaller than 3%.

readgroup:MinXX    platform:Illumina    map:/store/files/017/dataset_17953.dat    readlen:50.40    lib:MinXX    num:8466    lower:639.24    upper:4213.61    mean:1906.08    std:446.16    SWnormality:-72.15    flag:0(6.94%)1(0.43%)18(0.07%)2(14.75%)20(57.10%)32(18.86%)4(0.77%)64(0.65%)8(0.44%)30001    exe:samtools view
readgroup:MinXY    platform:Illumina    map:/store/files/017/dataset_17954.dat    readlen:50.35    lib:MinXY    num:8508    lower:795.35    upper:4325.25    mean:1956.59    std:440.27    SWnormality:-77.06    flag:0(4.22%)1(0.50%)18(0.06%)2(15.29%)20(59.11%)32(19.03%)4(0.77%)64(0.67%)8(0.35%)30001    exe:samtools view
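The two guidelines quoted above can be checked mechanically against a .cfg line. The following is a minimal sketch, not part of BreakDancer: it assumes whitespace-separated `key:value` fields as in the lines shown, a flag field made of repeated `code(percent%)` entries, and the thresholds from the quoted instructions (std/mean below 0.3, flag 32 below 3%). The helper names `parse_cfg_line`, `parse_flags` and `check_quality` are hypothetical.

```python
import re

# Matches one "code(percent%)" entry of the flag field, e.g. "32(18.86%)".
FLAG_RE = re.compile(r"(\d+)\(([\d.]+)%\)")


def parse_cfg_line(line):
    """Split one bam2cfg.pl line into a {key: value} dict (hypothetical helper).

    Assumes whitespace-separated "key:value" tokens; tokens without a colon
    (e.g. "view" in "exe:samtools view") are skipped.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def parse_flags(flag_field):
    """Turn "0(6.94%)1(0.43%)..." into {0: 6.94, 1: 0.43, ...}."""
    return {int(code): float(pct) for code, pct in FLAG_RE.findall(flag_field)}


def check_quality(line, max_cv=0.3, max_flag32=3.0):
    """Compare std/mean and the flag-32 percentage with the quoted limits."""
    fields = parse_cfg_line(line)
    cv = float(fields["std"]) / float(fields["mean"])
    flag32 = parse_flags(fields["flag"]).get(32, 0.0)
    return {
        "readgroup": fields["readgroup"],
        "std/mean": round(cv, 3),
        "flag32%": flag32,
        "cv_ok": cv < max_cv,
        "flag32_ok": flag32 < max_flag32,
    }


# The MinXX line from the question (tabs assumed between fields).
EXAMPLE_LINE = (
    "readgroup:MinXX\tplatform:Illumina\tmap:/store/files/017/dataset_17953.dat\t"
    "readlen:50.40\tlib:MinXX\tnum:8466\tlower:639.24\tupper:4213.61\t"
    "mean:1906.08\tstd:446.16\tSWnormality:-72.15\t"
    "flag:0(6.94%)1(0.43%)18(0.07%)2(14.75%)20(57.10%)32(18.86%)4(0.77%)"
    "64(0.65%)8(0.44%)30001\texe:samtools view"
)

if __name__ == "__main__":
    print(check_quality(EXAMPLE_LINE))
    # {'readgroup': 'MinXX', 'std/mean': 0.234, 'flag32%': 18.86, 'cv_ok': True, 'flag32_ok': False}
```

For the MinXX line this gives std/mean of about 0.234, within the quoted limit, while flag 32 at 18.86% is far above 3%; the MinXY line behaves the same way (about 0.225 and 19.03%).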

The questions regarding this are:

  1. Is there any documentation where I can find the meaning of the .cfg flags?
  2. What is a "chimeric insert"? (Is it the same as a chimeric read?)
  3. What would be the best way to get rid of chimeric inserts?
  4. Is it possible that divergence from the reference (the estimated nucleotide divergence in this case is around 4-5%) is responsible for such a high percentage of chimeric inserts?

Thanks!

8.8 years ago
ernfrid

The supported version of BreakDancer is now on GitHub: https://github.com/genome/breakdancer

That said, the flag values are not documented anywhere that I'm aware of. The definitions below assume you are working with standard Illumina libraries and reflect BreakDancer's default interpretation. I've included tiny ASCII diagrams of the orientations to help clarify.

Here are some definitions:

0 - Marked as duplicate or a single-ended read

1 - The aligner doesn't report the read as properly paired and both the read and its mate are mapped to the plus strand (e.g. --> -->)

2 - The aligner doesn't report the read as properly paired, but the orientation is as expected (e.g. --> <--)

4 - The aligner doesn't report the read as properly paired and the read with the larger coordinate is not on the minus strand (e.g. <-- -->)

8 - The aligner doesn't report the read as properly paired and both the read and its mate are mapped to the minus strand (e.g. <-- <--)

18 - The aligner reports the read as properly paired and the leftmost read is on the plus strand (e.g. --> <--)

20 - The aligner reports the read as properly paired and the leftmost read is on the minus strand (e.g. <-- -->)

64 - The read's mate is unmapped

192 - The read is unmapped
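The definitions above can also be read as a small decision procedure. The sketch below mirrors only the rules as written in this answer and is not BreakDancer's implementation: the `ReadPair` record and the function names are hypothetical, the order in which the checks are applied is an assumption (the answer does not specify precedence), and codes not defined above, such as 32, are never produced. The SAM FLAG bit values come from the SAM specification.

```python
from dataclasses import dataclass

# SAM FLAG bits (from the SAM specification) used below.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
REVERSE, MATE_REVERSE, DUPLICATE = 0x10, 0x20, 0x400


@dataclass
class ReadPair:
    """Hypothetical summary of one read and its mate."""
    read_mapped: bool = True
    mate_mapped: bool = True
    paired: bool = True          # False for single-ended reads
    duplicate: bool = False
    proper_pair: bool = False    # the aligner's "properly paired" bit
    left_plus: bool = True       # leftmost read on the plus strand?
    right_plus: bool = False     # larger-coordinate read on the plus strand?


def from_sam(flag, pos, mate_pos):
    """Build a ReadPair from a SAM FLAG and the read/mate positions."""
    read_plus = not (flag & REVERSE)
    mate_plus = not (flag & MATE_REVERSE)
    read_is_left = pos <= mate_pos
    return ReadPair(
        read_mapped=not (flag & UNMAPPED),
        mate_mapped=not (flag & MATE_UNMAPPED),
        paired=bool(flag & PAIRED),
        duplicate=bool(flag & DUPLICATE),
        proper_pair=bool(flag & PROPER_PAIR),
        left_plus=read_plus if read_is_left else mate_plus,
        right_plus=mate_plus if read_is_left else read_plus,
    )


def breakdancer_style_flag(p):
    """Apply the definitions from the answer; the check order is an assumption."""
    if not p.read_mapped:
        return 192                          # read is unmapped
    if p.duplicate or not p.paired:
        return 0                            # duplicate or single-ended
    if not p.mate_mapped:
        return 64                           # mate is unmapped
    if p.proper_pair:
        return 18 if p.left_plus else 20    # --> <--  or  <-- -->
    if p.left_plus and p.right_plus:
        return 1                            # --> -->
    if not p.left_plus and not p.right_plus:
        return 8                            # <-- <--
    return 2 if p.left_plus else 4          # --> <--  or  <-- -->


if __name__ == "__main__":
    print(breakdancer_style_flag(from_sam(99, 100, 300)))   # 18
    print(breakdancer_style_flag(from_sam(81, 100, 300)))   # 4
```

Under these rules a typical properly paired Illumina read with SAM flag 99 (or its mate, flag 147) maps to 18, while flag 81 on a read that sits upstream of its mate maps to 4 (<-- -->).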
