I followed the instructions from the BreakDancer site. After creating the .cfg file, I examined it and saw that the percentage of flag 32 is much higher than the site suggests it should be:
Manually view the insert size and flag distribution results in BRC6.cfg to see if there are any data quality issue. Usually std/mean should be < 0.2 or 0.3 at most. The flag 32(x%), represents percent of chimeric insert, this number (x%) should usually be smaller than 3%.
readgroup:MinXX platform:Illumina map:/store/files/017/dataset_17953.dat readlen:50.40 lib:MinXX num:8466 lower:639.24 upper:4213.61 mean:1906.08 std:446.16 SWnormality:-72.15 flag:0(6.94%)1(0.43%)18(0.07%)2(14.75%)20(57.10%)32(18.86%)4(0.77%)64(0.65%)8(0.44%)30001 exe:samtools view
readgroup:MinXY platform:Illumina map:/store/files/017/dataset_17954.dat readlen:50.35 lib:MinXY num:8508 lower:795.35 upper:4325.25 mean:1956.59 std:440.27 SWnormality:-77.06 flag:0(4.22%)1(0.50%)18(0.06%)2(15.29%)20(59.11%)32(19.03%)4(0.77%)64(0.67%)8(0.35%)30001 exe:samtools view
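To make the two checks from the instructions explicit, here is a minimal Python sketch (my own, not part of BreakDancer) that parses the lines above and reports std/mean and the flag 32 percentage. The function names and the thresholds of 0.3 and 3% as hard cutoffs are my assumptions based on the quoted guideline, and the parsing assumes the whitespace-separated `key:value` layout shown above.

```python
import re

# The two .cfg lines from above, copied verbatim.
CFG_LINES = [
    "readgroup:MinXX platform:Illumina map:/store/files/017/dataset_17953.dat readlen:50.40 lib:MinXX num:8466 lower:639.24 upper:4213.61 mean:1906.08 std:446.16 SWnormality:-72.15 flag:0(6.94%)1(0.43%)18(0.07%)2(14.75%)20(57.10%)32(18.86%)4(0.77%)64(0.65%)8(0.44%)30001 exe:samtools view",
    "readgroup:MinXY platform:Illumina map:/store/files/017/dataset_17954.dat readlen:50.35 lib:MinXY num:8508 lower:795.35 upper:4325.25 mean:1956.59 std:440.27 SWnormality:-77.06 flag:0(4.22%)1(0.50%)18(0.06%)2(15.29%)20(59.11%)32(19.03%)4(0.77%)64(0.67%)8(0.35%)30001 exe:samtools view",
]


def parse_cfg_line(line):
    """Split one .cfg line into its key:value fields and its flag percentages."""
    # Each field looks like key:value with no spaces inside the value.
    fields = dict(re.findall(r"(\w+):(\S+)", line))
    # The flag field is a run of entries like 32(18.86%); any trailing number
    # without a percentage is ignored here.
    flags = {
        int(flag): float(pct)
        for flag, pct in re.findall(r"(\d+)\(([\d.]+)%\)", fields.get("flag", ""))
    }
    return fields, flags


def check_library(line, max_cv=0.3, max_flag32=3.0):
    """Apply the two guideline checks; the cutoffs are assumed, not official."""
    fields, flags = parse_cfg_line(line)
    cv = float(fields["std"]) / float(fields["mean"])
    flag32 = flags.get(32, 0.0)
    return {
        "readgroup": fields["readgroup"],
        "cv": cv,
        "flag32": flag32,
        "cv_ok": cv < max_cv,
        "flag32_ok": flag32 < max_flag32,
    }


for cfg_line in CFG_LINES:
    result = check_library(cfg_line)
    print(
        f"{result['readgroup']}: std/mean={result['cv']:.3f} "
        f"(ok={result['cv_ok']}), flag32={result['flag32']:.2f}% "
        f"(ok={result['flag32_ok']})"
    )
```

For both libraries std/mean comes out at roughly 0.23, which is inside the "0.3 at most" limit, while flag 32 is close to 19%, far above the suggested 3%.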
My questions are:
- Is there any documentation that explains the meaning of the flags in the .cfg file?
- What is "chimeric insert"? (Is it the same as chimeric reads?)
- What would be the best way to get rid of chimeric inserts?
- Could divergence from the reference (estimated nucleotide divergence in this case is around 4-5%) be responsible for such a high percentage of chimeric inserts?
Thanks!