How to interpret the BreakDancer output file?
9.0 years ago

Hey there,

We want to identify CNV gains and losses from whole-genome sequencing (NGS) data. I have a list of questions, and I'm sorry about that, but it would be really awesome if someone could help. I'm not really familiar with BreakDancer, there's no help online, and the authors don't answer emails.

BreakDancer gave me the following output:

#Chr1  Pos1      Orient.1  Chr2  Pos2      Orient.2  Type  Size     Score  num_Reads  num_Reads_lib  1.bam  2.bam
chr1   4397163   2+0-      chr1  4397202   0+2-      DEL   87       45     2          /../2.bam|2    NA     2.61
chr1   7175079   10+0-     chr1  7335701   0+10-     DEL   160674   99     10         /../2.bam|10   NA     1.8
chr1   13204094  2+4-      chr1  13204123  2+4-      ITX   -68      56     2          /../2.bam|2    NA     NA
chr1   14047460  0+5-      chr1  14048120  10+8-     ITX   101      99     7          /../2.bam|7    NA     0.92
chr1   14047910  10+0-     chr1  14047926  3+6-      DEL   86       99     6          /../2.bam|6    NA     0.42
chr1   14048025  3+0-      chr1  14048090  0+3-      DEL   102      57     3          /../2.bam|3    NA     0.48
chr1   16253553  2+0-      chr1  16253604  0+2-      DEL   110      41     2          /../2.bam|2    NA     0.11
chr1   13468303  0+4-      chr1  20152466  0+6-      INV   6683996  99     4          /../2.bam|4    NA     2.17
chr1   14122049  0+15-     chr1  19398521  7+16-     INV   5276291  99     15         /../2.bam|15   NA     2.22
chr1   21193977  0+13-     chr1  21271500  0+13-     INV   77359    99     13         /../2.bam|13   NA     2.06
chr1   21199194  2+0-      chr1  21270748  2+0-      INV   71441    58     2          /../2.bam|2    NA     1.98
chr1   21199478  3+0-      chr1  21270418  3+0-      INV   70793    99     3          /../2.bam|3    NA     1.99
chr1   21200229  5+0-      chr1  21269693  5+0-      INV   69302    99     5          /../2.bam|5    NA     2.01

1. What's the difference between Pos1 and Pos2?
2. What is Orientation?
3. What does type ITX mean?
4. How do I identify a CNV gain?
5. The size doesn't equal Pos2 - Pos1, so what size is it?
6. What is Score?
7. Is num_Reads the number of reads supporting this CNV?
8. What is num_Reads_lib?
9. What does NA mean in the 1.bam and 2.bam columns?
10. What do the numeric values in the 1.bam and 2.bam columns mean?

Thanks a lot for any contribution :)

breakdancer cnv • 11k views
9.0 years ago
Niek De Klein ★ 2.5k

I'm not sure if you haven't found the explanation of the output file, or if you don't understand what it says. If you haven't found the readme file, here's the link: https://github.com/kenchen/breakdancer#readme. I copied the relevant info for you:

The output format
----------------------
BreakDancer's output file consists of the following columns:

1. Chromosome 1
2. Position 1
3. Orientation 1
4. Chromosome 2
5. Position 2
6. Orientation 2
7. Type of a SV
8. Size of a SV
9. Confidence Score
10. Total number of supporting read pairs
11. Total number of supporting read pairs from each map file
12. Estimated allele frequency
13. Software version
14. The run parameters

Columns 1-3 and 4-6 are used to specify the coordinates of the two SV breakpoints. The orientation is a string that records the number of reads mapped to the plus (+) or the minus (-) strand in the anchoring regions.
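
To make the orientation field concrete, here is a minimal Python sketch that splits a string such as `10+0-` into its plus-strand and minus-strand read counts. The function name `parse_orientation` and the regular expression are my own, based only on the `<n>+<m>-` pattern visible in the sample output in the question; this is not part of BreakDancer itself.

```python
import re

# Assumption: orientation strings follow the "<plus>+<minus>-" pattern
# seen in the sample output, e.g. "10+0-" or "7+16-".
ORIENT_RE = re.compile(r"^(\d+)\+(\d+)-$")


def parse_orientation(text):
    """Return (plus_reads, minus_reads) from a string such as '10+0-'."""
    match = ORIENT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected orientation string: {text!r}")
    return int(match.group(1)), int(match.group(2))


plus_reads, minus_reads = parse_orientation("10+0-")
print(plus_reads, minus_reads)
```

For the second row of the sample, `10+0-` at Pos1 and `0+10-` at Pos2 would then read as ten plus-strand reads anchoring the first breakpoint and ten minus-strand reads anchoring the second.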

Column 7 is the type of SV detected: DEL (deletions), INS (insertion), INV (inversion), ITX (intra-chromosomal translocation), CTX (inter-chromosomal translocation), and Unknown.
Column 8 is the size of the SV in bp.  It is meaningless for inter-chromosomal translocations.
Column 9 is the confidence score associated with the prediction.
Column 11 can be used to dissect the origin of the supporting read pairs, which is useful in pooled analysis.  For example, one may want to give SVs that are supported by more than one library higher confidence than those detected in only one library.  It can also be used to distinguish somatic events from germline events, i.e., those detected in only the tumor libraries versus those detected in both the tumor and the normal libraries.
Column 12 is currently a placeholder for displaying estimated allele frequency. The allele frequencies estimated in this version are not accurate and should not be trusted.
Columns 13 and 14 contain information useful for reproducing the results.
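
If it helps, the column list above can be turned into a small parser. The following is only a sketch in Python: the names `SVCall`, `parse_per_library` and `parse_line` are hypothetical, and I am assuming that column 11 holds `name|count` entries, joined by `:` when there is more than one library (the sample in the question only shows a single entry). Anything after column 11 is kept as raw strings, since its layout differs between the README and the posted output.

```python
from collections import namedtuple

# Field names are my own labels for columns 1-11 of the README list.
SVCall = namedtuple(
    "SVCall",
    "chr1 pos1 orient1 chr2 pos2 orient2 sv_type size score num_reads "
    "per_library extra",
)


def parse_per_library(field):
    """Split column 11 into {library_name: supporting_read_pairs}.

    Assumption: entries look like "name|count" and several entries are
    joined with ":"; library names containing ":" would break this.
    """
    counts = {}
    for entry in field.split(":"):
        name, _, count = entry.rpartition("|")
        counts[name] = int(count)
    return counts


def parse_line(line):
    """Parse one data line; return None for blank or '#' header lines."""
    if not line.strip() or line.startswith("#"):
        return None
    # split() handles both tabs and the space-padded sample in the question.
    f = line.split()
    return SVCall(
        chr1=f[0], pos1=int(f[1]), orient1=f[2],
        chr2=f[3], pos2=int(f[4]), orient2=f[5],
        sv_type=f[6], size=int(f[7]), score=float(f[8]),
        num_reads=int(f[9]), per_library=parse_per_library(f[10]),
        extra=f[11:],  # left uninterpreted on purpose
    )


sample = "chr1 7175079 10+0- chr1 7335701 0+10- DEL 160674 99 10 /../2.bam|10 NA 1.8"
call = parse_line(sample)
print(call.sv_type, call.size, call.score, call.per_library)
```

With records in this form, filtering on `call.score` or on the number of libraries in `call.per_library` becomes a one-line list comprehension.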


Indeed, I wasn't aware of that README file... silly me >< Thanks for saving my foolish life, and good luck!