What does it mean to say "b-allele frequency"?
5.9 years ago
novice ★ 1.1k

I'm trying to use Canvas to find copy number variants in human data. I would appreciate it if someone could clarify what this input is supposed to be:

   --b-allele-vcf=VALUE   vcf containing SNV b-allele sites (only sites
                          with PASS in the filter column will be used)
                          (required)


I have called and filtered SNPs for my samples. Is this asking me to provide the set of SNPs (or SNP sites) that are flagged as having the alternate allele in the VCF file? If so, couldn't I just grep AB=1 and be good?

snp canvas • 6.6k views

See this manual for details:

http://biorxiv.org/content/biorxiv/suppl/2016/01/13/036194.DC2/036194-2.pdf

"Canvas supports a number of different workflows depending on the input sequencing data. The available modes are: 

Germline-WGS: CNV calling of a diploid germline sample from whole genome sequencing data

Somatic-Enrichment: CNV calling of a somatic sample from targeted sequencing data

Somatic-WGS: CNV calling of a somatic sample from whole genome sequencing data

Tumor-normal-enrichment: CNV calling of a tumor/normal pair from targeted sequencing data"


Thank you, Natasha! Why didn't they just say "heterozygous sites" from the beginning?? I guess what I have to do then, for the purpose of this input file, is to grep for 0/1 or 1/0 SNPs.

5.6 years ago
jpflorido • 0

Hi! Did you find out the meaning of --b-allele-vcf? Is it related to the sample or to the normal/control? Thanks!


In this context, the b allele is the non-reference allele observed in a germline heterozygous SNP, i.e. in the normal/control sample. Since the tumor cells' DNA originally derived from normal cells' DNA, most of these SNPs will also be present in the tumor sample. But due to allele-specific copy number alterations, loss of heterozygosity or allelic imbalance, the allelic frequency of these SNPs may be different in the tumor, and that's evidence that one (or both) of the germline copies was gained or lost during tumor evolution.

So, filter for heterozygous genotypes in the normal sample, but keep the tumor sample in the VCF.
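As a rough illustration of that filter, here is a minimal Python sketch that keeps header lines plus PASS records where one named sample is heterozygous, and leaves every sample column in place so the tumor genotype is retained. The function names and the idea of passing the normal sample's column name are assumptions made for this example, not part of Canvas. It assumes a plain-text, tab-delimited VCF.

```python
def is_het(gt: str) -> bool:
    """True when a diploid genotype has two different called alleles (0/1, 1|0, 1/2)."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_het_in_normal(vcf_lines, normal_sample):
    """Yield header lines plus PASS records where `normal_sample` is heterozygous.

    All sample columns are kept, so the tumor genotype stays in the output.
    `normal_sample` is the column name of the normal in the #CHROM line.
    """
    normal_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            normal_col = fields.index(normal_sample)
            yield line
            continue
        if normal_col is None:
            raise ValueError("no #CHROM header line before the first record")
        if fields[6] != "PASS":  # column 7 is FILTER
            continue
        gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
        if is_het(fields[normal_col].split(":")[gt_index]):
            yield line
```

To use it on a file, pass an open file handle as `vcf_lines` and write the yielded lines to a new VCF. Note that `is_het` also accepts genotypes such as 1/2; restrict it to 0/1 and 1/0 if you only want sites with one reference and one alternate allele.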


Hi Eric, I am interested in calculating the BAF (B-Allele Frequency) of tumor samples that have no matched normal sequenced. As you said, the "b allele is the non-reference allele observed in a germline heterozygous SNP". I want to know how I can identify those germline heterozygous sites in my VCF and then calculate their BAF. The VCF was generated using UnifiedGenotyper from GATK. I have attached a few records from the VCF file. I am quite new to this and would appreciate your response.

    CHROM  POS    ID           REF  ALT  SCP2                          SCP3                          SCP43
    chr1   14522  .            G    A    0/1:107,12:119:15:15,0,439    0/1:101,12:114:99:111,0,712   0/1:76,9:86:28:28,0,365
    chr1   14542  .            A    G    0/1:115,11:126:16:16,0,535    0/1:110,13:123:94:94,0,722    0/1:71,11:82:37:37,0,302
    chr1   14574  .            A    G    0/0:122,8:130:46:0,46,888     0/1:93,10:103:57:57,0,731     0/1:72,12:84:30:30,0,521
    chr1   14653  rs375086259  C    T    0/1:131,30:162:99:372,0,1365  0/1:100,25:125:99:378,0,1436  0/1:81,23:104:99:227,0,1238
    chr1   14976  rs71252251   G    A    0/1:204,44:250:99:218,0,3516  0/1:223,27:250:5:5,0,3978     0/1:170,45:217:99:459,0,2776
    chr1   15688  .            C    T    0/1:35,16:52:66:66,0,166      0/1:12,8:20:25:59,0,25        0/1:18,9:27:3:3,0,189

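For records like these, a per-sample BAF can be sketched as alt depth divided by (ref depth + alt depth), taken from the AD field. The excerpt omits the FORMAT column, so the layout `GT:AD:DP:GQ:PL` used here is an assumption based on how the fields look; check the FORMAT column of the actual file. The helper name `baf_from_sample` is made up for this example.

```python
def baf_from_sample(sample_field, fmt="GT:AD:DP:GQ:PL"):
    """Return (genotype, BAF) for one sample column; BAF is None when depth is 0.

    `fmt` is the record's FORMAT string. The default is an assumption about
    the excerpt, which does not show its FORMAT column.
    """
    values = dict(zip(fmt.split(":"), sample_field.split(":")))
    # AD lists read depths as ref,alt[,alt2...]; only the first two are used.
    ref_depth, alt_depth = (int(x) for x in values["AD"].split(",")[:2])
    total = ref_depth + alt_depth
    baf = alt_depth / total if total else None
    return values["GT"], baf


# First record of the excerpt (chr1:14522), one entry per sample column.
record = {
    "SCP2": "0/1:107,12:119:15:15,0,439",
    "SCP3": "0/1:101,12:114:99:111,0,712",
    "SCP43": "0/1:76,9:86:28:28,0,365",
}
for name, field in record.items():
    gt, baf = baf_from_sample(field)
    print(name, gt, None if baf is None else round(baf, 3))
```

For this record the three values come out near 0.1, well below the roughly 0.5 expected at a germline heterozygous site in a copy-neutral region, so sites like this may deserve a closer look before they are used.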

If you don't have a matched normal to help distinguish germline variants from somatic, you can use dbSNP or 1000 Genomes to identify which of your variants are common SNPs. Then filter your VCF to retain only those, and use the output for BAF calculations in Canvas or CNVkit.
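One way to sketch that filtering step in Python is to load the known sites into a set and keep only the records that match. This assumes the reference resource is available as a plain-text sites VCF that has already been restricted to common variants, and that both files use the same chromosome naming (`chr1` vs `1`) and reference build. The function names are hypothetical.

```python
def load_sites(sites_vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from a sites VCF, one key per ALT allele."""
    sites = set()
    for line in sites_vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            sites.add((chrom, int(pos), ref, alt))
    return sites


def keep_known_snps(vcf_lines, known_sites):
    """Yield header lines and records whose (CHROM, POS, REF, ALT) is in `known_sites`."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Multi-allelic records in the sample VCF (ALT like "A,T") will not
        # match any key and are dropped by this simple comparison.
        if (chrom, int(pos), ref, alt) in known_sites:
            yield line
```

Matching on position and alleles rather than on the ID column avoids relying on rsIDs being annotated in your VCF. For large resources, a dedicated tool with indexed lookups will be far more practical than an in-memory set.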