Question: What does it mean to say "b-allele frequency"?
novice (United States) wrote, 3.4 years ago:

I'm trying to use Canvas to find copy number variants (CNVs) in human data. I would appreciate it if someone could clarify what this input is supposed to be:

   --b-allele-vcf=VALUE   vcf containing SNV b-allele sites (only sites 
                               with PASS in the filter column will be used) 
                               (required)

I have called and filtered SNPs for my samples. Is this asking me to provide the set of SNPs (or SNP sites) that are flagged as having the alternate allele in the VCF file? If so, couldn't I just grep AB=1 and be good?

Tags: snp, canvas
(Question by novice, 3.4 years ago; modified 3.1 years ago by jpflorido.)

See this manual for details:

http://biorxiv.org/content/biorxiv/suppl/2016/01/13/036194.DC2/036194-2.pdf

"Canvas supports a number of different workflows depending on the input sequencing data. The available modes are:

Germline-WGS: CNV calling of a diploid germline sample from whole genome sequencing data

Somatic-Enrichment: CNV calling of a somatic sample from targeted sequencing data

Somatic-WGS: CNV calling of a somatic sample from whole genome sequencing data

Tumor-normal-enrichment: CNV calling of a tumor/normal pair from targeted sequencing data"

(Reply by natasha.sernova, 3.4 years ago.)

Thank you, Natasha! Why didn't they just say "heterozygous sites" from the beginning? I guess what I have to do, then, for the purpose of this input file, is grep for 0/1 or 1/0 SNPs.
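One caveat on the grep approach: a plain grep for "0/1" matches that string anywhere on a line, so in a multi-sample VCF it keeps a record if any sample is heterozygous, and it misses phased calls written as "0|1" or "1|0". A minimal per-sample check in Python might look like the sketch below. It assumes GT is the first FORMAT subfield (the VCF specification requires this whenever GT is present); the function names are hypothetical.

```python
def sample_gt(sample_field: str) -> str:
    """Return the GT subfield, assuming FORMAT starts with GT."""
    return sample_field.split(":")[0]


def is_ref_alt_het(gt: str) -> bool:
    """True for a reference/alternate heterozygote: 0/1, 1/0, 0|1 or 1|0.

    Missing calls (./.), homozygous calls and alt/alt hets such as 1/2
    return False, since they do not carry one reference and one B allele.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False
    return set(alleles) == {"0", "1"}


print(is_ref_alt_het(sample_gt("0/1:107,12:119:15:15,0,439")))  # True
print(is_ref_alt_het(sample_gt("0/0:122,8:130:46:0,46,888")))   # False
print(is_ref_alt_het("1|0"), is_ref_alt_het("./."), is_ref_alt_het("1/2"))  # True False False
```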

(Reply by novice, 3.4 years ago.)
jpflorido wrote, 3.1 years ago:

Hi! Did you find out the meaning of --b-allele-vcf? Is it related to the sample or to the normal/control? Thanks!

(Comment by jpflorido, 3.1 years ago.)

In this context, the b allele is the non-reference allele observed in a germline heterozygous SNP, i.e. in the normal/control sample. Since the tumor cells' DNA originally derived from normal cells' DNA, most of these SNPs will also be present in the tumor sample. But due to allele-specific copy number alterations, loss of heterozygosity or allelic imbalance, the allelic frequency of these SNPs may be different in the tumor, and that's evidence that one (or both) of the germline copies was gained or lost during tumor evolution.
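To make the argument concrete, here is a simplified mixing model written as a Python sketch. It is an assumption for illustration, not how Canvas or CNVkit actually model BAF: a fraction `purity` of cells are tumor cells carrying n_a copies of the A allele and n_b copies of the B allele, the remaining normal cells carry one copy of each, and sequencing noise and subclonality are ignored.

```python
def expected_baf(purity: float, n_a: int, n_b: int) -> float:
    """Expected B-allele frequency at a germline heterozygous SNP in a tumor sample.

    Hypothetical simplified model: tumor cells (fraction `purity`) carry n_a
    A alleles and n_b B alleles; normal cells carry one A and one B.
    """
    b_copies = purity * n_b + (1 - purity) * 1
    total_copies = purity * (n_a + n_b) + (1 - purity) * 2
    if total_copies == 0:
        raise ValueError("no copies at all: homozygous deletion in a pure sample")
    return b_copies / total_copies


print(expected_baf(1.0, 1, 1))  # 0.5, balanced diploid: no change from the normal
print(expected_baf(1.0, 2, 0))  # 0.0, copy-neutral LOH: the B allele is lost
print(expected_baf(1.0, 1, 2))  # about 0.67, gain of the B-allele copy
print(expected_baf(0.6, 1, 0))  # about 0.29, B allele deleted in 60% of cells
```

The last case shows why normal-cell contamination matters: a deletion that would drive BAF to 0 in a pure tumor only pulls it partway from 0.5 in a mixed sample.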

So, filter for heterozygous genotypes in the normal sample, but keep the tumor sample in the VCF.
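A minimal Python sketch of that filter, assuming a tab-delimited VCF whose FORMAT field starts with GT. It keeps every sample column (the tumor included) and tests only the normal sample's genotype. The sample names NORMAL and TUMOR and the records in the usage example are hypothetical toy data. The optional PASS check mirrors the Canvas help text quoted in the question (only PASS sites are used), so it is redundant if Canvas applies that rule itself.

```python
from typing import Iterable, Iterator


def het_in_normal(lines: Iterable[str], normal: str,
                  require_pass: bool = True) -> Iterator[str]:
    """Yield VCF header lines plus records where `normal` is a 0/1-type het.

    All sample columns are passed through unchanged.
    """
    normal_idx = None
    for line in lines:
        if line.startswith("##"):          # meta-information lines
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):      # column header: locate the normal sample
            normal_idx = fields.index(normal)
            yield line
            continue
        if normal_idx is None:
            raise ValueError("no #CHROM header line before the first record")
        if require_pass and fields[6] != "PASS":   # column 7 is FILTER
            continue
        gt = fields[normal_idx].split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        if sorted(alleles) == ["0", "1"]:  # 0/1, 1/0, 0|1 or 1|0
            yield line


# Hypothetical toy input: one het-in-normal PASS site, one hom-ref site,
# and one het site that failed filters.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n",
    "chr1\t100\t.\tG\tA\t50\tPASS\t.\tGT:AD\t0/1:20,18\t0/1:30,5\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/0:40,0\t0/1:25,15\n",
    "chr1\t300\t.\tA\tG\t50\tLowQual\t.\tGT:AD\t0/1:10,9\t0/1:12,8\n",
]
for line in het_in_normal(vcf, normal="NORMAL"):
    print(line, end="")
```

The site at chr1:200 is dropped even though the tumor is heterozygous there, because a tumor-only het is not a germline B-allele site.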

(Reply by Eric T., 3.1 years ago.)

Hi Eric, I am interested in calculating the BAF (B-allele frequency) of tumor samples that do not have a matched normal sequenced. As you said, the "B allele is the non-reference allele observed in a germline heterozygous SNP". I would like to know how I can identify those germline heterozygous sites in my VCF and then calculate their BAF. The VCF was generated with GATK UnifiedGenotyper. I have attached a few records from the VCF file below. I am quite new to this, so I would appreciate your response.

    CHROM  POS    ID           REF  ALT  SCP2                          SCP3                          SCP43
    chr1   14522  .            G    A    0/1:107,12:119:15:15,0,439    0/1:101,12:114:99:111,0,712   0/1:76,9:86:28:28,0,365
    chr1   14542  .            A    G    0/1:115,11:126:16:16,0,535    0/1:110,13:123:94:94,0,722    0/1:71,11:82:37:37,0,302
    chr1   14574  .            A    G    0/0:122,8:130:46:0,46,888     0/1:93,10:103:57:57,0,731     0/1:72,12:84:30:30,0,521
    chr1   14653  rs375086259  C    T    0/1:131,30:162:99:372,0,1365  0/1:100,25:125:99:378,0,1436  0/1:81,23:104:99:227,0,1238
    chr1   14976  rs71252251   G    A    0/1:204,44:250:99:218,0,3516  0/1:223,27:250:5:5,0,3978     0/1:170,45:217:99:459,0,2776
    chr1   15688  .            C    T    0/1:35,16:52:66:66,0,166      0/1:12,8:20:25:59,0,25        0/1:18,9:27:3:3,0,189
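Once heterozygous sites are selected, the BAF at each site is usually computed as alt reads / (ref + alt reads). A minimal Python sketch using the SCP2 column from the records above, assuming the omitted FORMAT column is GT:AD:DP:GQ:PL (the usual UnifiedGenotyper layout, which these fields appear to follow) and that each site is biallelic. The helper name is hypothetical. It sums AD rather than using DP, because GATK's DP can include reads that were not counted toward either allele.

```python
from typing import Optional


def baf_from_sample(sample_field: str) -> Optional[float]:
    """Alt-allele fraction from the AD subfield, or None if there are no reads.

    Assumes FORMAT is GT:AD:DP:GQ:PL and a biallelic site (AD = "ref,alt").
    """
    subfields = sample_field.split(":")
    ref, alt = (int(x) for x in subfields[1].split(","))
    depth = ref + alt
    return alt / depth if depth > 0 else None


# SCP2 sample fields for the chr1 records posted above, keyed by POS.
scp2 = {
    14522: "0/1:107,12:119:15:15,0,439",
    14542: "0/1:115,11:126:16:16,0,535",
    14574: "0/0:122,8:130:46:0,46,888",
    14653: "0/1:131,30:162:99:372,0,1365",
    14976: "0/1:204,44:250:99:218,0,3516",
    15688: "0/1:35,16:52:66:66,0,166",
}
for pos, field in scp2.items():
    gt = field.split(":")[0]
    print(pos, gt, round(baf_from_sample(field), 3))
```

For example, chr1:14522 has AD = 107,12, so its BAF is 12 / 119, roughly 0.10.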
(Reply by AISHA, 2.2 years ago.)

Some details are in the CNVkit documentation:
- https://cnvkit.readthedocs.io/en/stable/baf.html
- https://cnvkit.readthedocs.io/en/stable/fileformats.html#vcf

If you don't have a matched normal to help distinguish germline variants from somatic, you can use dbSNP or 1000 Genomes to identify which of your variants are common SNPs. Then filter your VCF to retain only those, and use the output for BAF calculations in Canvas or CNVkit.
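A minimal Python sketch of that filtering step, assuming you already have a reference SNP VCF (for example a dbSNP or 1000 Genomes release) that has been restricted to common variants. Records are matched on (CHROM, POS, REF, ALT), so chromosome naming must agree between the two files ("chr1" vs "1"). The function names and the toy input lines are hypothetical, not real dbSNP content.

```python
from typing import Iterable, Iterator, Set, Tuple

# (CHROM, POS, REF, ALT) key used to match records between the two files
Site = Tuple[str, int, str, str]


def load_known_sites(ref_vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect site keys from a reference SNP VCF."""
    sites: Set[Site] = set()
    for line in ref_vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A multi-allelic reference record contributes one key per ALT allele.
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def keep_known_sites(sample_vcf_lines: Iterable[str],
                     known: Set[Site]) -> Iterator[str]:
    """Yield header lines plus sample records whose site key is in `known`.

    Multi-allelic sample records (ALT containing commas) never match a
    single-allele key, so they are dropped.
    """
    for line in sample_vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if (chrom, int(pos), ref, alt) in known:
            yield line


# Hypothetical toy inputs.
reference = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tG\tA,C\t.\t.\t.\n",
]
sample = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "chr1\t100\t.\tG\tA\t.\tPASS\t.\tGT:AD\t0/1:20,15\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/1:30,4\n",
]
known = load_known_sites(reference)
for line in keep_known_sites(sample, known):
    print(line, end="")
```

The retained records can then be passed to the het-genotype and BAF steps; the site at chr1:200 is dropped because it does not appear in the reference list.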

(Reply by Eric T., 2.2 years ago.)
Powered by Biostar version 2.3.0