How to create B-allele frequency file for ADTEx?
7.3 years ago
Dataman ▴ 350

Hi,

I am using ADTEx to perform copy number analysis of some exome sequencing data. ADTEx requires a B-allele frequency (BAF) file in order to perform ploidy estimation and genotype prediction. However, I do not know how to create such a file, and the ADTEx tutorial does not explain it either. Is there a general way to create B-allele frequency files that I am not aware of?

As explained in the tutorial, the file should have the following fields:

• chrom - chromosome name (same format as in BED or BAM file)
• SNP_loc - location of the SNP
• control_BAF - B allele frequency (BAF) at each SNP in control sample
• tumor_BAF - B allele frequency (BAF) at each SNP in tumor sample
• control_doc - Total read count at each SNP in control sample
• tumor_doc - Total read count at each SNP in tumour sample

Thank you in advance for your responses, and I wish you all a nice summer.

next-gen sequencing exome adtex • 4.3k views

Hi, I want to use ADTEx as well, but how do I create the target definition file, or where can I get it? Thanks for your help.


This set of probes was made by Krumm et al. (see: http://conifer.sourceforge.net/tutorial.html ). Ideally, you should make your own set of probes, and for that you need to know how the exome sequencing was done, i.e. which kit was used for pulling down the targets and performing the sequencing. Hope this is helpful! :)

7.3 years ago
russhh 5.6k

Here's what I do:

• Call germline heterozygous SNPs for all patients using GATK HaplotypeCaller (all patients are provided to a single run of GATK; for speed, I split the run up by chromosome and then merge the results back together)
• Extract patient-specific .vcf of heterozygous SNPs from the latter
• Convert the .vcf to a .bed for each patient
• Generate a samtools pileup for both the tumour and normal sample using the latter .bed file
• Run VarScan somatic with its validation option (--validation 1) over the two pileups (outputting in VarScan native format)
• Then I convert a dataframe of the VarScan data into a .baf dataframe using the following:
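import pandas as pd


def snps_to_baf(
        vscan_data,
        drop_if_no_alt_in_normal=False,
        drop_N_refs=False
):
    """
    Takes VarScan somatic SNP data (as a DataFrame) and computes the depth of
    coverage and B-allele fractions at each site therein. Returns a pandas
    DataFrame of the same form as the example file provided by ADTEx.
    That is,
    chrom    SNP_loc    control_BAF    tumor_BAF    control_doc    tumor_doc

    For use in ADTEx, I suggest setting drop_if_no_alt_in_normal and
    drop_N_refs to True.
    """
    baf_cols = ['chrom', 'SNP_loc', 'control_BAF', 'tumor_BAF', 'control_doc',
                'tumor_doc']
    empty_baf = pd.DataFrame({k: [] for k in baf_cols}, columns=baf_cols)

    if drop_if_no_alt_in_normal:
        # keep rows that have at least one read supporting the alt allele
        # in the normal sample (if specified by the user)
        # NOTE: the filter line itself was lost from the post; this is an
        # assumed reconstruction using VarScan's normal_reads2 (alt reads).
        vscan_data = vscan_data[vscan_data['normal_reads2'] > 0]

    if drop_N_refs:
        # drop all rows that have 'N' as the reference allele
        vscan_data = vscan_data[vscan_data.ref != 'N']

    if len(vscan_data) == 0:
        return empty_baf

    # NOTE: the BAF and depth expressions were lost from the post. What follows
    # is an assumed reconstruction: depth = ref reads + alt reads, and
    # BAF = alt reads / depth, using VarScan's *_reads1 / *_reads2 columns.
    control_doc = vscan_data['normal_reads1'] + vscan_data['normal_reads2']
    tumor_doc = vscan_data['tumor_reads1'] + vscan_data['tumor_reads2']
    baf_data = pd.DataFrame({
        'chrom': vscan_data['chrom'],
        'SNP_loc': vscan_data['position'],
        'control_BAF': vscan_data['normal_reads2'] / control_doc,
        'tumor_BAF': vscan_data['tumor_reads2'] / tumor_doc,
        'control_doc': control_doc,
        'tumor_doc': tumor_doc,
    },
        columns=baf_cols
    )
    return baf_data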
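
For the "extract heterozygous SNPs" and "convert the .vcf to a .bed" steps in the list above, something along these lines may help. It is only a minimal sketch, not the exact script used: the function name `het_snps_to_bed` is made up for illustration, and it assumes a multi-sample VCF with a `GT` entry in the FORMAT column and keeps biallelic single-base SNPs only. BED starts are 0-based, hence the `pos - 1`.

```python
def het_snps_to_bed(vcf_lines, sample):
    """Yield BED lines (chrom, 0-based start, end) for het SNPs of `sample`."""
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # the header line names the sample columns
            sample_idx = fields.index(sample)
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line found before the records")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # keep biallelic single-base substitutions only (assumption)
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        genotype = fields[sample_idx].split(":")[fmt_keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        # heterozygous = one reference allele and one alternate allele
        if sorted(alleles) == ["0", "1"]:
            yield f"{chrom}\t{pos - 1}\t{pos}"
```

Calling it as `het_snps_to_bed(open("all_patients.vcf"), "patient_1")` (both names are placeholders) gives the lines to write to that patient's .bed file.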


Obviously, you need Python, pandas, etc. for this to work.
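
In case it is useful, here is a rough sketch of the pandas input/output around that function. The helper names `read_varscan_snps` and `write_baf` are hypothetical, and I am assuming that VarScan's native output is tab-separated with a header row and that ADTEx accepts a tab-separated file with a header line, so check the result against the example file shipped with ADTEx.

```python
import io

import pandas as pd


def read_varscan_snps(source):
    """Load VarScan native-format SNP output (assumed tab-separated, with header)."""
    # read chrom as a string so that '1' and 'X' end up with the same dtype
    return pd.read_csv(source, sep="\t", dtype={"chrom": str})


def write_baf(baf_data, target):
    """Write a BAF table as tab-separated text with a header row and no index."""
    baf_data.to_csv(target, sep="\t", index=False)


# tiny in-memory demonstration with made-up values
demo_input = io.StringIO(
    "chrom\tposition\tref\tvar\tnormal_reads1\tnormal_reads2\n"
    "1\t100\tA\tG\t10\t10\n"
)
demo_frame = read_varscan_snps(demo_input)

demo_output = io.StringIO()
write_baf(pd.DataFrame({"chrom": ["1"], "SNP_loc": [100]}), demo_output)
```

Both helpers take either a file path or an open file-like object, so the same calls work on the real VarScan .snp file and on the final .baf file.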


@russ_hyde: Thanks so much for the comprehensive answer. I will give it a try ASAP! :)