I am using ADTEx to perform copy number analysis of some exome sequencing data. ADTEx requires a B-allele frequency (BAF) file in order to perform ploidy estimation and genotype prediction. However, I do not know how to create such a file, and the ADTEx tutorial does not explain it either. Is there a general way to create B-allele frequency files that I am not aware of?
According to the tutorial, the file should have the following fields:
chrom - chromosome name (same format as in BED or BAM file)
SNP_loc - location of the SNP
control_BAF - B allele frequency (BAF) at each SNP in control sample
tumor_BAF - B allele frequency (BAF) at each SNP in tumor sample
control_doc - Total read count at each SNP in control sample
tumor_doc - Total read count at each SNP in tumour sample
Thank you in advance for your responses, and I wish you all a nice summer.
written 8.7 years ago by Dataman • updated 4.7 years ago by Ram
Call germline heterozygous SNPs for all patients using GATK HaplotypeCaller (all patients are provided to a single run of GATK; for speed I split this up by chromosome and then merge the results back together)
Extract a patient-specific .vcf of heterozygous SNPs from that joint call set
Convert the .vcf to a .bed for each patient
Generate a samtools pileup for both the tumour and the normal sample, restricted to the positions in that .bed file
Run VarScan somatic --validate over the two pileups (outputting in VarScan native format)
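The extraction and .vcf-to-.bed steps above can be sketched in plain Python. This is only a minimal illustration, not the exact code I ran: the function name `het_snps_to_bed` and the sample names `P1`/`P2` are made up for the example, and it assumes an uncompressed, tab-delimited multi-sample VCF with a `GT` entry in the FORMAT column.

```python
import re

def het_snps_to_bed(vcf_lines, sample):
    """Yield BED lines (0-based start, half-open end) for heterozygous SNPs in `sample`."""
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # column holding this patient's genotypes
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line found before the first record")
        chrom, pos, ref, alt, fmt = fields[0], int(fields[1]), fields[3], fields[4], fields[8]
        # keep single-base substitutions only (no indels, '*' or '.' alleles)
        if len(ref) != 1 or any(len(a) != 1 or a in "*." for a in alt.split(",")):
            continue
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt = fields[sample_col].split(":")[keys.index("GT")]
        alleles = re.split(r"[/|]", gt)  # accept unphased (/) and phased (|) genotypes
        # heterozygous: two called alleles that differ
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            yield f"{chrom}\t{pos - 1}\t{pos}"

# Made-up three-record VCF for illustration
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1\tP2",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20\t0/0:18",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:22\t0|1:19",
    "1\t300\t.\tAT\tA\t50\tPASS\t.\tGT:DP\t0/1:30\t0/1:25",
]
bed_p1 = list(het_snps_to_bed(example_vcf, "P1"))  # ['1\t99\t100']
```

Note that BED coordinates are 0-based and half-open, so a SNP at VCF position 100 becomes the interval 99-100.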
I then convert a pandas DataFrame of the VarScan output into a BAF DataFrame using the following function:
import pandas as pd

def snps_to_baf(
    vscan_data,
    drop_if_no_alt_in_normal = False,
    drop_N_refs = False
):
    """
    Takes VarScan somatic SNP data (as a DataFrame) and computes the depth of coverage
    and B-allele fractions at each site therein. Returns a pandas DataFrame of the same
    form as in the example file provided by ADTEx
    That is,
    chrom SNP_loc control_BAF tumor_BAF control_doc tumor_doc
    For use in ADTEx, I suggest setting drop_if_no_alt_in_normal and drop_N_refs to True
    """
    baf_cols = ['chrom', 'SNP_loc', 'control_BAF', 'tumor_BAF', 'control_doc',
                'tumor_doc']
    empty_baf = pd.DataFrame({k : [] for k in baf_cols}, columns = baf_cols)
    if drop_if_no_alt_in_normal:
        # keep rows that have at least one read supporting the alt allele
        # in the normal sample (if specified by the user)
        vscan_data = vscan_data[vscan_data.normal_reads2 > 0]
    if drop_N_refs:
        # drop all rows that have 'N' as the reference allele
        vscan_data = vscan_data[vscan_data.ref != 'N']
    if len(vscan_data) == 0:
        return empty_baf
    # BAF = alt reads / (ref reads + alt reads); depth = ref reads + alt reads
    baf_data = pd.DataFrame({
        'chrom' : vscan_data['chrom'],
        'SNP_loc' : vscan_data['position'],
        'control_BAF' : vscan_data['normal_reads2'] / (
            vscan_data['normal_reads1'] + vscan_data['normal_reads2']
        ),
        'tumor_BAF' : vscan_data['tumor_reads2'] / (
            vscan_data['tumor_reads1'] + vscan_data['tumor_reads2']
        ),
        'control_doc' : vscan_data['normal_reads1'] + vscan_data['normal_reads2'],
        'tumor_doc' : vscan_data['tumor_reads1'] + vscan_data['tumor_reads2']
        },
        columns = baf_cols
    )
    return baf_data
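For completeness, here is a rough sketch of the I/O around that function. The helper names `load_varscan` and `write_baf` and the file name `sample.baf` are hypothetical, and I am assuming both that the VarScan native output is tab-delimited with a header row and that ADTEx accepts a tab-delimited BAF file with a header, so check both against your own files.

```python
import io
import pandas as pd

def load_varscan(source):
    """Read VarScan somatic native-format output (assumed tab-delimited with a header row)."""
    # Keep chromosome names as strings so that "1" and "X" end up with the same dtype.
    return pd.read_csv(source, sep="\t", dtype={"chrom": str})

def write_baf(baf_data, destination):
    """Write the BAF table tab-delimited, with a header row and without the index column."""
    baf_data.to_csv(destination, sep="\t", index=False)

# Tiny made-up input containing only the columns that snps_to_baf reads.
example = io.StringIO(
    "chrom\tposition\tref\tnormal_reads1\tnormal_reads2\ttumor_reads1\ttumor_reads2\n"
    "1\t12345\tA\t10\t10\t5\t15\n"
)
vscan = load_varscan(example)

# With snps_to_baf from above in scope, the full conversion would be:
#   baf = snps_to_baf(vscan, drop_if_no_alt_in_normal=True, drop_N_refs=True)
#   write_baf(baf, "sample.baf")   # hypothetical output file name
```

In real use you would pass the path of the VarScan .snp output to `load_varscan` instead of the in-memory example.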
Hi, I want to use ADTEx as well, but how do I create the target definition file, or where can I get it? Thanks for your help.
Hi, sorry for the late reply! Here is a link for a standard probe file: http://sourceforge.net/projects/conifer/files/probes.txt/download
This set of probes was made by Krumm et al. (see http://conifer.sourceforge.net/tutorial.html). Ideally, you should make your own set of probes; for that, you need to know how the exome sequencing was done, i.e. which kit was used for pulling down the targets and performing the sequencing. Hope this is helpful! :)