Is this range of VAF ordinary?
21 months ago
A ★ 4.0k

Hi

I have calculated the variant allele frequency (VAF) separately for SNVs and INDELs called by Strelka. To get the VAF, I used:

VAF = Tumour Variant Allele Count / Tumour Read Count

For some positions in the genome I get VAF > 30. Are such large VAFs normal, or am I doing something wrong?

I assumed VAFs should be in the range 0 < VAF < 1.
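
The ratio above can only stay at or below 1 when the variant reads are counted within the same set of reads as the denominator. A minimal sketch of that arithmetic, assuming a hypothetical helper named `vaf` and made-up counts (neither comes from the original script):

```shell
#!/usr/bin/env bash
# vaf ALT_COUNT DEPTH: print ALT_COUNT / DEPTH, or NA when DEPTH is zero.
# The helper name and the counts below are illustrative assumptions.
vaf() {
    awk -v alt="$1" -v depth="$2" 'BEGIN {
        if (depth <= 0) { print "NA"; exit }
        printf "%.4f\n", alt / depth
    }'
}

vaf 12 40   # variant reads are a subset of the depth: prints 0.3000
vaf 45 40   # numerator not contained in the denominator: prints 1.1250
vaf 3 0     # no usable depth: prints NA
```

A value above 1 therefore suggests that the numerator and the denominator were taken from fields that do not describe the same reads.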

Can you help me figure out what is going on?

Thanks

WGS VCF VAF • 624 views

I assumed VAFs should be in the range 0 < VAF < 1.

Yes. I don't see how VAF can be >1. There's something wrong in your calculations.


Did you follow the instructions for calculating VAFs suggested in one of your previous posts (i.e. as in the Strelka manual)? I guess not, given this result.


Actually, somebody wrote a script for me. It assumes a Strelka .vcf for SNVs:

#Basic information
chrom=$(echo "$line" | sed 's/ /\t/g' | cut -f 1) #&& echo $chrom;
Pos=$(echo "$line" | sed 's/ /\t/g' |  cut -f 2) #&& echo $Pos;
Ref=$(echo "$line" | sed 's/ /\t/g' | cut -f 4)
Alt=$(echo "$line" | sed 's/ /\t/g' | cut -f 5)

#Tumor sample read, variant and reference information
ReadCount=$(echo "$line" | cut -f 8 | sed 's/;/\t/g' | cut -f 13 | sed 's/ReadCount=//' )
VariantAlleleCount=$(echo "$line" | cut -f 8 | sed 's/;/\t/g' | cut -f 26| sed 's/VariantAlleleCount=//')
ReferenceAlleleCount=$(echo "$line" | awk -v rc="$ReadCount" -v vac="$VariantAlleleCount" '{print rc-vac}')


#Control or Normal read, variant, reference information
ReadCountControl=$(echo "$line" | cut -f 8 | sed 's/;/\t/g' | cut -f 14 | sed 's/ReadCountControl=//')
VariantAlleleCountControl=$(echo "$line" | cut -f 8 | sed 's/;/\t/g' | cut -f 27 | sed 's/VariantAlleleCountControl=//')   

ReferenceAlleleCountControl=$(echo "$line" | awk -v rcc="$ReadCountControl" -v vacc="$VariantAlleleCountControl" '{print rcc-vacc}')


VAF=$(echo "$line" | cut -f 8 | sed 's/;/\t/g' | cut -f 28 | sed 's/VariantAlleleFrequency=//')

Somebody wrote a script for me

No wonder you're running into problems you can't explain. You can execute each statement line by line and see where the logic goes awry, or you can contact the author and hope they have the time to explain what could be going wrong. I'd recommend the former approach.
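
One way to step through the SNV script is to check whether the positional `cut -f 13`, `cut -f 26` and similar calls really land on the INFO entries their variable names suggest. The sketch below is only an illustration: `info_value` is a hypothetical helper, and the example record and its INFO keys are taken from the script's variable names, not from Strelka's documentation.

```shell
#!/usr/bin/env bash
# info_value KEY VCF_LINE: print the value of KEY from the INFO column
# (column 8) of a tab-separated VCF record, or nothing if KEY is absent.
# Hypothetical helper, not part of the original script.
info_value() {
    printf '%s\n' "$2" | awk -F'\t' -v key="$1" '{
        n = split($8, kv, ";")
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (eq > 0 && substr(kv[i], 1, eq - 1) == key) {
                print substr(kv[i], eq + 1)
                exit
            }
        }
    }'
}

# Made-up record for illustration; real records carry many more INFO entries.
line=$(printf 'chr1\t100\t.\tA\tT\t.\tPASS\tReadCount=40;VariantAlleleCount=12')

# Number every INFO entry so the positions can be compared with the cut -f values.
printf '%s\n' "$line" | cut -f 8 | tr ';' '\n' | nl

info_value ReadCount "$line"            # prints 40
info_value VariantAlleleCount "$line"   # prints 12
```

Looking entries up by key rather than by position means a record with a missing or extra INFO entry yields an empty value instead of silently returning the wrong field.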


Please read the Strelka manual on calculating AFs. The calculation is different for indels and SNPs, so running one script (the one below) on both is not going to work. For indels you need the TIR and TAR values, while for SNPs you have to extract something different; I do not remember exactly what. I figured it out back in the day entirely by reading the manual, and I am sure you can do that as well.
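
For reference, the somatic allele frequency recipe in the Strelka2 user guide is tier1 alt counts divided by the sum of tier1 alt and tier1 ref counts. For SNVs the counts come from the FORMAT keys named after the bases (`AU`, `CU`, `GU`, `TU`, picked by REF and ALT); for indels the ref counts come from `TAR` and the alt counts from `TIR`. The sketch below is one possible awk rendering of that recipe, not the guide's own code. The function name `strelka_vaf` is hypothetical, and it assumes the tumour sample is column 11 and that each record has a single ALT allele; check both against your own VCF header and Strelka version.

```shell
#!/usr/bin/env bash
# strelka_vaf: read Strelka somatic VCF records on stdin and print
# CHROM, POS, REF, ALT and the tier1 tumour allele frequency.
# Assumptions: tumour sample in column 11, one ALT allele per record.
strelka_vaf() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                                    # skip header lines
    {
        split("", fmt)                               # clear the per-record lookup
        nk = split($9, key, ":"); split($11, val, ":")
        for (i = 1; i <= nk; i++) fmt[key[i]] = val[i]
        if ("TIR" in fmt) { ref = fmt["TAR"]; alt = fmt["TIR"] }    # indel record
        else              { ref = fmt[$4 "U"]; alt = fmt[$5 "U"] }  # SNV record
        split(ref, r, ","); split(alt, a, ",")       # first value is tier1
        total = r[1] + a[1]
        print $1, $2, $4, $5, (total > 0 ? sprintf("%.4f", a[1] / total) : "NA")
    }'
}

# Made-up SNV record: tumour AU=28,29 (ref A) and TU=12,12 (alt T).
printf 'chr1\t100\t.\tA\tT\t.\tPASS\tSOMATIC\tDP:FDP:SDP:SUBDP:AU:CU:GU:TU\t30:0:0:0:30,30:0,0:0,0:0,0\t40:0:0:0:28,29:0,0:0,0:12,12\n' | strelka_vaf
```

Because the denominator is the sum of the two counts, the result always lies between 0 and 1, which is not guaranteed when the alt count is divided by a separate depth field.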


For INDELs he wrote:

#Basic information
chrom=$(echo $line | sed 's/ /\t/g' | cut -f 1) #&& echo $chrom;
Pos=$(echo $line | sed 's/ /\t/g' |  cut -f 2) #&& echo $Pos;
Ref=$(echo $line | sed 's/ /\t/g' | cut -f 4)
Alt=$(echo $line | sed 's/ /\t/g' | cut -f 5)

#Tumor sample read, variant and reference information (I used TIER 1)
ReadCountTumor=$(echo $line | sed 's/ /\t/g' | cut -f 11 | sed 's/:/\t/g' | cut -f 1)
SupportAltAlleleTumor=$(echo $line | sed 's/ /\t/g' | cut -f 11 | sed 's/:/\t/g' |  cut -f 3 | sed 's/,/\t/g' | cut -f 1)
SupportIndelTumor=$(echo $line | sed 's/ /\t/g' | cut -f 11 | sed 's/:/\t/g' | cut -f 4 | sed 's/,/\t/g' | cut -f 1)

#OMMITTEDSupportOtherTumor=$(echo $line | sed 's/ /\t/g' | cut -f 10 | sed 's/:/\t/g' | cut -f 5 | sed 's/,/\t/g' | cut -f 1)


AltAlleleFrequencyTumor=$(echo $line | sed 's/ /\t/g' | awk -v RCT=$ReadCountTumor -v AltAlleleTumor=$SupportAltAlleleTumor '{print AltAlleleTumor/RCT}')                                                                         
IndelFrequencyTumor=$(echo $line | sed 's/ /\t/g' | awk -v RCT=$ReadCountTumor -v AltINDELTumor=$SupportIndelTumor '{print AltINDELTumor/RCT}')  



#Control or Normal read, variant, reference information (I used TIER 1)
ReadCountControl=$(echo $line | sed 's/ /\t/g' | cut -f 10 | sed 's/:/\t/g' | cut -f 1)
SupportAltAlleleControl=$(echo $line | sed 's/ /\t/g' | cut -f 10 | sed 's/:/\t/g' | cut -f 3 | sed 's/,/\t/g' | cut -f 1) 
SupportIndelControl=$(echo $line | sed 's/ /\t/g' | cut -f 10 | sed 's/:/\t/g' | cut -f 4 | sed 's/,/\t/g' | cut -f 1)

#OMMITTED##SupportOtherControl=$(echo $line | sed 's/ /\t/g' | cut -f 10 | sed 's/:/\t/g' | cut -f 5 | sed 's/,/\t/g' | cut -f 1)

AltAlleleFrequencyControl=$(echo $line | sed 's/ /\t/g' | awk -v RCC=$ReadCountControl -v AltAlleleNormal=$SupportAltAlleleControl '{print AltAlleleNormal/RCC}')

IndelFrequnecyControl=$(echo $line | sed 's/ /\t/g' | awk -v RCC=$ReadCountControl -v AltINDELNormal=$SupportIndelControl '{print AltINDELNormal/RCC}')

He believes that:

In cases where the frequency is above 100% (or 1), this is likely an error where there is more information in support of the variant than there is read depth (???). In these cases, you could consider the frequency to be around 1.


Sorry @ATpoint, I googled but could not find full documentation explaining what each part of a Strelka VCF means, especially the INFO column. Could you please share it if you found such documentation?
