Hello!
I have used Strelka as the variant caller for SNVs, and I'm having trouble with the output. I would like to obtain the REF and ALT read counts for both the Tumor and the Normal sample. I have already read the post "Strelka Indel Allele Counts" and found it useful for indels. Unfortunately, it does not include a suggestion for SNVs.
For SNVs, this is the header of the VCF file:
##INFO=<ID=QSS,Number=1,Type=Integer,Description="Quality score for any somatic snv, ie. for the ALT allele to be present at a significantly different frequency in the tumor and normal">
##INFO=<ID=TQSS,Number=1,Type=Integer,Description="Data tier used to compute QSS">
##INFO=<ID=NT,Number=1,Type=String,Description="Genotype of the normal in all data tiers, as used to classify somatic variants. One of {ref,het,hom,conflict}.">
##INFO=<ID=QSS_NT,Number=1,Type=Integer,Description="Quality score reflecting the joint probability of a somatic variant and NT">
##INFO=<ID=TQSS_NT,Number=1,Type=Integer,Description="Data tier used to compute QSS_NT">
##INFO=<ID=SGT,Number=1,Type=String,Description="Most likely somatic genotype excluding normal noise states">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1 (used+filtered)">
##FORMAT=<ID=FDP,Number=1,Type=Integer,Description="Number of basecalls filtered from original read depth for tier1">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Number of reads with deletions spanning this site at tier1">
##FORMAT=<ID=SUBDP,Number=1,Type=Integer,Description="Number of reads below tier1 mapping quality threshold aligned across this site">
##FORMAT=<ID=AU,Number=2,Type=Integer,Description="Number of 'A' alleles used in tiers 1,2">
##FORMAT=<ID=CU,Number=2,Type=Integer,Description="Number of 'C' alleles used in tiers 1,2">
##FORMAT=<ID=GU,Number=2,Type=Integer,Description="Number of 'G' alleles used in tiers 1,2">
##FORMAT=<ID=TU,Number=2,Type=Integer,Description="Number of 'T' alleles used in tiers 1,2">
##FILTER=<ID=DP,Description="Greater than 3.0x chromosomal mean depth in Normal sample">
##FILTER=<ID=BCNoise,Description="Fraction of basecalls filtered at this site in either sample is at or above 0.4">
##FILTER=<ID=SpanDel,Description="Fraction of reads crossing site with spanning deletions in either sample exceeeds 0.75">
##FILTER=<ID=QSS_ref,Description="Normal sample is not homozygous ref or ssnv Q-score < 15, ie calls with NT!=ref or QSS_NT < 15">
And these are the first lines of my output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 19199723 . G A . PASS NT=ref;QSS=34;QSS_NT=15;SGT=GG->AA;SOMATIC;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 6:0:0:0:0,0:0,0:6,6:0,0 19:0:0:0:19,19:0,0:0,0:0,0
chr1 21738020 . A G . PASS NT=ref;QSS=31;QSS_NT=15;SGT=AA->GG;SOMATIC;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 6:0:0:0:6,6:0,0:0,0:0,0 18:0:0:0:0,0:0,0:18,18:0,0
I am able to extract the REF read count for Normal and the ALT read count for Tumor. For example, in the first SNV:
chr1 19199723 . G A . PASS NT=ref;QSS=34;QSS_NT=15;SGT=GG->AA;SOMATIC;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 6:0:0:0:0,0:0,0:6,6:0,0 19:0:0:0:19,19:0,0:0,0:0,0
I have the count of allele G for Normal (6) and the count of allele A for Tumor (19). What I cannot find is the count of allele A for Normal and the count of allele G for Tumor.
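For reference, this is a minimal sketch of how I am currently reading these fields. It assumes that the count for each base is stored under its own FORMAT key (AU, CU, GU, TU), that the first value of each pair is the tier1 count, and that the sample columns are in the order NORMAL, TUMOR as in the column header line. The function name strelka_snv_counts is my own and is not part of Strelka. Under this reading, the A count for Normal and the G count for Tumor would simply be 0 in the first record, but I am not sure this interpretation is correct.

```python
def strelka_snv_counts(line, tier=0):
    """Return REF/ALT counts per sample from one Strelka SNV record.

    Assumption: tier=0 selects the first (tier1) value of each
    AU/CU/GU/TU pair, tier=1 the second (tier2) value.
    """
    # Split on any whitespace so this works for tab-separated VCF lines
    # and for the space-separated copies pasted here (assumes no column
    # contains spaces).
    cols = line.split()
    ref, alt = cols[3], cols[4]
    keys = cols[8].split(":")

    counts = {}
    # Assumption: column 10 is NORMAL and column 11 is TUMOR.
    for name, sample in zip(("NORMAL", "TUMOR"), cols[9:11]):
        values = dict(zip(keys, sample.split(":")))
        # REF "G" is looked up under "GU", ALT "A" under "AU", and so on.
        ref_count = int(values[ref + "U"].split(",")[tier])
        alt_count = int(values[alt + "U"].split(",")[tier])
        counts[name] = {"ref": ref_count, "alt": alt_count}
    return counts


record = (
    "chr1 19199723 . G A . PASS "
    "NT=ref;QSS=34;QSS_NT=15;SGT=GG->AA;SOMATIC;TQSS=1;TQSS_NT=1 "
    "DP:FDP:SDP:SUBDP:AU:CU:GU:TU "
    "6:0:0:0:0,0:0,0:6,6:0,0 19:0:0:0:19,19:0,0:0,0:0,0"
)
print(strelka_snv_counts(record))
# {'NORMAL': {'ref': 6, 'alt': 0}, 'TUMOR': {'ref': 0, 'alt': 19}}
```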
Could you help me?
Thank you in advance