MatthewP, 6.8 years ago
Hello, everyone! I want to use RNAEditor. It requires preparing several databases first, and one of them is a VCF file for dbSNP. This is the command given in the documentation:
wget -qO- ftp://ftp.ensembl.org/pub/release-83/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz |gunzip -c |awk 'BEGIN{FS="\t";OFS="\t"};match($5,/\./){gsub(/\./,"N",$5)};$5 == "" && $1 !~ /^#/ {gsub("","N",$5)};$3 ~ /rs193922900/ {$5="TN"};$3 ~ /rs59736472/ {$5="AN"};$5 ~ /H/ {gsub(/H/,"N",$5)};{print $0}' > dbSNP.vcf
My question is: why does ALT need to be set to TN and AN for rs193922900 and rs59736472, respectively? Why do these two sites seem to be special for RNAEditor?
Thanks!
I have no idea what TN and AN mean, but I have a guess as to why these SNPs are treated separately. They describe short tandem repeats (see rs193922900, rs59736472). The description of the variant does not conform to the VCF format (see the values in the RefSNP Alleles column on the dbSNP site). According to the help site, this type of variant is excluded from the current VCF version of dbSNP. But it might be that STRs are still included in the old version RNAEditor links to, and that they cause problems.
Thanks finswimmer! I actually downloaded the newest version (release-95) of the dbSNP file. I checked the VCF file and found that these two sites are still in it. This may explain why they are set to TN and AN: the command sets every . to N.
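For anyone reading along, here is a rough Python sketch of what the awk rules in the command above appear to do to the ALT column (field 5). It is only a re-expression of those rules for illustration, not RNAEditor code. The names `SPECIAL_ALT`, `normalize_alt` and `normalize_line` are made up for this example. As a simplification, header lines and lines with fewer than five fields are passed through unchanged, which the awk script does not strictly guarantee.

```python
# Illustrative sketch only: mirrors the awk rules from the documented command.
# All names here are hypothetical and not part of RNAEditor.

# IDs whose ALT value the awk command overwrites with a fixed string.
SPECIAL_ALT = {"rs193922900": "TN", "rs59736472": "AN"}


def normalize_alt(rs_id: str, alt: str) -> str:
    """Apply the ALT rules in the same order as the awk script."""
    # Rule 1: every "." in ALT becomes "N".
    alt = alt.replace(".", "N")
    # Rule 2: an empty ALT becomes "N".
    if alt == "":
        alt = "N"
    # Rules 3 and 4: the two special IDs get a fixed ALT.
    # awk uses a regex match on the ID field, so this is a substring test.
    for key, forced in SPECIAL_ALT.items():
        if key in rs_id:
            alt = forced
    # Rule 5: every "H" in ALT becomes "N".
    return alt.replace("H", "N")


def normalize_line(line: str) -> str:
    """Rewrite the ALT field of one tab-separated VCF line (no trailing newline)."""
    line = line.rstrip("\n")
    fields = line.split("\t")
    # Simplification: leave headers and short lines untouched.
    if line.startswith("#") or len(fields) < 5:
        return line
    fields[4] = normalize_alt(fields[2], fields[4])
    return "\t".join(fields)


if __name__ == "__main__":
    # Made-up record, only to show the call.
    print(normalize_line("1\t100\trs0000\tA\t."))
```

Because the two ID rules run after the "." replacement, those two records end up with TN and AN no matter what ALT they had in the file, while every other record only has its "." and "H" characters turned into "N".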