vcf file chr notation
Asked 2.2 years ago by Hriday

I have a single VCF file, 'ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz', that uses a different chromosome notation: its contig names lack the 'chr' prefix (e.g. '1' rather than 'chr1').

The header and first records look like this:
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=11032019_15h52m43s
##source=IGSRpipeline
##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=X>
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=EAS_AF,Number=A,Type=Float,Description="Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1)">
##INFO=<ID=EUR_AF,Number=A,Type=Float,Description="Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1)">
##INFO=<ID=AFR_AF,Number=A,Type=Float,Description="Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1)">
##INFO=<ID=AMR_AF,Number=A,Type=Float,Description="Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1)">
##INFO=<ID=SAS_AF,Number=A,Type=Float,Description="Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1)">
##INFO=<ID=VT,Number=.,Type=String,Description="indicates what type of variant the line represents">
##INFO=<ID=EX_TARGET,Number=0,Type=Flag,Description="indicates whether a variant is within the exon pull down target boundaries">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   10416   .   CCCTAA  C   .   PASS    AC=240;AN=5096;DP=365460;AF=0.05;EAS_AF=0.06;EUR_AF=0.07;AFR_AF=0.01;AMR_AF=0.06;SAS_AF=0.05;VT=INDEL;NS=2548
1   16103   .   T   G   .   PASS    AC=118;AN=5096;DP=29994;AF=0.02;EAS_AF=0;EUR_AF=0.04;AFR_AF=0.03;AMR_AF=0.03;SAS_AF=0.01;VT=SNP;NS=2548
1   17496   .   AC  A   .   PASS    AC=25;AN=5096;DP=189765;AF=0;EAS_AF=0;EUR_AF=0;AFR_AF=0.02;AMR_AF=0;SAS_AF=0;VT=INDEL;NS=2548
1   51479   .   T   A   .   PASS    AC=531;AN=5096;DP=17461;AF=0.1;EAS_AF=0;EUR_AF=0.19;AFR_AF=0.02;AMR_AF=0.11;SAS_AF=0.23;VT=SNP;NS=2548
1   51898   .   C   A   .   PASS    AC=426;AN=5096;DP=15331;AF=0.08;EAS_AF=0.05;EUR_AF=0.14;AFR_AF=0.06;AMR_AF=0.06;SAS_AF=0.11;VT=SNP;NS=2548
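The header shows that the unprefixed names appear in two places: the ##contig=<ID=...> lines and the #CHROM column of every record. As a minimal sketch (Python, assuming the header lines are already available as strings), the contig IDs can be collected from the header and turned into an old-to-new rename table. The function names contig_ids and rename_map are hypothetical. If bcftools is available, a whitespace-separated "old new" file like the one printed here is the format its annotate --rename-chrs option reads.

```python
import re

# Matches the ID field of a ##contig header line, e.g. '##contig=<ID=1>'.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_ids(header_lines):
    """Collect contig IDs from ##contig header lines, in file order."""
    ids = []
    for line in header_lines:
        m = CONTIG_RE.match(line)
        if m:
            ids.append(m.group(1))
    return ids


def rename_map(ids):
    """Map each ID to its 'chr'-prefixed form; IDs already prefixed are kept."""
    return {i: i if i.startswith("chr") else "chr" + i for i in ids}


if __name__ == "__main__":
    # A few header lines copied from the question; ##INFO lines are ignored.
    header = [
        "##fileformat=VCFv4.3",
        "##contig=<ID=1>",
        "##contig=<ID=2>",
        "##contig=<ID=X>",
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency">',
    ]
    mapping = rename_map(contig_ids(header))
    # Print one 'old<TAB>new' pair per line.
    for old, new in mapping.items():
        print(f"{old}\t{new}")
```

Because already-prefixed IDs map to themselves, running the table against a file that was partly converted will not produce names like 'chrchr1'.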

Could you suggest some quick awk or sed commands to add the prefix? I would also appreciate your view on which tool, GATK or VCFtools, is more dependable for this task. Thank you.
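The usual awk/sed approach is to prefix 'chr' to the first column of each data line and to the ID of each ##contig line, while passing all other header lines through unchanged. The sketch below does the same thing in Python (used here in place of awk/sed so each rule is explicit). It assumes tab-separated columns, as the VCF format requires (the excerpt above shows spaces only because of how it was pasted), and it writes uncompressed output. The script name add_chr.py and the function names are hypothetical.

```python
import gzip
import sys


def add_chr(name):
    """Prefix 'chr' unless the name already has it."""
    return name if name.startswith("chr") else "chr" + name


def rewrite_line(line):
    """Return one VCF line with its contig name prefixed.

    - '##contig=<ID=1>' header lines become '##contig=<ID=chr1>'.
    - Other '#' lines (including the '#CHROM' column header) pass through.
    - Data lines get their first tab-separated column (CHROM) prefixed.
    """
    if not line.strip():
        return line  # leave blank lines untouched
    prefix = "##contig=<ID="
    if line.startswith(prefix):
        rest = line[len(prefix):]
        return prefix + ("" if rest.startswith("chr") else "chr") + rest
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return add_chr(chrom) + sep + rest


def convert(in_path, out_path):
    """Stream a VCF (gzipped if the name ends in .gz) into a renamed plain-text copy."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rewrite_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_chr.py input.vcf.gz output.vcf
    convert(sys.argv[1], sys.argv[2])
```

Python's gzip module writes ordinary gzip rather than BGZF, so if the converted file needs a tabix index, compress the plain-text output with bgzip afterwards.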

Which link?
