How to extract both start and end position in vcf files
4.9 years ago
ShirleyDai ▴ 40

Hello, I have some VCF files generated by GATK Mutect2. I can use GATK VariantsToTable to extract the start position of each variant. Is there an easy way to extract both the start and end positions from my VCF files? Thanks.

next-gen vcf • 2.7k views
4.9 years ago
venu 6.8k

You mean the start and end positions of the variants? If so, the following will work:

(Updated)

# --fill-type adds a TYPE annotation to INFO (column 8); awk branches on it
vcf-annotate --fill-type Sample1.vcf | grep '^chr' | awk '{
  if ($8 ~ /snp/)      print $1"\t"$2"\t"$2"\t"$4"\t"$5;
  else if ($8 ~ /del/) print $1"\t"$2"\t"$2"\t"$4"\t""-";
  else if ($8 ~ /ins/) print $1"\t"$2"\t"$2+(length($5))"\t"$4"\t"$5
}' > Result.txt

No. I need to extract SNPs and indels (some longer than 10 bases) in the following format:

Single nucleotide variants

chr4 150 150 A T

Insertions

Use ‘-’ in the reference_allele field; the start/end coordinates must indicate the two adjacent bases between which the insertion occurs.

chr4 150 151 - T

Deletions

Use ‘-’ in the observed_allele field to denote deletion of the given reference allele.

chr4 150 150 A -
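
The three cases above can be sketched in plain Python, without vcf-annotate. This is only a minimal illustration under stated assumptions: the function names `to_range` and `vcf_lines_to_ranges` are hypothetical, indels are assumed to be left-anchored as in standard VCF (REF and ALT share leading bases), multi-allelic records are split on commas, and symbolic or missing ALT alleles are skipped. For deletions longer than one base, the end coordinate is taken to be the last deleted base, which is my reading of the format rather than something stated in it.

```python
def to_range(chrom, pos, ref, alt):
    """Convert one VCF allele pair to (chrom, start, end, ref, alt).

    Assumption: shared leading bases are the VCF anchor and are trimmed.
    """
    # Trim the shared prefix and shift the position accordingly.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    if not ref:
        # Insertion: report the two adjacent bases it falls between.
        return (chrom, pos - 1, pos, "-", alt)
    if not alt:
        # Deletion: span covers exactly the deleted reference bases.
        return (chrom, pos, pos + len(ref) - 1, ref, "-")
    # SNV (or multi-base substitution): span covers the reference bases.
    return (chrom, pos, pos + len(ref) - 1, ref, alt)


def vcf_lines_to_ranges(lines):
    """Yield one tuple per ALT allele from tab-separated VCF lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            # Skip missing, spanning-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            yield to_range(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    # Made-up example records, not taken from a real Mutect2 file.
    example = [
        "##fileformat=VCFv4.2",
        "chr4\t150\t.\tA\tT\t.\tPASS\t.",
        "chr4\t150\t.\tA\tAT\t.\tPASS\t.",
        "chr4\t149\t.\tGA\tG\t.\tPASS\t.",
    ]
    for rec in vcf_lines_to_ranges(example):
        print("\t".join(str(field) for field in rec))
```

To run it on a real file, the lines could be read with `open("Sample1.vcf")` and passed to `vcf_lines_to_ranges`; gzip-compressed files would need `gzip.open(..., "rt")` instead.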


I've updated the answer.


Cool! Many thanks.

4.9 years ago
MAPK ★ 1.7k

Why not use GenomicRanges with a custom R script?
