Add lines with N (./.) to VCF file according to a mask BED file
1
0
Entering edit mode
6.6 years ago
CuriousGuy ▴ 90

Hi all,

I need to add lines to an existing VCF, corresponding to positions where I know for certain that information is missing, according to a BED file. That is, I would like to modify said VCF with additional records bearing an "./." genotype in every position that is contained in a BED file. Bcftools consensus has a somewhat similar option called "--mask", but unfortunately it only applies to the generation of FASTA files.

I could of course create a script that goes through the VCF and adds rows in those coordinates, but it would be convenient if there are any tools out there that can do this for me. Or at least if there is a more efficient way to approach this problem.

Thanks in advance.

VCF alignment • 1.4k views
ADD COMMENT
2
Entering edit mode
6.6 years ago

using awk and samtools faidx (to get the reference)

 awk '{S=int($2)+1;E=($3);for(i=S;i<=E;++i) printf("%s %d %d\n",$1,i,i);}' input.bed  |\
sort | uniq |\
sort -t ' ' -k1,1 -k2,2n |\
while read -a P
do echo -e -n "${P[0]}\t${P[1]}\t.\t" && samtools faidx ref.fa "${P[0]}:${P[1]}-${P[1]}" | grep -v  '>' | awk 'BEGIN{NSAMPLES=2;} {printf("%s\tN\t.\t.\t.\tGT",$0); for(i=1;i<=NSAMPLES;i++) printf("\t./."); printf("\n");}' 
done >> concat.vcf
ADD COMMENT
0
Entering edit mode

Hi, thanks for the prompt response. However, what I need -I will edit the original post to make it clear- is to add these lines to an existing VCF file. Although it is probably not the most efficient option, I can take your solution and merge the resulting file (concat.vcf) with the original file (let's say existing.vcf), after removing any repeated positions from existing.vcf (since I want them to be listed as ./. only).

ADD REPLY
0
Entering edit mode

after removing any repeated positions from

use bedtools to remove thoses positions from the bed file.

ADD REPLY
0
Entering edit mode

But I want those positions to have genotype "./.", which is not the case in the original VCF file. I don't expect this to happen often (or at all), but in case of any overlap, the shared lines have to be N.

ADD REPLY

Login before adding your answer.

Traffic: 2431 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6