How to subset the variants from a VCF on a specific chromosome and between two positions?
22 months ago
NT • 0

Hi,

I'm a complete beginner with bash, so my question may seem stupid to some of you. I have an annotated VCF file with a large number of samples. I want to extract from it all the variants of one gene, located on chromosome 9 ($1 = chr9) between positions ($2 = POS) 81583683 and 81689305. After some modifications I used the awk command awk '{$1== "chr9" && 81583683 <$2< 81689305}' VCF1 > VCF2, but I always got an error message.

Can anyone please tell me whether this awk command is correct for selecting on two conditions, or whether I should use another command?

Thank you

awk bash vcf subset • 399 views

Thank you for the help! I used bcftools after indexing the VCF file. My command line looks like this: bcftools view file1.vcf.gz "chr9:81583683-81689305" -O v file2.vcf. It works, but it doesn't return all the variations that I want, only some of them, while I want to get all the variations, even the duplicated ones.

while I want to get all the variations, even the duplicated ones

show us the variants ignored by the command above

A huge number of variations are ignored (I have a file with 800 samples, and I want to find the variations for all the samples in this region). The command returns only some of the variations, and each one only once. For example, if a variation appears in 5 samples, I want to find 5 lines with this variation in the generated file; with this command line, however, either I don't find it in the generated file at all, or I find it just once (on one line).

that's still not clear to me

22 months ago

you want:

 awk -F '\t' '($0 ~ /^#/ || ($1 == "chr9" && 81583683 < $2 && $2 < 81689305))' VCF1 > VCF2
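
To check how an awk filter of this kind behaves, here is a minimal sketch on a made-up, tab-separated toy VCF. The file names, the toy records, and the awk variables chrom, lo and hi are assumptions for illustration, not part of the original data. Note that the strict comparisons drop a record sitting exactly on a boundary; use >= and <= if the gene coordinates should be inclusive.

```shell
# Build a tiny, made-up VCF (tab-separated) to try the filter on.
tmpdir=$(mktemp -d)
vcf_in="$tmpdir/toy.vcf"
vcf_out="$tmpdir/toy.chr9_region.vcf"

printf '##fileformat=VCFv4.2\n' > "$vcf_in"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf_in"
printf 'chr9\t81583683\t.\tA\tG\t.\t.\t.\n' >> "$vcf_in"   # exactly on the lower bound: dropped by the strict ">"
printf 'chr9\t81600000\t.\tC\tT\t.\t.\t.\n' >> "$vcf_in"   # inside the interval: kept
printf 'chr9\t81689304\t.\tG\tA\t.\t.\t.\n' >> "$vcf_in"   # inside the interval: kept
printf 'chr9\t81700000\t.\tT\tC\t.\t.\t.\n' >> "$vcf_in"   # beyond the upper bound: dropped
printf 'chr1\t81600000\t.\tA\tC\t.\t.\t.\n' >> "$vcf_in"   # right position, wrong chromosome: dropped

# Keep every header line (starting with '#') plus the records that match
# both conditions. A bare pattern with no action prints the matching line.
awk -F '\t' -v chrom="chr9" -v lo=81583683 -v hi=81689305 \
    '/^#/ || ($1 == chrom && $2 > lo && $2 < hi)' "$vcf_in" > "$vcf_out"

cat "$vcf_out"
```

The condition sits outside any braces, so awk treats it as a pattern and prints each matching line. Inside { } the same expression would be evaluated as a statement and nothing would be printed.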


or, better, after compressing and indexing VCF1:

bcftools view vcf1.vcf.gz "chr9:81583683-81689305"
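
For completeness, the compress, index and extract steps could look roughly like the sketch below, run here on a made-up toy VCF so that it is self-contained. The file names and records are assumptions for illustration. bcftools region coordinates are 1-based and inclusive, so a record exactly at 81583683 is kept, unlike with the strict < comparisons of the awk filter.

```shell
# Sketch only: the toy records and file names below are made up for illustration.
if command -v bcftools >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    toy_vcf="$workdir/toy.vcf"
    region_vcf="$workdir/toy.chr9_region.vcf"

    # Minimal sorted VCF; the ##contig lines declare the chromosomes in the header.
    {
        printf '##fileformat=VCFv4.2\n'
        printf '##contig=<ID=chr1>\n'
        printf '##contig=<ID=chr9>\n'
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
        printf 'chr1\t81600000\t.\tA\tC\t.\t.\t.\n'
        printf 'chr9\t81583683\t.\tA\tG\t.\t.\t.\n'
        printf 'chr9\t81600000\t.\tC\tT\t.\t.\t.\n'
        printf 'chr9\t81689304\t.\tG\tA\t.\t.\t.\n'
        printf 'chr9\t81700000\t.\tT\tC\t.\t.\t.\n'
    } > "$toy_vcf"

    # 1. Compress with bgzip (-O z) and build a tabix index (-t).
    bcftools view -O z -o "$toy_vcf.gz" "$toy_vcf"
    bcftools index -t "$toy_vcf.gz"

    # 2. Extract the region; -O v writes plain-text VCF, -o names the output file.
    bcftools view -r "chr9:81583683-81689305" -O v -o "$region_vcf" "$toy_vcf.gz"

    # Show the selected records without the header lines.
    grep -v '^#' "$region_vcf"
else
    echo "bcftools not found on PATH; skipping the demonstration" >&2
fi
```

The output has one line per variant site, because a VCF stores each site once, with one genotype column per sample.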