Question: How to subset the variations of a VCF on a specific chromosome and between two positions?
Defne wrote (4 months ago):

Hi,

I'm a complete beginner with bash, so my question may seem stupid to some of you. I have an annotated VCF file with a large number of samples, and I want to extract into a new file all the variations of one gene, which is located on chromosome chr9 (column $1) between positions 81583683 and 81689305 (column $2, POS). After some modifications I used the awk command awk '{$1== "chr9" && 81583683 <$2< 81689305}' VCF1 > VCF2, but I always got an error message.

Can anyone tell me whether this awk command is correct for selecting on two conditions, or whether I should use another command?

Thank you

Tags: bash awk subset vcf

Thank you for the help! I used bcftools after indexing the VCF file; my command line looks like this: bcftools view file1.vcf.gz "chr9:81583683-81689305" -O v -o file2.vcf. It works, but it doesn't return all the variations I want, only some of them, while I want to get all the variations, even the duplicated ones.


while I want to get all the variations, even the duplicated ones

Show us the variants that are ignored by the command above.


A huge number of variations are ignored (I have a file with 800 samples and I want to find the variations of all the samples in this region). The command returns only some of the variations, and each only once. For example, if a variation appears in 5 samples, I want to find 5 lines with this variation in the generated file; with this command line, however, either I don't find it in the generated file at all or I find it only once (on one line).
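A likely source of the confusion: a VCF stores one line per variant site, and each sample is a column (columns 10 onwards) holding that sample's genotype. A variant carried by 5 samples is therefore still one line, with 5 non-reference genotypes on it. If one line per (variant, sample) pair is what you want, the subset has to be expanded afterwards. Below is a minimal awk sketch, assuming an uncompressed VCF in which GT is the first FORMAT key (the VCF spec puts GT first when it is present) and treating any genotype with a non-zero allele index as "carries the variant". The function names and the demo records are hypothetical.

```shell
# Minimal sketch: print one line per (variant, sample) for every sample whose
# genotype carries a non-reference allele. Reads the files given as arguments,
# or stdin if none. Assumes GT is the first colon-separated FORMAT value.
expand_carriers() {
  awk -F '\t' -v OFS='\t' '
    /^##/ { next }                         # skip meta-information lines
    /^#CHROM/ {                            # header line: remember the sample names
      for (i = 10; i <= NF; i++) sample[i] = $i
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, fmt, ":")                # fmt[1] is the GT value, e.g. 0/1 or 1|1
        if (fmt[1] ~ /[1-9]/)              # any allele index other than 0 = carrier
          print $1, $2, $4, $5, sample[i], fmt[1]
      }
    }' "$@"
}

# Hypothetical three-sample VCF used only to demonstrate the expansion.
make_demo_genotype_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr9\t81600000\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\t1/1:15\n'
  printf 'chr9\t81650000\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t./.\t0|1\n'
}

# Prints 3 lines: (81600000, S1), (81600000, S3) and (81650000, S3).
make_demo_genotype_vcf | expand_carriers
```

On real data it would be run on the region subset, e.g. `expand_carriers VCF2`. bcftools can do a similar per-sample expansion with `bcftools query`, where the part of the `-f` format string inside square brackets is repeated for every sample.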


that's still not clear to me

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (4 months ago):

You want:

 awk -F '\t' '($0 ~ /^#/ || ($1 == "chr9" && 81583683 < $2 && $2 < 81689305))' VCF1 > VCF2

or, better, after bgzip-compressing and indexing VCF1:

bcftools view vcf1.vcf.gz "chr9:81583683-81689305"
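Why the original awk attempt failed: awk's comparison operators are non-associative, so a chained comparison such as `81583683 <$2< 81689305` is rejected as a syntax error, and each bound needs its own test joined with `&&`. The condition must also be a pattern outside `{ }`; inside braces it is an action whose value is computed and discarded, so nothing is printed. Finally, the chromosome has to be compared explicitly as `$1 == "chr9"`, because a bare non-empty string constant such as `"chr9"` is always true. The sketch below wraps the filter in a small function (its name, the `-v` variable names and the demo records are hypothetical). It uses `>=` and `<=` so that, like a bcftools region, both ends are included; the one-liner with `<` excludes the two boundary positions.

```shell
# Minimal sketch: keep the VCF header plus the records on one chromosome whose
# POS lies in [start, stop], both ends inclusive as in a bcftools region.
# Usage: subset_region CHROM START STOP [file ...]   (reads stdin if no file)
subset_region() {
  chrom=$1; start=$2; stop=$3; shift 3
  awk -F '\t' -v chrom="$chrom" -v start="$start" -v stop="$stop" '
    /^#/ || ($1 == chrom && $2 + 0 >= start + 0 && $2 + 0 <= stop + 0)
  ' "$@"
}

# Hypothetical one-sample VCF used only to demonstrate the filter.
make_demo_region_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'chr9\t81583683\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\n'    # start boundary: kept
  printf 'chr9\t81600000\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\n'    # inside: kept
  printf 'chr9\t81700000\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\n'    # after the end: dropped
  printf 'chr10\t81600000\t.\tT\tC\t.\tPASS\t.\tGT\t0/1\n'   # other chromosome: dropped
}

# Prints the 2 header lines and the 2 chr9 records inside the interval.
make_demo_region_vcf | subset_region chr9 81583683 81689305
```

awk reads plain text only, so a compressed file would be decompressed on the fly, e.g. `gzip -dc VCF1.vcf.gz | subset_region chr9 81583683 81689305 > VCF2`.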