Question: How to subset the variants in a VCF on a specific chromosome and between 2 positions?
Defne wrote, 6 days ago:

Hi,

I'm a complete beginner with bash, so my question may seem stupid to some of you. I have an annotated VCF file with a large number of samples. I want to extract from it all the variants of one gene, located on chromosome 9 ($1 = chr9) between positions ($2 = POS) 81583683 and 81689305. I tried a modified awk command, awk '{$1== "chr9" && 81583683 <$2< 81689305}' VCF1 > VCF2, but I always get an error message.

Can anyone please tell me whether awk is the right tool for selecting on two conditions like this, or whether I should use another command?

Thank you

bash awk subset vcf • 69 views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 6 days ago:

You want:

 awk -F '\t' '($0 ~ /^#/ || ($1 == "chr9" && 81583683 < $2 && $2 < 81689305))' VCF1 > VCF2

or, better, after compressing VCF1 with bgzip and indexing it (e.g. with tabix or bcftools index):

bcftools view vcf1.vcf.gz "chr9:81583683-81689305"
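
To see why the awk pattern works where the original attempt did not: a condition written inside { ... } is an action rather than a pattern, so it selects nothing, and awk has no chained comparison, so each bound has to be tested separately and joined with &&. Here is a minimal sketch you can run without any real data. The toy records, the temporary directory and the variable names chrom, lo and hi are made up for illustration, and the bounds are inclusive here (an assumption, chosen to mirror the chr9:81583683-81689305 region string; use < and > if you want them strict).

```shell
# Build a tiny tab-separated VCF-like file (made-up records, illustration only).
tmpdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  # printf reuses the format string for every group of four arguments.
  printf '%s\t%s\t.\t%s\t%s\t.\tPASS\t.\n' \
    chr9  81583000 A G \
    chr9  81583683 C T \
    chr9  81600000 G A \
    chr9  81689305 T C \
    chr9  81700000 T C \
    chr10 81600000 A C
} > "$tmpdir/VCF1"

# Keep header lines (starting with #) plus records on the chosen chromosome
# whose POS lies within [lo, hi]. No action block is given, so awk prints
# every line for which the pattern is true.
awk -F '\t' -v chrom="chr9" -v lo=81583683 -v hi=81689305 \
  '/^#/ || ($1 == chrom && $2 >= lo && $2 <= hi)' \
  "$tmpdir/VCF1" > "$tmpdir/VCF2"

cat "$tmpdir/VCF2"
```

Passing the chromosome and bounds with -v keeps the pattern itself unchanged when you move to another gene. For large compressed files the indexed bcftools route is still the more practical one, since awk has to scan every line.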