Question: Subsetting .vcf.gz based on .txt file
5 weeks ago, chvbs2000 wrote:

I am preprocessing a set of .vcf.gz files and need to subset each of them according to a list of SNPs. I stored the SNP IDs of interest in a text file, and I want to extract all rows from these .vcf.gz files whose SNP IDs appear in the SNP_ID file:

SNP_ID file:

rs61733845
rs1320571
rs9729550
rs1815606
rs7515488
rs11260562
rs6697886
rs6603785
rs11804831

In Python I would imagine processing each line with a conditional statement or an inner join, but Python may not be the best choice because these .vcf.gz files are huge. Is there a way to subset a .vcf.gz file based on a text file using bash commands such as awk, sed, or cat? Thanks!

Tags: sequencing, snp, genome, vcf, gene

Duplicate of: "Soft filtering of SNPs in a list"; "How to get 1000 Genomes data in bulk?"; etc...

(comment written 5 weeks ago by Pierre Lindenbaum)
5 weeks ago, Yean wrote:

What about plink?

   plink1.9 --vcf input.vcf.gz --extract snp.snplist --make-bed --out extract_snp
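
Note that --make-bed writes a PLINK binary fileset (.bed/.bim/.fam) rather than a VCF. If the output should stay in VCF format, the awk route asked about in the question might look roughly like the sketch below. It is only an illustration under some assumptions: the function name and file names are hypothetical, the ID file holds one rsID per line, and the ID column (field 3) of each record holds a single ID that must match exactly (records with several IDs joined by ";" would be missed).

```shell
#!/usr/bin/env bash
# Sketch only: keep all VCF header lines plus the records whose ID column
# (field 3) is listed in a one-ID-per-line text file.
# subset_vcf_by_ids and all file names here are hypothetical.

subset_vcf_by_ids() {
    local ids_file=$1 in_vcf=$2 out_vcf=$3

    # Stream the file so it is never fully decompressed on disk or in memory.
    gzip -dc "$in_vcf" |
        awk -F'\t' -v ids_file="$ids_file" '
            BEGIN {
                # Load the SNP IDs into an array used as a lookup set.
                while ((getline line < ids_file) > 0) {
                    sub(/\r$/, "", line)          # tolerate Windows line endings
                    if (line != "") ids[line]
                }
            }
            /^#/ { print; next }                  # always keep header lines
            $3 in ids                             # keep records whose ID is listed
        ' |
        gzip -c > "$out_vcf"
}

# Hypothetical usage over many files, writing results to a separate directory:
#   mkdir -p subset
#   for f in *.vcf.gz; do
#       subset_vcf_by_ids snp_ids.txt "$f" "subset/$f"
#   done
```

Only the ID list is held in memory, so the size of the VCF files should not be a problem. The output is compressed with plain gzip; if a downstream tool needs a tabix index, the result would have to be recompressed with bgzip first.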