Hi, I'm using the UKB DNAnexus platform and I want to find out whether variants (SNPs, CNVs) exist in specific genes across all individuals.
Can I do this with PLINK? What other tools are available?
What is the format of your data?
If your data is in a "classic" NGS data format (VCF, bed, ped, pgen, ...), PLINK can do this, and so can other tools such as BCFtools.
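To make that concrete, here is a rough sketch of what a per-gene region query might look like with either tool. It only prints the commands (a dry run) rather than executing them. The file names (`cohort`, `cohort.vcf.gz`, `gene_subset`) and the coordinates are placeholders I made up, and whether the contig is called `17` or `chr17` depends on your files.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) region-extraction commands for one gene.
# All file names and coordinates below are placeholders, not real UKB paths.
print_region_cmds() {
    local chrom=$1 start=$2 end=$3
    # PLINK 2: restrict a pgen/pvar/psam fileset to a base-pair window.
    echo "plink2 --pfile cohort --chr ${chrom} --from-bp ${start} --to-bp ${end} --make-pgen --out gene_subset"
    # BCFtools: the same window on a bgzipped, indexed VCF.
    echo "bcftools view -r chr${chrom}:${start}-${end} cohort.vcf.gz -Oz -o gene_subset.vcf.gz"
}

print_region_cmds 17 950000 1050000
```

Look up the real gene coordinates for the genome build your data uses before running either command.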
For data the size of the UK Biobank WGS release, you will need a variant warehouse of some sort to get a reasonable turnaround time. PLINK and BCFtools are not designed to handle a 500k x 585M matrix.
There are several choices of variant warehouses:
Is there a file format better suited for the era of pangenomics than the .vcf? What are its attributes?
The open-source TileDB-VCF enables slicing by chromosome/position and sample, and can accommodate those types of interactive queries. For more complex, scalable analysis, consider TileDB-Cloud. Feel free to PM me if you are interested in hearing more.
The Swiss Army Knife app on DNAnexus includes bcftools: https://documentation.dnanexus.com/user/running-apps-and-workflows/tools-list
Create a file listing the paths of the VCF files that overlap your gene, then concatenate them:
bcftools concat -a --regions "gene-chr:genestart-geneend" --file-list vcf.list -O b -o beware.big.bcf
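One way to build that file list, as a minimal sketch: it assumes you have a tab-separated manifest giving each VCF chunk's path, chromosome, start and end (1-based, inclusive). That manifest, the chunk names and the coordinates are all hypothetical here; you would have to derive the real ones from however the chunks are named or indexed in your project.

```shell
#!/usr/bin/env bash
# Print the paths of VCF chunks whose interval overlaps the query region.
# Assumes a hypothetical tab-separated manifest: path, chrom, start, end.
select_overlapping() {
    local manifest=$1 chrom=$2 start=$3 end=$4
    # Two intervals overlap when each one starts before the other ends.
    awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" \
        '$2 == c && $3 <= e && $4 >= s { print $1 }' "$manifest"
}

# Example with placeholder chunk names and coordinates.
workdir=$(mktemp -d)
manifest="$workdir/chunks.tsv"
printf '%s\t%s\t%s\t%s\n' \
    chunk_a.vcf.gz chr17 1 1000000 \
    chunk_b.vcf.gz chr17 1000001 2000000 \
    chunk_c.vcf.gz chr18 1 1000000 > "$manifest"

select_overlapping "$manifest" chr17 950000 1050000 > "$workdir/vcf.list"
cat "$workdir/vcf.list"

# With the list in place, the concat step would be (needs bcftools and indexed inputs):
# bcftools concat -a --regions "chr17:950000-1050000" --file-list "$workdir/vcf.list" -O b -o beware.big.bcf
```

Only the chunks that actually overlap the gene end up in the list, which keeps the concat input small.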