Hi, I'm using the UKB DNAnexus platform and I want to find out whether variants (SNPs, CNVs) exist at specific genes across all individuals.
Can I do this with PLINK? Which other tools are available?
For a dataset the size of the UK Biobank WGS release, you will need a variant warehouse of some sort to get a reasonable turnaround time. PLINK and BCFtools are not designed to handle a 500k x 585M matrix.
There are several choices of variant warehouse; see the thread "Is there a file format better suited for the era of pangenomics than the .vcf? What are its attributes?"
The open-source TileDB-VCF enables slicing by chr/pos + sample and can accommodate those types of interactive queries. For more complex, scalable analysis you should consider TileDB-Cloud. Feel free to PM me if you are interested in hearing more.
The Swiss Army Knife app on DNAnexus includes bcftools: https://documentation.dnanexus.com/user/running-apps-and-workflows/tools-list
Create a file listing the paths of the subset of VCF files that overlap your gene, then run:
bcftools concat -a --regions "gene-chr:genestart-geneend" --file-list vcf.list -O b -o beware.big.bcf
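The "file of paths" step can be sketched in Python. This is only a rough illustration: it assumes you already have a table of per-file genomic block coordinates, and the `VcfBlock` record, the helper names, and the example paths are all hypothetical rather than part of the UKB or DNAnexus tooling.

```python
from typing import Iterable, List, NamedTuple


class VcfBlock(NamedTuple):
    """One VCF file and the genomic interval it covers (hypothetical record)."""
    path: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping_paths(blocks: Iterable[VcfBlock], chrom: str,
                      gene_start: int, gene_end: int) -> List[str]:
    """Return the paths of blocks on `chrom` that overlap [gene_start, gene_end]."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    return [
        b.path
        for b in blocks
        # Two closed intervals overlap when each one starts before the other ends.
        if b.chrom == chrom and b.start <= gene_end and b.end >= gene_start
    ]


def write_file_list(paths: Iterable[str], out_path: str = "vcf.list") -> None:
    """Write one path per line, the layout that --file-list expects."""
    with open(out_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
```

Calling `write_file_list(overlapping_paths(blocks, "chr1", 1000, 2000))` would produce the `vcf.list` used in the command above; the coordinates here are placeholders, not a real gene.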
Hi,
What format is your data in? If it is in a "classic" NGS data format (VCF, bed, ped, pgen, ...), PLINK can do this, and so can other tools such as BCFtools.
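As a rough sketch of what a PLINK-based region extraction might look like, the snippet below only assembles a PLINK 2 command line and does not run it. It assumes the data is already in pgen format; the function name, file prefixes, and coordinates are hypothetical placeholders.

```python
import shlex
from typing import List


def plink2_region_args(pfile_prefix: str, chrom: str, start_bp: int,
                       end_bp: int, out_prefix: str) -> List[str]:
    """Build a plink2 argument list that keeps variants in one genomic window."""
    if start_bp > end_bp:
        raise ValueError("start_bp must not exceed end_bp")
    return [
        "plink2",
        "--pfile", pfile_prefix,    # prefix of the .pgen/.pvar/.psam fileset
        "--chr", str(chrom),        # restrict to a single chromosome
        "--from-bp", str(start_bp),
        "--to-bp", str(end_bp),
        "--make-pgen",              # write the filtered subset as a new fileset
        "--out", out_prefix,
    ]


def as_shell(args: List[str]) -> str:
    """Render the argument list as a shell-quoted string for copy and paste."""
    return shlex.join(args)
```

The list can be passed to `subprocess.run` on a machine where `plink2` is installed, or printed with `as_shell` and pasted into a terminal.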