How to find variants in a specific gene in a cohort of 500k+ individuals?
14 months ago

Hi, I'm using the UKB DNAnexus platform and I want to find out whether variants (SNPs, CNVs) exist in specific genes across all individuals.

Can I do this with PLINK? What other tools are available?

variants plink SNPs CNV • 881 views

Hi,

What format is your data in? If it is in a "classic" NGS data format (VCF, bed, ped, pgen, ...), PLINK can do this, and so can other tools such as BCFtools.
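As a rough illustration of the PLINK route, here is a minimal sketch assuming PLINK 1.9 and a binary fileset with the hypothetical prefix `data`. The gene label `GENE1` and the coordinates are made up, so substitute the real chromosome and positions for your gene. It writes a range file (chromosome, start bp, end bp, label) and passes it to `--extract range`, running PLINK only if the binary and the input files are actually present.

```shell
# Sketch only: GENE1, its coordinates, and the "data" prefix are hypothetical.
workdir=$(mktemp -d)

# PLINK 1.9 range file: chromosome, start bp, end bp, set label
printf '%s %s %s %s\n' 17 1000000 1050000 GENE1 > "$workdir/gene.range"

# Keep only variants inside the range; skip quietly if PLINK or the data are missing
if command -v plink >/dev/null 2>&1 && [ -e data.bed ]; then
    plink --bfile data --extract range "$workdir/gene.range" \
          --make-bed --out "$workdir/gene_subset"
else
    echo "plink or data.bed/.bim/.fam not found; range file is at $workdir/gene.range"
fi
```

For a single gene, `--chr` with `--from-bp` and `--to-bp` does the same job without a range file; the range file is more convenient once you have several genes.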

14 months ago

For data at the scale of UK Biobank WGS, you will need a variant warehouse of some sort to get a reasonable turnaround time. PLINK and BCFtools are not designed to handle a 500k x 585M matrix.

There are several choices of variant warehouse; see the post "Is there a file format better suited for the era of pangenomics than the .vcf? What are its attributes?"

The open-source TileDB-VCF enables slicing by chr/pos + sample and can accommodate that type of interactive query. For more complex, scalable analysis, you should consider TileDB-Cloud. Feel free to PM me if you are interested in hearing more.

14 months ago

The Swiss Army Knife app on DNAnexus includes bcftools: https://documentation.dnanexus.com/user/running-apps-and-workflows/tools-list

Create a file listing the paths of the subset of VCF files that overlap your gene.

bcftools concat -a --regions "gene-chr:genestart-geneend" --file-list vcf.list -O b -o beware.big.bcf
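One possible way to build the file list is sketched below. It assumes you have a tab-separated manifest giving each VCF chunk's path, chromosome, start, and end (1-based, inclusive). That manifest layout, the function name `select_overlapping`, and the chunk names and coordinates are all hypothetical and only serve to show the overlap test.

```shell
# Sketch only: the manifest layout (path, chrom, start, end) is an assumption.
# Usage: select_overlapping MANIFEST CHROM START END
select_overlapping() {
    # A chunk overlaps the gene if it starts before the gene ends
    # and ends after the gene starts, on the same chromosome.
    awk -F'\t' -v chrom="$2" -v s="$3" -v e="$4" \
        '$2 == chrom && $3 <= e && $4 >= s { print $1 }' "$1"
}

workdir=$(mktemp -d)

# Made-up manifest with four chunks
printf '%s\t%s\t%s\t%s\n' \
    chunk_a.vcf.gz chr17 1    1000 \
    chunk_b.vcf.gz chr17 1001 2000 \
    chunk_c.vcf.gz chr17 2001 3000 \
    chunk_d.vcf.gz chr18 1    1000 > "$workdir/manifest.tsv"

# Chunks overlapping a hypothetical gene at chr17:900-1500
select_overlapping "$workdir/manifest.tsv" chr17 900 1500 > "$workdir/vcf.list"
cat "$workdir/vcf.list"
```

The resulting list can then be passed to `bcftools concat` with `--file-list`. Note that `--regions` needs the input files to be indexed.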
