Question: Merging variant replicates rather than filtering
dylkot (United States) wrote, 23 months ago:

I'm analyzing some Illumina genotype data and have noticed that almost 2% of the variants on the array are typed more than once. I consider measurements to be replicates if they share the same chromosome, position, reference allele, and alternate allele, as specified in the PLINK BIM file. The typical approach seems to be to remove all but the first replicate in the file using PLINK's --list-duplicate-vars option. Are there any tools that implement something slightly smarter than this? For example, a desirable behavior might be to take a consensus vote among the calls from the replicates to decide which one to use. Thanks in advance!
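To make the "consensus vote" idea concrete, here is a minimal Python sketch of one possible rule. It is not something PLINK implements; the function name, the 0/1/2 genotype coding, and the use of None for a missing call are all assumptions made for illustration.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a no-call


def consensus_call(calls):
    """Majority vote over replicate genotype calls (e.g. ALT-allele counts 0/1/2).

    Missing calls are ignored. Ties and all-missing inputs return MISSING.
    """
    observed = [c for c in calls if c is not MISSING]
    if not observed:
        return MISSING
    ranked = Counter(observed).most_common()
    # A tie for first place means there is no clear consensus.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return MISSING
    return ranked[0][0]


print(consensus_call([0, 0, 1]))     # 0
print(consensus_call([1, None, 1]))  # 1
print(consensus_call([0, 1]))        # None (tie)
```

With only two replicates per variant, which is probably the common case on an array, this rule reduces to "keep the call if the replicates agree or one is missing, otherwise set it to missing".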

vcftools plink gwas • 548 views
chrchang523 (United States) wrote, 23 months ago:

You can use plink to merge a fileset with itself.


Thanks for the tip! Are you referring to the --merge-equal-pos option in PLINK 1.9? If so, do you know how it performs the merge? The documentation is ambiguous, stating:

If two variants have the same position, PLINK 1.9's merge commands will always notify you. If you wish to try to merge them, use --merge-equal-pos. (This will fail if any of the same-position variant pairs do not have matching allele names.) Unplaced variants (chromosome code 0) are not considered by --merge-equal-pos.

Note that you are permitted to merge a fileset with itself; doing so with --merge-equal-pos can be worthwhile when working with data containing redundant loci for quality control purposes.

There is no reference to this in PLINK 2, as far as I'm aware.

Reply written 23 months ago by dylkot

Also, as a quick note, the command:

plink --bmerge inputdata --bfile inputdata --merge-equal-pos --out outputdata

fails when, as in my case, some variants at the same position are replicates while others are multi-allelic variants and so have different ALT alleles.
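To see which same-position groups are true replicates and which carry different alleles, a short Python sketch like the following could be run over the .bim file before attempting the merge. It assumes the standard six-column .bim layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2); the function name and the labels are hypothetical, and allele pairs that are merely swapped are treated as different here.

```python
from collections import defaultdict


def classify_same_position(bim_lines):
    """Group .bim rows by (chrom, bp) and label every group with 2+ rows.

    Returns {(chrom, bp): "replicates" | "different_alleles" | "mixed"}.
    """
    groups = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        chrom, _var_id, _cm, bp, a1, a2 = fields
        groups[(chrom, int(bp))].append((a1, a2))

    labels = {}
    for key, allele_pairs in groups.items():
        if len(allele_pairs) < 2:
            continue  # position is typed only once
        distinct = len(set(allele_pairs))
        if distinct == 1:
            labels[key] = "replicates"          # identical allele pairs
        elif distinct == len(allele_pairs):
            labels[key] = "different_alleles"   # e.g. a multi-allelic site
        else:
            labels[key] = "mixed"               # both situations at one position
    return labels


# Made-up example rows, not real array content.
example_bim = [
    "1 rsA 0 1000 A G",
    "1 rsA_dup 0 1000 A G",
    "1 rsB 0 2000 T C",
    "1 rsB_alt 0 2000 G C",
    "2 rsC 0 500 A C",
]
print(classify_same_position(example_bim))
```

Groups labeled "different_alleles" or "mixed" are the ones that would be expected to trip the allele-name check described in the --merge-equal-pos documentation quoted above.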

Reply written 23 months ago by dylkot

For now, I'd use PLINK 2.0's --set-all-var-ids flag to assign brand new chrom/pos/ref/alt-based IDs to every variant, and then follow up with PLINK 1.9's merge function (since yes, merge is not yet implemented in 2.0).
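A rough sketch of that two-step workflow, written as a Python helper that only builds the command lines and does not execute them. The binary names plink2 and plink, the file prefixes, and the helper name are assumptions; the ID template @:#:$r:$a follows the PLINK 2.0 convention in which @ is the chromosome, # the position, $r the REF allele, and $a the ALT allele.

```python
import shlex


def build_commands(in_prefix, renamed_prefix, merged_prefix):
    """Return the two command lines as argument lists (nothing is executed)."""
    rename = [
        "plink2", "--bfile", in_prefix,
        # Assign chrom:pos:ref:alt IDs so that replicates share one ID.
        "--set-all-var-ids", "@:#:$r:$a",
        "--make-bed", "--out", renamed_prefix,
    ]
    merge = [
        # PLINK 1.9: merge the renamed fileset with itself.
        "plink", "--bfile", renamed_prefix,
        "--bmerge", renamed_prefix,
        "--out", merged_prefix,
    ]
    return rename, merge


# Hypothetical prefixes; shlex.join quotes the ID template for the shell.
for cmd in build_commands("inputdata", "inputdata_renamed", "outputdata"):
    print(shlex.join(cmd))
```

Because replicates now share one ID while multi-allelic variants at the same position get distinct IDs, --merge-equal-pos should no longer be needed. How conflicting replicate calls are resolved depends on PLINK 1.9's --merge-mode setting, so it is worth checking that option's documentation before relying on the merged output.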

Reply written 23 months ago by chrchang523