How does plink guess the reference base in "--recode vcf-iid"
9.3 years ago
Dan ▴ 540

Here: http://oryzasnp-atcg-irri-org.s3-website-ap-southeast-1.amazonaws.com/3krg-3k_filt_snp-v1/README-3kRG-filtered-SNP-v1.txt

The recommended SOP for recoding PED / MAP to VCF is:

plink --file <ped file name without extension> --recode vcf-iid --out <output file name without extension>

However, since this command does not specify a reference sequence, how does plink guess the reference base in the VCF?

Does plink have an extra option for passing the reference sequence when calling --recode?

I have been reading about the associated file formats (PED and MAP), and I don't see the reference base stored anywhere.

vcf plink • 3.5k views
9.3 years ago
Dan ▴ 540

From plink --help | less:

  --recode ...

    The A2 allele is saved as the reference and normally flagged as not based
    on a real reference genome ('PR' INFO field value).  When it is important
    for reference alleles to be correct, you'll also want to include
    --a2-allele and --real-ref-alleles in your command.

  --a2-allele [filename] {a2col} {IDcol} {skip} :
    Force alleles in the file to A2.  ("--a2-allele [VCF filename] 4 3 '#'",
    which scrapes reference allele assignments from a VCF file, is especially
    useful.)
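
Putting the two help entries together, a minimal sketch of a recode command that takes its reference alleles from an existing VCF might look like the following. The file names (mydata, reference.vcf, mydata_ref) are hypothetical placeholders, and the sketch assumes that the variant IDs in the MAP file match the ID column of that VCF. The column arguments 4 3 '#' are the ones quoted in the help text (column 4 = REF, column 3 = ID, skip lines starting with '#').

```shell
# Hypothetical file names; replace with your own.
PED_PREFIX="mydata"        # expects mydata.ped and mydata.map
REF_VCF="reference.vcf"    # assumed: a VCF whose REF column holds the true reference alleles
OUT_PREFIX="mydata_ref"    # plink appends the .vcf extension to this prefix

run_plink() {
  # --a2-allele: force the allele in column 4 (REF) to be A2, matched by the ID in column 3
  # --real-ref-alleles: per the help text, use when reference alleles need to be correct
  plink --file "$PED_PREFIX" \
        --a2-allele "$REF_VCF" 4 3 '#' \
        --real-ref-alleles \
        --recode vcf-iid \
        --out "$OUT_PREFIX"
}

# Only run when plink and the input files are actually present.
if command -v plink >/dev/null 2>&1 && [ -f "$PED_PREFIX.ped" ] && [ -f "$REF_VCF" ]; then
  run_plink
else
  echo "plink or input files not found; command not run"
fi
```

Without --a2-allele, the help text says the A2 allele is simply written as the reference and flagged with the 'PR' INFO value, so the REF column should not be assumed to reflect a real genome.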

Sigh
