Question

How can I impute low-density (5K) genotypes up to 50K with Beagle?

4.1 years ago

ssadegi42

I want to use Beagle to impute my data. How can I impute low-density (5K) genotypes up to 50K density with Beagle?

Please confirm that you have first tried to do this on your own. There is no quick or simple answer to this. Have you looked at the Beagle manual, for example, and other online tutorials?

Reply by Kevin Blighe • 4.1 years ago

I created VCF files with PLINK for the reference (50K) and test (5K) data:

    ./plink --file ref --cow --recode vcf --out ref
    ./plink --file test --cow --recode vcf --out test

and then I want to impute the missing genotypes in the test data using MY reference panel (50K):

    java -Xmx20g -jar beagle5.jar gt=test.vcf ref=ref.vcf out=out_myfile

but I get an error that ref.vcf is not found.

Reply by ssadegi42 • 4.1 years ago

I haven't used any of these tools, but if ref.vcf is not found then your first command is not working.

Which version of plink are you using?

https://www.cog-genomics.org/plink2/input

https://www.cog-genomics.org/plink/2.0/output

I think you can also use --recode beagle instead of --recode vcf to convert your file into a format recognized by Beagle:

.beagle.dat, .chr-[chr #].dat, .chr-[chr #].map (BEAGLE unphased genotype and variant information files): Produced by "--recode beagle[-nomap]", for use by BEAGLE. In 'beagle' mode, one file pair is generated per autosome, while in 'beagle-nomap' mode, a single .beagle.dat file is generated containing all autosomes. This format cannot be loaded by PLINK.

Reply by Fatima • 4.1 years ago

Hello, dear Fatima,

I am using Beagle 5, which takes VCF input, so I use --recode vcf to create the Beagle input files; the --recode beagle format is for older Beagle versions. I ran PLINK on my .map and .ped files and obtained VCF files for the ref and test data. The VCF file looks like this:

##fileformat=VCFv4.3
##fileDate=20200315
##source=PLINKv2.00
##chrSet=<autosomePairCt=29,X,Y,M>
##contig=<ID=1,length=158229218>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ANG_ANG_001 ANG_ANG_002
1   135098  Hapmap43437-BTA-101873  G   A   .   .   PR  GT  0/0 0/1

Reply by ssadegi42 • 4.1 years ago (edited by Kevin Blighe)

Are Beagle and ref.vcf in the same folder? If not, are you using the correct path to the VCF file?
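Since Beagle reported that ref.vcf was not found, one quick check is whether PLINK actually wrote the files where Beagle is being launched. A minimal Python sketch, assuming the file names from the commands above (test.vcf, ref.vcf) and that Beagle is started from the same working directory; the helper name `missing_inputs` is hypothetical:

```python
from pathlib import Path


def missing_inputs(*paths):
    """Return the given paths that do not exist as regular files.

    Relative paths are resolved against the current working directory,
    the same way they are for the java command that starts Beagle.
    """
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Hypothetical usage with the file names used in this thread.
    missing = missing_inputs("test.vcf", "ref.vcf")
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All input VCF files are present.")
```

If a file is reported missing, either the PLINK step failed (check its log) or Beagle needs the full path to the file.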

gt=[file] specifies a VCF file containing genotypes for the study samples. Each VCF record must contain a GT (genotype) format field. If any heterozygote genotype is unphased (with ‘/’ allele separator) in a marker window, Beagle 5.1 will consider all heterozygote genotypes to be unphased, regardless of the allele separator used (‘|’ or ‘/’).

ref=[file] specifies a reference panel in bref3 or VCF format. Each genotype must have two phased, non-missing alleles. If a VCF file is specified, the phased allele separator ‘|’ must be used.

https://faculty.washington.edu/browning/beagle/beagle_5.1_08Nov19.pdf
https://faculty.washington.edu/browning/beagle/run.beagle.21Mar20.d65.example

Reply by Fatima • 4.1 years ago

I ran Beagle but encountered this error. Please help me solve it.

Command line:

 java -Xmx3641m -jar beagle.25Nov19.28d.jar
  gt=test1.vcf
  ref=ref1.vcf
  out=Imputation

No genetic map is specified: using 1 cM = 1 Mb
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: unphased or missing genotype for reference sample ANG_ANG_000001 at marker [1    47524118        BTB-01572184    G       A]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

Reply by ssadegi42 • 4.1 years ago (edited by GenoMax)

I think that the problem is that your reference file must have phased genotypes.

ref=[file] specifies a reference panel in bref3 or VCF format. Each genotype must have two phased, non-missing alleles. If a VCF file is specified, the phased allele separator ‘|’ must be used.
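The requirement quoted above can be checked before running Beagle. A minimal Python sketch (the function name is hypothetical) that returns the first reference genotype that is not two non-missing alleles separated by ‘|’. It follows the manual's wording literally, so it may be stricter than Beagle itself in some cases. The example uses the record shown earlier in this thread, with tab-separated columns as in a real VCF:

```python
def first_bad_reference_genotype(vcf_lines):
    """Return (sample, chrom, pos, marker_id, gt) for the first genotype that is
    not two phased, non-missing alleles ('|' separator), or None if all pass."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines and blank lines
        fields = line.split("\t")  # VCF columns are tab-separated
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            alleles = gt.split("|")
            # '0/1' does not split on '|' (unphased); '.' marks a missing allele
            if len(alleles) != 2 or "." in alleles:
                return sample, fields[0], fields[1], fields[2], gt
    return None


example_vcf = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tANG_ANG_001\tANG_ANG_002",
    "1\t135098\tHapmap43437-BTA-101873\tG\tA\t.\t.\tPR\tGT\t0/0\t0/1",
]
print(first_bad_reference_genotype(example_vcf))
# ('ANG_ANG_001', '1', '135098', 'Hapmap43437-BTA-101873', '0/0')
```

For a real file, pass an open file handle, e.g. `open("ref.vcf")`, instead of the list.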

Reply by Kevin Blighe • 4.1 years ago

Yes, as @kevin mentioned, the error is related to phasing.

(I deleted my previous comment about the path to the file, since it seems irrelevant now that you're getting a different error rather than ref.vcf not found.)

You can also google these messages to find out how to resolve them:

No genetic map is specified: using 1 cM = 1 Mb

ERROR: unphased or missing genotype for reference sample
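The first message is informational rather than an error: when no `map=` file is given, Beagle 5 assumes a constant 1 cM per Mb to turn physical positions into genetic positions, and a genetic map in PLINK format can be supplied with `map=` instead. The arithmetic behind the default, sketched in Python with a hypothetical function name and the marker from the error message earlier in this thread:

```python
BP_PER_CM = 1_000_000  # Beagle's default when no map= file is given: 1 cM = 1 Mb


def default_cm_position(bp_position):
    """Genetic position (cM) implied by the default 1 cM = 1 Mb assumption."""
    return bp_position / BP_PER_CM


# Marker 1:47524118 (BTB-01572184) from the error message in this thread
print(default_cm_position(47_524_118))  # 47.524118
```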

Reply by Fatima • 4.1 years ago

Thanks a lot, dear Kevin, for your guidance. When I used PLINK to create the VCF file for the "ref" data, the allele separator was ‘/’.

Since I'm a beginner, I used R to replace ‘/’ with ‘|’ in Ref.VCF, and Beagle then ran. But I don't know whether this approach is right or wrong.

Also, the message "No genetic map is specified: using 1 cM = 1 Mb" still appears, and I don't know whether it is a problem.

I also used PLINK to convert the Beagle output imputed this way into .ped and .map files, but most of the imputed genotypes are homozygous.
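One way to make "most imputed genotypes are homozygous" concrete is to count genotype classes directly in Beagle's output VCF and compare them with the input test.vcf. A minimal Python sketch (function name hypothetical) that treats ‘|’ and ‘/’ alike and reads the GT subfield even when other FORMAT fields follow it. The counts alone do not say whether the imputation is wrong; they only make the comparison easier. The example records are illustrative, not real output:

```python
from collections import Counter


def genotype_classes(vcf_lines):
    """Count hom_ref, het, hom_alt and missing GT calls in VCF data lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        for call in fields[9:]:
            # Treat phased '|' and unphased '/' the same way
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                counts["missing"] += 1
            elif len(set(alleles)) > 1:
                counts["het"] += 1
            elif alleles[0] == "0":
                counts["hom_ref"] += 1
            else:
                counts["hom_alt"] += 1
    return counts


# Illustrative records only. Beagle writes <out>.vcf.gz, which could be read with
# gzip.open("Imputation.vcf.gz", "rt") and passed to genotype_classes().
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t135098\tm1\tG\tA\t.\t.\t.\tGT:DS\t0|0:0\t0|1:1",
    "1\t200000\tm2\tC\tT\t.\t.\t.\tGT:DS\t1|1:2\t0|0:0",
]
print(dict(genotype_classes(example)))
```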

Please help me create ref.VCF using the ‘|’ separator.

Reply by ssadegi42 • 4.1 years ago

I used R to replace ‘/’ with ‘|’ in Ref.VCF, and Beagle then ran. But I don't know whether this approach is right or wrong.

That is bad practice, and it immediately introduces errors into your data. Can you please look into the difference between phased and unphased genotypes? For Beagle, you will need a phased reference dataset.
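To see why a text replacement of ‘/’ with ‘|’ adds errors: an unphased heterozygote does not say which allele sits on which chromosome, so k heterozygous sites are consistent with 2^(k-1) different haplotype pairs, while the replacement always picks the single pairing in which all first-listed alleles form one haplotype. A small Python sketch that enumerates the possibilities for biallelic, non-missing genotypes (function name hypothetical):

```python
from itertools import product


def haplotype_pairs(genotypes):
    """All distinct haplotype pairs consistent with unphased biallelic
    genotypes such as ['0/1', '0/1'] (no missing alleles)."""
    per_site = []
    for gt in genotypes:
        a, b = gt.split("/")
        # A heterozygote can be ordered two ways; a homozygote only one way.
        per_site.append({(a, b), (b, a)})
    pairs = set()
    for combo in product(*per_site):
        hap1 = "".join(first for first, _ in combo)
        hap2 = "".join(second for _, second in combo)
        # Swapping the two haplotypes describes the same individual.
        pairs.add(frozenset((hap1, hap2)))
    return pairs


sites = ["0/1", "0/1", "0/1"]
print(len(haplotype_pairs(sites)))  # 4 possible phasings for 3 heterozygous sites
naive = [gt.replace("/", "|") for gt in sites]
print(naive)  # ['0|1', '0|1', '0|1']: always haplotypes 000 and 111
```

A phasing tool chooses among these possibilities using information from the other samples; the text replacement just fixes one of them regardless of the data.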

The 1000 Genomes Phase III data is phased, by the way. You can download it here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Reply by Kevin Blighe • 4.1 years ago

Dr. Kevin, thanks for your help; I hope I understood what you meant. I read about phased and unphased target and reference datasets in the Beagle manual. I want to use my own reference file (50K), not the 1000 Genomes Project, as the reference dataset. So I want to convert my unphased Ref.vcf into a phased Ref.vcf to impute the target file. Can you help me and recommend software that converts an unphased file into a phased one? I hope I have explained this clearly.

Reply by ssadegi42 • 4.1 years ago

I don't know whether tools like these would help or would introduce errors into your data:

https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-methods-and-algorithms/Purpose_and_operation_of_Read-backed_Phasing.md

https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html

http://manpages.ubuntu.com/manpages/bionic/man1/vcftools.1.html

Reply by Fatima • 4.1 years ago

Dear Fatima,

Thank you for taking the time to help me. I will try these tools.

Reply by ssadegi42 • 4.1 years ago

Answer · 2020-03-25

4.1 years ago

ssadegi42

Thanks a lot, Dr. Fatima and Dr. Kevin. I used GATK to convert the unphased VCF into a phased VCF for the ref input.

I also found a simple way to convert unphased genotypes to phased ones using Beagle.
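The Beagle route can be written as two runs: when no ref= is given, Beagle 5 phases the genotypes in the gt= file and writes them to <out>.vcf.gz, and that phased file can then serve as the ref= panel for imputing the 5K target. A Python sketch that only assembles the two command lines (it does not run them). The output prefix `Imputation.ref_phased`, the helper name and the memory setting are hypothetical choices; the jar and VCF names come from this thread:

```python
import shlex


def beagle_commands(jar, ref_vcf, target_vcf, out_prefix, mem="-Xmx20g"):
    """Return two argument lists for Beagle 5:
    1. phase the unphased reference panel (gt= only, no ref=);
    2. impute the target using the phased panel written as <out>.vcf.gz.
    """
    phased_prefix = out_prefix + ".ref_phased"  # hypothetical output name
    phase = ["java", mem, "-jar", jar, f"gt={ref_vcf}", f"out={phased_prefix}"]
    impute = ["java", mem, "-jar", jar, f"gt={target_vcf}",
              f"ref={phased_prefix}.vcf.gz", f"out={out_prefix}"]
    return phase, impute


for cmd in beagle_commands("beagle.25Nov19.28d.jar", "ref1.vcf", "test1.vcf", "Imputation"):
    # Print a copy-pasteable shell command; subprocess.run(cmd, check=True)
    # would execute it instead.
    print(shlex.join(cmd))
```

Whether the phasing of a 50K panel done this way is accurate enough for a given dataset is a separate question that this sketch does not address.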

Thanks.
