Question: How can I impute low-density (5K) genotypes to 50K with Beagle?
ssadegi4220 wrote:

I want to use Beagle to impute my data. How can I impute low-density (5K) genotypes to 50K with Beagle?

chip-seq genome • 482 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by ssadegi4220

Please confirm that you have first tried to do this on your own (?). There is no way to provide a quick / simple answer for you. Have you looked at the Beagle manual, for example, and other online tutorials?

ADD REPLYlink written 10 months ago by Kevin Blighe69k

I created VCF files with plink for the reference (50K) file and the test (5K) file:

  ./plink --file ref --cow --recode vcf --out ref
  ./plink --file test --cow --recode vcf --out test

and I want to run a command that imputes the missing genotypes using MY reference panel (50K):

  java -Xmx20g -jar beagle5.jar gt=test.vcf ref=ref.vcf out=out_myfile

but the ref.vcf file is not found?

ADD REPLYlink modified 10 months ago • written 10 months ago by ssadegi4220

I haven't used any of these tools, but if ref.vcf is not found then your first command is not working.

Which version of plink are you using?

https://www.cog-genomics.org/plink2/input

https://www.cog-genomics.org/plink/2.0/output


I think you can also use --recode beagle instead of --recode vcf to convert your file to something recognizable by Beagle:

.beagle.dat, .chr-*.dat, .chr-*.map (BEAGLE unphased genotype and variant information files) Produced by "--recode beagle[-nomap]", for use by BEAGLE. In 'beagle' mode, one file pair is generated per autosome, while in 'beagle-nomap' mode, a single .beagle.dat file is generated containing all autosomes. This format cannot be loaded by PLINK.

ADD REPLYlink modified 10 months ago • written 10 months ago by Fatima890

Hello, dear Fatima,

I use Beagle version 5, and for this version the input file is created with --recode vcf; --recode beagle is for the Beagle 4 version. I ran plink on the .map and .ped files and obtained VCF output for the ref and test data. The VCF file format is:

##fileformat=VCFv4.3
##fileDate=20200315
##source=PLINKv2.00
##chrSet=<autosomePairCt=29,X,Y,M>
##contig=<ID=1,length=158229218>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ANG_ANG_001 ANG_ANG_002
1   135098  Hapmap43437-BTA-101873  G   A   .   .   PR  GT  0/0 0/1
ADD REPLYlink modified 10 months ago by Kevin Blighe69k • written 10 months ago by ssadegi4220

Are beagle and ref.vcf in the same folder? If not, are you using the correct path to the vcf file?
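A quick way to rule out a path problem before launching Beagle is to test that each input exists and is non-empty. The sketch below is only an illustration: `check_inputs` is a hypothetical helper name, and the file names in the usage comment are the ones mentioned in this thread.

```shell
#!/bin/sh
# check_inputs (hypothetical helper): report any input file that is
# missing or empty. Returns 0 if every file exists and is non-empty,
# 1 otherwise.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example (file names taken from the thread):
#   check_inputs beagle5.jar test.vcf ref.vcf && echo "inputs OK"
```

If this reports ref.vcf as missing, the plink conversion step most likely did not write the file where Beagle is looking for it.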

gt=[file] specifies a VCF file containing genotypes for the study samples. Each VCF record must contain a GT (genotype) format field. If any heterozygote genotype is unphased (with ‘/’ allele separator) in a marker window, Beagle 5.1 will consider all heterozygote genotypes to be unphased, regardless of the allele separator used (‘|’ or ‘/’).

ref=[file] specifies a reference panel in bref3 or VCF format. Each genotype must have two phased, non-missing alleles. If a VCF file is specified, the phased allele separator must be used ‘|’.

https://faculty.washington.edu/browning/beagle/beagle_5.1_08Nov19.pdf

https://faculty.washington.edu/browning/beagle/run.beagle.21Mar20.d65.example

ADD REPLYlink modified 10 months ago • written 10 months ago by Fatima890

I ran Beagle but encountered this error. Please help me solve it.

Command line:

 java -Xmx3641m -jar beagle.25Nov19.28d.jar
  gt=test1.vcf
  ref=ref1.vcf
  out=Imputation

No genetic map is specified: using 1 cM = 1 Mb
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: unphased or missing genotype for reference sample ANG_ANG_000001 at marker [1    47524118        BTB-01572184    G       A]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
ADD REPLYlink modified 10 months ago by GenoMax94k • written 10 months ago by ssadegi4220

I think that the problem is that your reference file must have phased genotypes.

ref=[file] specifies a reference panel in bref3 or VCF format. Each genotype must have two phased, non-missing alleles. If a VCF file is specified, the phased allele separator must be used ‘|’.
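To see how many genotype calls in a reference VCF would violate this requirement, the calls can be counted with awk. This is a rough sketch, not a validator: `count_unphased` is a hypothetical helper, and it assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT subfield.

```shell
#!/bin/sh
# count_unphased (hypothetical helper): print the number of genotype
# calls that contain '/' (unphased) or '.' (missing allele).
# Assumes a tab-delimited VCF with GT as the first FORMAT subfield.
count_unphased() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            for (i = 10; i <= NF; i++) {     # sample columns start at 10
                split($i, parts, ":")        # GT precedes the first ":"
                if (index(parts[1], "/") || index(parts[1], "."))
                    bad++
            }
        }
        END { print bad + 0 }
    ' "$1"
}

# Example: count_unphased ref1.vcf
```

A count of 0 means every call uses the ‘|’ separator with no missing alleles; anything above 0 means the panel still needs to be phased (not just edited) before it is passed to ref=.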

ADD REPLYlink written 10 months ago by Kevin Blighe69k

Yes, as @kevin mentioned the error is related to phase.

(I deleted my previous comment about path to the file since it sounded irrelevant now that you're getting another error rather than ref.vcf not found. )

You can also google these errors to find out how to solve them.

No genetic map is specified: using 1 cM = 1 Mb

ERROR: unphased or missing genotype for reference sample

ADD REPLYlink modified 10 months ago • written 10 months ago by Fatima890

Thanks a lot, dear Kevin, for your guidance. When I used plink to create the VCF file for the "ref" data, the allele separator was ‘/’ (unphased).

Since I'm a beginner, I used R to replace ‘/’ with ‘|’ in ref.vcf, and Beagle then ran. But I don't know whether this approach is correct.

Also, the message "No genetic map is specified: using 1 cM = 1 Mb" still appears, and I don't know whether it is a problem or not.
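As far as the Beagle 5.1 documentation describes it, that line is a notice rather than an error: when no map= file (a PLINK-format genetic map in cM) is given, Beagle falls back to a constant rate of 1 cM per Mb. The sketch below only illustrates what that fallback means for a single position; `default_cm` is a hypothetical helper and is not part of Beagle.

```shell
#!/bin/sh
# default_cm (hypothetical helper): print the genetic position in cM
# that a base-pair position gets under the fallback rate of
# 1 cM per 1 Mb (i.e. position / 1,000,000).
default_cm() {
    awk -v pos="$1" 'BEGIN { printf "%.6f\n", pos / 1000000 }'
}

# Example: the marker position from the error message in this thread
#   default_cm 47524118
```

If a species-specific genetic map is available, passing it through map= replaces this approximation; without one, the run still proceeds.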

I also converted the imputed Beagle output (produced this way) with plink to create ped and map files, but most of the imputed genotypes are homozygous.

Please help me create a ref.vcf that uses the ‘|’ separator.

ADD REPLYlink modified 10 months ago • written 10 months ago by ssadegi4220

I used R to replace ‘/’ with ‘|’ in ref.vcf, and Beagle then ran. But I don't know whether this approach is correct.

That is bad practice, and it immediately introduces errors into your data. Please take a look at the difference between phased and unphased genotypes. For Beagle, you need a phased reference dataset.

The 1000 Genomes Phase III data is phased, by the way. You can download it here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

ADD REPLYlink written 10 months ago by Kevin Blighe69k

Dr. Kevin, thanks for your help; I hope I understand your meaning. I studied phased and unphased genotypes for the target and reference datasets in the Beagle manual. I want to use my own reference file (50K), not the 1000 Genomes Project, as the reference dataset, so I want to convert my unphased ref.vcf to a phased ref.vcf in order to impute the target file. Can you help me and suggest software to convert an unphased file to a phased one? I hope I have been able to convey the idea.

ADD REPLYlink written 10 months ago by ssadegi4220

I don't know whether tools like these would help or would introduce errors into your data:

https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-methods-and-algorithms/Purpose_and_operation_of_Read-backed_Phasing.md

https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html

http://manpages.ubuntu.com/manpages/bionic/man1/vcftools.1.html

ADD REPLYlink modified 10 months ago • written 10 months ago by Fatima890

Dear Fatima;

Thank you for taking the time to help me. I will try these tools.

ADD REPLYlink written 10 months ago by ssadegi4220
ssadegi4220 wrote:

Thanks a lot, Dr. Fatima and Dr. Kevin. I used GATK to convert the unphased VCF to a phased VCF for the ref input.

I also found a simple way to convert unphased to phased genotypes using Beagle.

thanks
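The exact commands are not given above. One plausible reading, assuming Beagle 5.x phases the gt= file when no ref= is supplied and writes its result to [out].vcf.gz, is a two-step run: phase the 50K panel on its own, then impute the 5K data against the phased panel. The helper below is hypothetical and only prints the two commands so they can be reviewed first; the output prefixes ref.phased and imputed are placeholders.

```shell
#!/bin/sh
# beagle_two_step (hypothetical helper): print, but do not run, a
# two-step Beagle workflow. Assumes Beagle 5.x phases gt= input when
# no ref= is given and writes <out>.vcf.gz.
#   $1 = Beagle jar, $2 = unphased reference VCF, $3 = target VCF
beagle_two_step() {
    jar=$1; ref=$2; target=$3
    # Step 1: phase the reference panel by itself.
    echo "java -jar $jar gt=$ref out=ref.phased"
    # Step 2: impute the target against the phased panel.
    echo "java -jar $jar gt=$target ref=ref.phased.vcf.gz out=imputed"
}

# Example: beagle_two_step beagle.25Nov19.28d.jar ref1.vcf test1.vcf
```

Unlike replacing ‘/’ with ‘|’ in a text editor or R, the first step estimates haplotypes statistically, so the separator in the resulting panel reflects actual phasing.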

ADD COMMENTlink modified 10 months ago • written 10 months ago by ssadegi4220