Question: Imputation without reference for small number of samples/taking care of missing values
5.2 years ago by
United States
shreyasibiswas8830 wrote:

Hi all,

I am trying to run an association study using SNPs called from an RNA-seq experiment. My study system is Rattus norvegicus.

I called SNPs from the case samples and the control samples separately, then merged the two VCF files using vcf-merge. After merging, I noticed that if a SNP is present in 'case' but absent in 'control', the genotype is written as 00 in the .ped file generated from the VCF. In some cases 00 represents a missing value, and in others it represents the reference genotype. As a result, I have a LOT of 00 entries in my .ped file, which is messing up my association test.
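To make the ambiguity above concrete, here is a minimal Python sketch that classifies each sample's genotype in a merged VCF record as missing, homozygous reference, or carrying an alternate allele, so the fraction of missing calls per site can be inspected before converting to .ped. It assumes GT is the first FORMAT subfield and that missing calls are written as `.` or `./.`; the function names and the example record are hypothetical, not taken from my data.

```python
def classify_gt(sample_field):
    """Classify one VCF sample field by its GT subfield (assumed to come first)."""
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    return "has_alt"


def site_summary(vcf_line):
    """Return (chrom, pos, per-sample classifications) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], fields[1]
    # Sample columns start at the tenth column of a VCF data line.
    calls = [classify_gt(s) for s in fields[9:]]
    return chrom, pos, calls


# Hypothetical merged record: two case samples called, two control samples absent.
example = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t./.\t."
chrom, pos, calls = site_summary(example)
missing_rate = calls.count("missing") / len(calls)
print(chrom, pos, calls, missing_rate)
```

A site where the controls are all "missing" carries no information about whether they match the reference, which is different from a site where they are explicitly called as 0/0.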

I understand that one can use --merge with --merge-mode 5 in plink, but for my study system I don't have a reference VCF with genotypes, which means I cannot impute using a reference panel. I tried imputation with Beagle, which does not require a reference panel. However, my sample size (4 cases and 4 controls) is too small for that to work properly.

Do you have any suggestions for how I could handle this imputation problem for my samples? Please help. I appreciate it.

Thank you.



plink snp rna-seq • 2.2k views
4.8 years ago by
Hongxu Dong40
University of Illinois at Urbana-Champaign
Hongxu Dong40 wrote:


You can try LinkImpute, software based on a k-nearest-neighbors genotype imputation method that is designed for non-model organisms.
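To give a rough idea of what k-nearest-neighbors genotype imputation means, here is a minimal Python sketch. It is a plain illustration of the general idea, not LinkImpute's implementation: the genotype coding (0/1/2 alternate-allele counts, `None` for missing), the distance measure, the tie-breaking rule, and the function names are all assumptions made for the example.

```python
from collections import Counter


def distance(a, b, skip):
    """Mean absolute genotype difference over SNPs called in both samples,
    ignoring the SNP currently being imputed (index `skip`)."""
    diffs = [abs(x - y) for idx, (x, y) in enumerate(zip(a, b))
             if idx != skip and x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def knn_impute(genotypes, k=3):
    """Fill missing calls (None) with the most common genotype among the
    k most similar samples that have a call at that SNP."""
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is not None:
                continue
            donors = [(distance(row, other, j), other[j])
                      for n, other in enumerate(genotypes)
                      if n != i and other[j] is not None]
            donors = [d for d in donors if d[0] != float("inf")]
            if not donors:
                continue  # no comparable sample: leave the call missing
            donors.sort(key=lambda d: d[0])
            votes = Counter(gt for _, gt in donors[:k])
            # Majority vote; ties go to the smaller genotype code (arbitrary choice).
            imputed[i][j] = min(votes, key=lambda gt: (-votes[gt], gt))
    return imputed


# Hypothetical toy matrix: rows are samples, columns are SNPs.
genotypes = [
    [0, 0, 1, 2],
    [0, 0, 1, None],
    [2, 2, 1, 0],
    [2, 2, 1, 0],
    [0, 0, 1, 2],
]
result = knn_impute(genotypes, k=2)
print(result[1])
```

With only 8 samples there are very few neighbors to borrow from, so any imputed calls should be treated with caution.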


