Imputation without a reference panel for a small number of samples / handling missing values
8.6 years ago

Hi all,

I am trying to run an association study using SNPs called from an RNA-seq experiment. My study system is Rattus norvegicus.

I called SNPs in the case samples and in the control samples separately, then merged the two VCF files using vcf-merge. After merging, I noticed that if a SNP is present in 'case' but absent in 'control', the genotype comes out as 00 in the .ped file generated from the VCF. In some cases this 00 is a genuinely missing value, and in others the sample actually carries the reference allele. As a result I have a LOT of 00 entries in my .ped file, which is messing up my association test.
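To see how much of the .ped file is affected, the 00 entries can be tallied per SNP. The snippet below is only a minimal sketch: it assumes the standard whitespace-delimited PED layout (six leading columns, then two allele columns per SNP, with 0 as the missing code), and the function name and the four example lines are made up for illustration, not taken from the real data.

```python
from typing import List


def missing_rate_per_snp(ped_lines: List[str]) -> List[float]:
    """Return, for each SNP, the fraction of samples whose genotype is missing."""
    missing_counts: List[int] = []
    n_samples = 0
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First six PED columns: FID IID PAT MAT SEX PHENOTYPE
        alleles = fields[6:]
        n_snps = len(alleles) // 2
        if not missing_counts:
            missing_counts = [0] * n_snps
        for i in range(n_snps):
            # A genotype counts as missing if either allele is the code "0"
            if alleles[2 * i] == "0" or alleles[2 * i + 1] == "0":
                missing_counts[i] += 1
        n_samples += 1
    if n_samples == 0:
        return []
    return [count / n_samples for count in missing_counts]


# Hypothetical example: 2 cases, 2 controls, 3 SNPs
example = [
    "F1 case1 0 0 0 2 A G 0 0 C C",
    "F2 case2 0 0 0 2 A A C T 0 0",
    "F3 ctrl1 0 0 0 1 0 0 0 0 C C",
    "F4 ctrl2 0 0 0 1 A G 0 0 C T",
]
rates = missing_rate_per_snp(example)
print(rates)
```

A count like this cannot tell a true no-call from a homozygous-reference site that was simply never written to one of the VCFs; it only shows which SNPs are dominated by 00.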

I understand that one can use --merge with --merge-mode 5 in plink, but for my study system I don't have a reference VCF with genotypes, so I cannot impute using a reference panel. I also tried Beagle, which does not require a reference panel, but my sample size (4 cases and 4 controls) is far too small for that to work properly.

Do you have any suggestions for how I could handle this imputation problem with my samples? Any help is appreciated.

Thank you

RNA-Seq Plink SNP
8.2 years ago
Hongxu Dong ▴ 40

Hi,

You could try LinkImpute (http://www.g3journal.org/content/early/2015/09/15/g3.115.021667), a tool based on a k-nearest-neighbors genotype imputation method that is designed for non-model organisms.
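To give a rough idea of what k-nearest-neighbors imputation does, here is a toy sketch in Python. It is not LinkImpute's implementation: it is a plain k-NN vote across samples, the 0/1/2 genotype coding, the function names and the example matrix are all assumptions made for illustration, and with only 8 samples any such estimate would be very uncertain.

```python
from collections import Counter
from typing import List, Optional

Genotype = Optional[int]  # 0/1/2 = alternate-allele count, None = missing


def distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean absolute genotype difference over SNPs observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # no overlapping SNPs, so not comparable
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(matrix: List[List[Genotype]], k: int = 3) -> List[List[Genotype]]:
    """Fill each missing genotype with the most common genotype among the
    k most similar samples that have a call at that SNP."""
    imputed = [row[:] for row in matrix]
    for s, row in enumerate(matrix):
        for j, genotype in enumerate(row):
            if genotype is not None:
                continue
            # Candidate donors: other samples with an observed call at SNP j
            donors = [
                (distance(row, other), other[j])
                for t, other in enumerate(matrix)
                if t != s and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != float("inf")]
            donors.sort(key=lambda d: d[0])  # nearest samples first
            nearest = [geno for _, geno in donors[:k]]
            if nearest:  # leave the entry missing if there is no usable donor
                imputed[s][j] = Counter(nearest).most_common(1)[0][0]
    return imputed


# Hypothetical samples-by-SNPs matrix
genotypes = [
    [0, 1, 2, None],
    [0, 1, 2, 2],
    [0, 1, 2, 2],
    [2, 0, 0, 0],
]
imputed = knn_impute(genotypes, k=2)
print(imputed[0])
```

Here the first sample's missing call is filled from the two samples that match it at every other SNP, while the dissimilar fourth sample is ignored.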

Hongxu
