Question: Does Imputation Always Need a Reference Panel?
6.6 years ago by
kumar.vinod81280
New Delhi
kumar.vinod81280 wrote:

I have genotyped 220 plant varieties at 6,000 SNPs on an Illumina platform, but the resulting file contains many missing values, and I would like to impute them. However, I do not have a reference panel, which is usually required for SNP imputation in humans. Can I do SNP imputation without a reference panel? Can anyone help me with this? Thanks.

Details

Illumina Infinium custom arrays were used to genotype 6,000 SNPs in 220 individuals. SNP data were analyzed using GenomeStudio V2010.1. SNP genotypes were called using the genotyping (GT) module integrated in the software, where individual SNPs are viewed as GenoPlots. Data quality was confirmed with internal controls and other QC functions such as GenTrain and GenCall scores. After the data were called automatically, the SNPs were re-scored and manually adjusted in a canonical cluster to get a GenTrain score >0.7. Finally, we removed any samples that had call rates <0.2, as we suspected these samples might be prone to error at the loci for which they were called. SNPs that were genotyped for only a single allele type, or that had a low call rate, were removed from the dataset. I am talking about those SNPs which were removed because of a low call rate and which did not cluster. These SNPs are written as missing genotypes in the final ped file; they hybridized but did not reach the critical level of success. Some software exists to impute these missing values, but in humans a reference panel is always required to fill them in.
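
As a rough illustration of the filtering steps described above (drop low-call-rate samples, then drop low-call-rate and single-allele SNPs), here is a minimal Python sketch. It is not what GenomeStudio does internally. The function names, the dictionary layout, and the SNP call-rate threshold are assumptions made for the example; only the idea of call-rate and monomorphic filtering comes from the description.

```python
MISSING = None  # assumed marker for a no-call in this sketch


def call_rate(calls):
    """Fraction of non-missing calls in a list of genotype calls."""
    if not calls:
        return 0.0
    return sum(c is not MISSING for c in calls) / len(calls)


def qc_filter(genotypes, min_sample_call_rate, min_snp_call_rate):
    """Filter samples, then SNPs.

    genotypes: dict mapping sample name -> list of calls (one per SNP,
    same SNP order for every sample). Thresholds are caller-supplied;
    the values used in the study are not assumed here.
    Returns (kept_sample_names, kept_snp_indices).
    """
    # Step 1: keep samples whose call rate reaches the sample threshold.
    kept_samples = [s for s, calls in genotypes.items()
                    if call_rate(calls) >= min_sample_call_rate]

    n_snps = len(next(iter(genotypes.values()))) if genotypes else 0
    kept_snps = []
    for j in range(n_snps):
        column = [genotypes[s][j] for s in kept_samples]
        observed = {c for c in column if c is not MISSING}
        # Step 2: keep SNPs that are well called and show more than
        # one genotype class among the remaining samples.
        if call_rate(column) >= min_snp_call_rate and len(observed) > 1:
            kept_snps.append(j)
    return kept_samples, kept_snps
```

Calls that are still missing after this step, at SNPs and samples that pass the filters, are the ones an imputation program would be asked to fill in.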

imputation snp • 3.4k views

No, you do not need a reference panel if you use MaCH or BEAGLE, among others. Some other programs do need a reference panel, though.

written 6.6 years ago by lh331k
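
To make the idea of reference-free imputation concrete: the missing calls are inferred from the other samples in the same dataset rather than from an external panel. The sketch below uses a simple nearest-neighbour vote and is only an illustration of that principle. It is not the algorithm used by MaCH or BEAGLE, which model haplotypes across linked markers. The function names and the choice of `k` are assumptions made for the example.

```python
from collections import Counter


def similarity(a, b):
    """Share of identical calls over the SNPs called in both samples."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def knn_impute(genotypes, k=3):
    """Fill missing calls (None) from the k most similar samples.

    genotypes: dict mapping sample name -> list of calls, same SNP
    order for every sample. Returns a new dict; the input is unchanged.
    """
    imputed = {s: list(calls) for s, calls in genotypes.items()}
    for s, calls in genotypes.items():
        # Rank the other samples, most similar first.
        neighbours = sorted(
            (o for o in genotypes if o != s),
            key=lambda o: similarity(calls, genotypes[o]),
            reverse=True,
        )
        for j, call in enumerate(calls):
            if call is not None:
                continue
            # Take the first k neighbours that have a call at this SNP.
            donors = [genotypes[o][j] for o in neighbours
                      if genotypes[o][j] is not None][:k]
            if donors:
                # Majority vote among the donors' original calls.
                imputed[s][j] = Counter(donors).most_common(1)[0][0]
    return imputed
```

A call stays missing if no other sample has a genotype at that SNP. For real data, a dedicated tool is the better choice, since this sketch ignores marker order and linkage entirely.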
6.6 years ago by
Josh Herr5.6k
University of Nebraska
Josh Herr5.6k wrote:

You didn't give us much information to go on. What plant are you investigating? I assume you are using Illumina GoldenGate (based on the numbers you provide)? What do you mean by "missing values"? Did the chip not hybridize? Also, what does "impute" mean in this context? Typically the word means to attribute or accuse, but the context is strange here. Are you attributing your "missing values" to something or trying to "attribute" the position of the SNP to location on a reference?

If you're using an Illumina SNP chip, you should already have a well-annotated reference genome or reference sequence (unless you created your own chip with Illumina?). Illumina will provide you with a reference file that contains the locations of the SNPs on your chip.

If you have sequencing data you can genotype by SNPs with "high" coverage without a reference. The amount of coverage depends on the size and complexity of the plant genome you are investigating. SNP calling programs or scripts (Genotype And Snp Calling Review Papers) will help you sort errors from actual polymorphisms.

With regard to plants, I would be cautious about mapping SNP data from one species to a closely related species. Even with Arabidopsis thaliana versus A. lyrata, there are enough gene duplication events to make calling SNPs with any confidence a difficult task. I've tried.


Illumina Infinium custom arrays were used to genotype 6,000 SNPs in 220 individuals. SNP data were analyzed using GenomeStudio V2010.1. SNP genotypes were called using the genotyping (GT) module integrated in the software, where individual SNPs are viewed as GenoPlots. Data quality was confirmed with internal controls and other QC functions such as GenTrain and GenCall scores. After the data were called automatically, the SNPs were re-scored and manually adjusted in a canonical cluster to get a GenTrain score >0.7. Finally, we removed any samples that had call rates <0.2, as we suspected these samples might be prone to error at the loci for which they were called. SNPs that were genotyped for only a single allele type, or that had a low call rate, were removed from the dataset. I am talking about those SNPs which were removed because of a low call rate and which did not cluster. These SNPs are written as missing genotypes in the final ped file; they hybridized but did not reach the critical level of success. Some software exists to impute these missing values, but in humans a reference panel is always required to fill them in. I think it is clear now.

written 6.6 years ago by kumar.vinod81280