6.0 years ago
Shicheng Guo
I am opening this thread to record problems I frequently run into in genetic data analysis. I have just come to this field and am finding lots of interesting things to share.
- Some SNPs have multiple genomic positions, either on the same chromosome or on different chromosomes. For example, rs10408018 is located at chr19:41543327 and chrX:67106620. Such SNPs cause a lot of trouble in plink analyses, since plink does not allow non-unique rsIDs in the analysis pipeline. Solution: remove any one of the repeated SNPs with --exclude. Be careful: the missnp result sometimes starts with a lone "." on the first line; remove it before using --exclude.
- VCF files derived from NGS can contain genotypes like ./. and ./0, which make plink stop running; remove such genotypes from the VCF file first. Plink was designed for genotyping data from microarrays, not for sequencing data.
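For the first point, here is a minimal Python sketch of how one might build an exclusion list. It assumes the standard six-column .bim layout (chromosome, variant ID, genetic distance, position, allele 1, allele 2). The function names, the second rsID, and the alleles are hypothetical and only for illustration.

```python
from collections import Counter


def duplicated_ids(bim_lines):
    """Return the variant IDs that occur more than once in .bim content."""
    # Column 2 of a .bim file holds the variant ID (e.g. an rsID).
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    counts = Counter(ids)
    return sorted(variant for variant, n in counts.items() if n > 1)


def clean_id_list(lines):
    """Drop blank lines and lone '.' entries from an ID list (e.g. a .missnp file)."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and s != "."]


# Toy .bim content: positions for rs10408018 are taken from the post,
# the alleles and the rs0000001 row are made up for the example.
bim = [
    "19 rs10408018 0 41543327 A G",
    "19 rs0000001 0 41550000 C T",
    "X rs10408018 0 67106620 A G",
]

# Simulate an ID list that starts with a stray "." line.
exclude = clean_id_list(["."] + duplicated_ids(bim))
print(exclude)
```

Writing `exclude` one ID per line gives a file that can be passed to --exclude. As far as I know, --exclude matches on the ID, so every variant carrying a listed rsID is likely to be dropped, not just one copy; renaming one of the duplicates beforehand would be needed to keep the other.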
For #2, what version of plink are you running? All builds in the last ~3 years support ./. and (with the appropriate --vcf-half-call setting) ./0.
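A hedged sketch of what such a call might look like, assembled in Python so the option can be checked before running. The mode names (haploid, missing, reference, error) are the ones I understand PLINK 1.9 to accept for --vcf-half-call; the file names and the helper function are hypothetical.

```python
import shlex

# Modes assumed to be accepted by PLINK 1.9's --vcf-half-call option.
HALF_CALL_MODES = {"haploid", "missing", "reference", "error"}


def plink_vcf_command(vcf_path, out_prefix, half_call="missing"):
    """Build an argument list that converts a VCF to PLINK binary format."""
    if half_call not in HALF_CALL_MODES:
        raise ValueError(f"unsupported half-call mode: {half_call}")
    return [
        "plink",
        "--vcf", vcf_path,
        "--vcf-half-call", half_call,  # how to treat genotypes like ./0
        "--make-bed",
        "--out", out_prefix,
    ]


cmd = plink_vcf_command("input.vcf.gz", "converted")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once plink is on the PATH. Treating half-calls as missing is just one choice; which mode is appropriate depends on the data.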
If I search in dbSNP I find only the one on chr19. I'm not sure where your results are from?
I found it in the raw 1000 Genomes VCF file.
Is this post incomplete (by design)? Did you mean to add (or keep adding) new things as you find them (I see a 2. but no content for that point)?
Yes. Recently I have been working on genetic variant (SNP) data, and I will record the details as my work goes along. I am working with our own GWAS data plus HapMap3, 1000 Genomes, and POPRES project data. I expect a huge number of tricky issues to come up that will need fixing.
Any suggestions on haplotype phasing? How can I save time?