Hello, I am new to studying haplotype associations and am looking for guidance on performing LD analysis on INDELs.
I want to conduct SNP association analysis, but I am having difficulty with the linkage disequilibrium analysis (I would like a plot like the one Haploview produces).
I have unphased genotype data in Excel, and one of the markers is an INDEL. When I try to use PLINK or Haploview, I run into problems because the PED file format does not support INDELs. For example, if I recode the D and I alleles as 1 and 2, the software interprets them as 1 = A and 2 = C.
Regarding the data, three conditions were studied (CC, HSIL and LSIL).
Could you recommend an alternative approach or tool to perform LD analysis on this data?
Is there any way you can phase the data? And do you have access to the original VCF files? All the tools I can think of take a VCF file as input.
Regardless, you could treat the INDELs as biallelic markers (i.e., presence/absence) and trick the tools by pretending they are SNPs. In doing so, you would lose any locus with multiple INDEL alleles of different lengths.
This has various implications for interpretation, since SNPs and SVs evolve differently and you're working with unphased data, but it's a start.
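To illustrate the biallelic trick, here is a minimal Python sketch that builds PED-style rows from a genotype table like the Excel sheet described in the question. The D→A / I→C mapping, the function names and the missing-value codes are illustrative assumptions, not part of any tool. Allele labels are arbitrary for LD purposes, so the 1 = A, 2 = C interpretation should be harmless as long as each marker keeps exactly two allele codes.

```python
# Hypothetical allele mapping for the INDEL marker: D and I are relabelled
# as SNP-style codes so PED-based tools accept them. The letters are
# arbitrary; only a consistent two-allele split matters for LD.
INDEL_MAP = {"D": "A", "I": "C"}

MISSING = {"", "NA", "00"}  # assumed spellings of a missing call in the sheet


def recode_genotype(gt, allele_map=None):
    """Turn a two-character genotype such as 'DI' or 'AG' into a PED pair 'A C'.

    Missing values become PLINK's missing code '0 0'.
    """
    if gt is None or gt.strip().upper() in MISSING:
        return "0 0"
    gt = gt.strip().upper()
    if len(gt) != 2:
        raise ValueError(f"expected two alleles, got {gt!r}")
    mapping = allele_map or {}
    a1, a2 = (mapping.get(allele, allele) for allele in gt)
    return f"{a1} {a2}"


def ped_line(fid, iid, genotypes, allele_maps, sex="0", phenotype="0"):
    """Build one PED row: FID IID father mother sex phenotype, then allele pairs.

    Father/mother are '0' (founders); sex and phenotype default to '0' (unknown).
    allele_maps holds one mapping (or None) per marker, in column order.
    """
    if len(genotypes) != len(allele_maps):
        raise ValueError("need one allele map (or None) per marker")
    pairs = [recode_genotype(gt, m) for gt, m in zip(genotypes, allele_maps)]
    return " ".join([fid, iid, "0", "0", sex, phenotype] + pairs)
```

For example, `ped_line("F1", "S1", ["AG", "DI"], [None, INDEL_MAP])` returns `"F1 S1 0 0 0 0 A G A C"`: the SNP keeps its real bases and the INDEL is relabelled.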
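On the unphased-data point: for two biallelic markers, only individuals who are heterozygous at both markers have ambiguous phase. A common way to handle them is to estimate the four two-marker haplotype frequencies with an EM algorithm and derive D' and r² from those estimates. The sketch below is a minimal illustration of that idea, not a reimplementation of Haploview or PLINK; the function name, the 0/1/2 allele-count coding (e.g. copies of the I allele for the INDEL) and the convergence settings are assumptions.

```python
def pairwise_ld(g1, g2, max_iter=1000, tol=1e-10):
    """Estimate |D'| and r^2 between two biallelic markers from unphased genotypes.

    g1, g2: per-individual genotypes coded as 0, 1 or 2 copies of a chosen
    allele at each marker (None = missing). Haplotype frequencies are
    estimated by EM; only double heterozygotes (1, 1) have unknown phase.
    """
    # Haplotype counts resolved directly, keyed by (allele at m1, allele at m2).
    counts = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    n_dh = 0   # double heterozygotes (phase ambiguous)
    n_ind = 0  # individuals genotyped at both markers
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        if a not in (0, 1, 2) or b not in (0, 1, 2):
            raise ValueError(f"genotypes must be 0, 1 or 2, got {(a, b)}")
        n_ind += 1
        if a == 1 and b == 1:
            n_dh += 1
        elif a != 1:
            # Homozygous at marker 1: both haplotypes share that allele.
            x = a // 2
            counts[(x, 1)] += b
            counts[(x, 0)] += 2 - b
        else:
            # Heterozygous at marker 1, homozygous at marker 2.
            y = b // 2
            counts[(1, y)] += 1
            counts[(0, y)] += 1
    n_hap = 2 * n_ind
    if n_hap == 0:
        raise ValueError("no individuals genotyped at both markers")

    f = {h: 0.25 for h in counts}
    for _ in range(max_iter):
        # E-step: probability a double heterozygote carries 11/00 rather than 10/01.
        cis = f[(1, 1)] * f[(0, 0)]
        trans = f[(1, 0)] * f[(0, 1)]
        theta = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: frequencies from resolved counts plus expected counts.
        new = {
            (1, 1): (counts[(1, 1)] + n_dh * theta) / n_hap,
            (0, 0): (counts[(0, 0)] + n_dh * theta) / n_hap,
            (1, 0): (counts[(1, 0)] + n_dh * (1 - theta)) / n_hap,
            (0, 1): (counts[(0, 1)] + n_dh * (1 - theta)) / n_hap,
        }
        delta = max(abs(new[h] - f[h]) for h in f)
        f = new
        if delta < tol:
            break

    p_a = f[(1, 1)] + f[(1, 0)]  # allele frequency at marker 1
    p_b = f[(1, 1)] + f[(0, 1)]  # allele frequency at marker 2
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a marker is monomorphic; LD is undefined")
    d = f[(1, 1)] * f[(0, 0)] - f[(1, 0)] * f[(0, 1)]
    # Lewontin's normalisation: D_max depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / denom
```

With `g1 == g2` for every individual (perfect LD), both values converge to 1. Running this separately within each condition (CC, HSIL, LSIL) is one possible design, though small groups make the estimates noisy.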
Thank you for your reply. Unfortunately, the data were genotyped by PCR, so I don't have a VCF file. I have seen some articles that used the online tool SNPAnalyzer to analyze the same INDEL as mine, and I believe it treats INDELs as biallelic markers, but I'm still unsure whether the output is actually correct.
@Ant, you can convert your data into a VCF file using VCF-Simplify https://github.com/everestial/VCF-Simplify
dthorbur, I have the same issue, but fortunately my data is in a VCF file. Could you please outline the steps for carrying out the analysis with an indel in Haploview?
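For data that is already in a VCF, one way to apply the same biallelic trick is to recode each INDEL record so that its REF and ALT alleles become SNP-like codes before building PED/Haploview input. This is a minimal, standard-library-only sketch under assumptions of my own: the REF→A / ALT→C relabelling, the function name, and that GT appears in the FORMAT column. Multiallelic sites are skipped, which is the information loss mentioned earlier in the thread.

```python
# Hypothetical relabelling for non-SNP alleles: REF -> "A", ALT -> "C".
INDEL_CODES = ("A", "C")
BASES = {"A", "C", "G", "T"}


def vcf_record_to_ped_pairs(line):
    """Convert one tab-separated VCF data line into (marker_id, [(a1, a2), ...]).

    SNPs keep their real bases; a biallelic INDEL has REF/ALT relabelled with
    INDEL_CODES. Multiallelic or ALT='.' sites return None. Missing or
    non-diploid calls become PLINK's missing code ('0', '0').
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    if alt == "." or "," in alt:
        return None  # not a simple two-allele site
    if ref.upper() in BASES and alt.upper() in BASES:
        alleles = (ref.upper(), alt.upper())
    else:
        alleles = INDEL_CODES
    marker_id = vid if vid != "." else f"{chrom}:{pos}"
    gt_idx = fields[8].split(":").index("GT")
    pairs = []
    for sample in fields[9:]:
        # Treat phased ('|') and unphased ('/') calls the same way.
        calls = sample.split(":")[gt_idx].replace("|", "/").split("/")
        if len(calls) != 2 or "." in calls:
            pairs.append(("0", "0"))
        else:
            pairs.append(tuple(alleles[int(c)] for c in calls))
    return marker_id, pairs
```

For a deletion record with REF `AT`, ALT `A` and calls `0/0`, `0|1`, `1/1`, `./.`, this yields `("A", "A")`, `("A", "C")`, `("C", "C")` and `("0", "0")`.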