I would like to perform a linkage analysis on whole exome sequencing data. I've spent the last few days trying to figure out how to do this and found many threads and papers. However, I don't know much about informatics and barely understand a word. I know that it might be annoying to explain something that complex to a complete layman, but I would be grateful beyond words. I don't have anybody to ask and really need to accomplish this.
My thoughts about the basics:
1. I need files with information about the SNPs in the different family members (I have VCF files).
2. I need to somehow choose markers out of these SNPs or find a program that does it for me (how are these markers chosen / what makes a good marker? And what is the format of the output file? ped?)
3. The information about the markers (their location and allele frequencies in general and in the different family members) has to be fused in some kind of file, which must also contain information about the relationship of the family members (How do I get such a file, what is the file format? What is the next step?)
Is that correct? If so, how do I put these steps into action?
Thanks a lot in advance!
I see! For any linkage analysis you first need to identify your tag SNPs across your target region. Tag SNPs are representative markers: because each one is in high linkage disequilibrium with its neighbouring SNPs, a small set of them describes the haplotypes across your target region. Several programs can select them, such as Haploview, Tagger and NCBI tagSNP.
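To make the idea of a tag SNP more concrete, here is a minimal sketch of greedy pairwise tagging in Python. It is not how Haploview, Tagger or NCBI tagSNP are implemented; it only illustrates the principle under some assumptions of mine: genotypes are coded as 0/1/2 copies of one allele, LD is measured as the squared correlation (r^2) between two SNPs, and 0.8 is used as the cut-off. The SNP names and genotypes are made up.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag."""
    names = list(genotypes)
    # Every SNP trivially captures itself.
    captures = {name: {name} for name in names}
    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            captures[a].add(b)
            captures[b].add(a)
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical dosages for six individuals.
toy = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA, so r^2 = 1
    "snpC": [2, 0, 1, 1, 0, 2],  # uncorrelated with snpA in this toy data
}
print(greedy_tag_snps(toy))
```

In this toy data snpA and snpB carry the same information, so one of them is enough to represent both, while snpC has to be kept as its own tag.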
The first step is [preferably] to convert your VCF files to PLINK (ped & map) files. If you have multiple VCF files, you can merge them into a single VCF file using vcftools or GATK. Next you should convert that VCF file to PLINK (ped/map) files, probably by running this command:
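The exact command is not shown in this post, so what follows is only one possible way to do the conversion, not necessarily the one meant above. It is a small Python sketch that builds a vcftools call using its --plink option, which writes a .ped and a .map file with the given output prefix. The file name family.vcf and the prefix family are placeholders of mine. PLINK 1.9 can do a similar conversion with its --vcf and --recode options.

```python
import shutil
import subprocess


def vcf_to_plink_command(vcf_path, out_prefix):
    """Build a vcftools call that writes out_prefix.ped and out_prefix.map."""
    return ["vcftools", "--vcf", vcf_path, "--plink", "--out", out_prefix]


def convert(vcf_path, out_prefix):
    """Run the conversion if vcftools is installed; otherwise only show the command."""
    cmd = vcf_to_plink_command(vcf_path, out_prefix)
    if shutil.which("vcftools") is None:
        print("vcftools not found; the command would be:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError if vcftools fails
    return cmd


# Placeholder names: replace them with your own merged VCF and output prefix.
print(" ".join(vcf_to_plink_command("family.vcf", "family")))
```

Calling convert("family.vcf", "family") would then run the tool itself. For a gzipped VCF, vcftools takes --gzvcf in place of --vcf.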
PED is a whitespace-delimited (space or tab) file whose first six columns contain the following information: 1) Family ID; 2) Individual ID; 3) Paternal ID; 4) Maternal ID; 5) Sex; 6) Phenotype. The genotypes follow from column 7 onwards, two alleles per marker. Your MAP file contains the information about your markers, one marker per line: 1) chromosome (1-22, X, Y, or 0 if unplaced); 2) rs# or SNP identifier; 3) genetic distance (morgans); 4) base-pair position (bp units).
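To show how the two files fit together, here is a small Python sketch that reads one PED line and one MAP line. The family, individual and SNP names are invented for illustration, and the sketch assumes the usual PLINK coding (sex: 1 = male, 2 = female; phenotype: 1 = unaffected, 2 = affected; a parent ID of 0 means the parent is not in the file).

```python
from collections import namedtuple

PedRow = namedtuple(
    "PedRow",
    "family_id individual_id paternal_id maternal_id sex phenotype genotypes",
)
MapRow = namedtuple("MapRow", "chromosome snp_id genetic_distance bp_position")


def parse_ped_line(line):
    """Split one PED line into its six leading columns plus one allele pair per marker."""
    fields = line.split()  # accepts both spaces and tabs
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two alleles per marker")
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRow(*fields[:6], genotypes)


def parse_map_line(line):
    """Split one MAP line into chromosome, SNP id, genetic distance and position."""
    chromosome, snp_id, distance, position = line.split()
    return MapRow(chromosome, snp_id, float(distance), int(position))


# Invented example: an affected son (sex 1, phenotype 2) typed at two markers.
ped = parse_ped_line("FAM1 child1 father1 mother1 1 2 A G C C")
markers = [parse_map_line("22 snp1 0 16050000"), parse_map_line("22 snp2 0 16051000")]

# The n-th allele pair in the PED line belongs to the n-th line of the MAP file.
for marker, genotype in zip(markers, ped.genotypes):
    print(marker.snp_id, marker.bp_position, genotype)
```

The point to take away is that the PED file has no marker names of its own: the order of the allele pairs must match the order of the lines in the MAP file.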
Once you have prepared these two files, you can easily feed them to any software that helps you identify tag SNPs across your desired region (e.g. chromosome 22), and the rest is pretty much in the tutorial I sent you above!
P.S.: Once you have managed to run the whole process, it is worth sharing your steps here to help others with the same problem.
Thank you so much! This is incredibly helpful! I will try all that and let you know if I succeeded (and share my steps of course :-)).