2
1
6.7 years ago
Augusta ▴ 30

Hi,

I would like to perform a linkage analysis on whole exome sequencing data. I've spent the last few days trying to figure out how to do this and found many threads and papers. However, I don't know much about informatics and barely understand a word. I know that it might be annoying to explain something that complex to a complete layman, but I would be grateful beyond words. I don't have anybody to ask and really need to accomplish this.

My thoughts about the basics:

1. I need files with information about the SNPs in the different family members (I have VCF files).
2. I need to somehow choose markers out of these SNPs or find a program that does it for me. (How are these markers chosen, and what makes a good marker? And what is the format of the output file? ped?)
3. The information about the markers (their location and allele frequencies in general and in the different family members) has to be fused in some kind of file, which must also contain information about the relationship of the family members. (How do I get such a file, and what is the file format? What is the next step?)

Is that correct? If so, how do I put these steps into actions?

linkage analysis next generation sequencing • 2.8k views
0
6.7 years ago
reza.jabal ▴ 570

Hi, I guess if you convert your vcf files to plink format you can easily carry out your analysis just by following plink manual! You will probably find this tutorial useful: https://www.staff.ncl.ac.uk/heather.cordell/mres2012.html

Hope it helps!

0
6.7 years ago
Augusta ▴ 30

Hi reza.jabal,

unfortunately it doesn't really help.

Again, the explanations start with a list of files needed, and I don't know how to get them from my VCF-file, and what they are for. I've been reading introductions like this a lot recently, but they mostly confuse me. Could you tell me whether my thoughts in the list above are basically right to begin with? Also, what does converting a VCF-file into a plink-file do, does it add some kind of information? If so, what and where is it taken from?

I am sorry for being so clueless. I am afraid that if I just follow instructions without understanding the individual steps, I won't produce useful results. :-(

1

I see! For any linkage analysis you first need to identify the tag SNPs across your target region. Tag SNPs are representative markers: each one is in high linkage disequilibrium with its neighbouring SNPs, so a small set of them describes the haplotypes across your target region. Several programs can select them, such as Haploview, Tagger and NCBI tagSNP.

The first step is [preferably] to convert your VCF files to PLINK (ped & map) files. If you have multiple VCF files, you can merge them into a single VCF file using vcftools or GATK. Next, convert that VCF file to PLINK (ped/map) files, probably by running this command:

plink --vcf myvcf.vcf  --recode --out myplink


The PED file is a whitespace-delimited (space or tab) file with one line per individual. Its first six columns are: 1) Family ID; 2) Individual ID; 3) Paternal ID; 4) Maternal ID; 5) Sex; 6) Phenotype. The genotypes follow from column 7 onwards, two alleles per marker. The MAP file describes your markers, one line per marker: 1) Chromosome (1-22, X, Y, or 0 if unplaced); 2) rs# or SNP identifier; 3) Genetic distance (morgans); 4) Base-pair position (bp units).
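
To make the two layouts concrete, here is a small sketch with a made-up trio and two SNPs. Every ID, position and genotype in it is invented for illustration, and the file names toy.ped and toy.map are just placeholders. It only uses awk to look at the files, so it runs without PLINK installed.

```shell
#!/bin/sh
# Toy PED file: 6 pedigree columns, then 2 allele columns per marker.
# Columns: FID IID PAT MAT SEX(1=male, 2=female) PHENO(1=unaffected, 2=affected)
cat > toy.ped <<'EOF'
FAM1 father 0 0 1 1 A A G T
FAM1 mother 0 0 2 1 A C G G
FAM1 child father mother 2 2 A C G T
EOF

# Toy MAP file: one line per marker, in the same order as the allele pairs.
# Columns: chromosome, marker ID, genetic distance (morgans), bp position
cat > toy.map <<'EOF'
22 snp1 0 16050000
22 snp2 0 16051000
EOF

# Show only the pedigree part of each PED line (first six columns).
awk '{ print $1, $2, $3, $4, $5, $6 }' toy.ped

# Consistency check: each PED line needs 6 + 2 * (number of markers) columns.
n_markers=$(awk 'END { print NR }' toy.map)
awk -v n="$n_markers" 'NF != 6 + 2 * n { bad++ } END { print (bad ? "mismatch" : "ok") }' toy.ped
```

The same column-count check can be run on real ped/map files as a quick sanity test after the conversion.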

Once you have prepared these two files, you can easily pass them to any software that helps you identify tag SNPs across your desired region (e.g. chromosome 22), and the rest is pretty much in the tutorial I sent you above!
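
One caveat on preparing the files: a VCF holds no pedigree information, so as far as I know the PED file that comes out of the conversion has 0 (unknown) for both parents and missing values for sex and phenotype. Those columns have to be filled in from your own family records. Below is a minimal sketch of one way to do that with awk. It assumes a hand-written lookup file, here called pedigree.txt (a hypothetical name and layout), and the toy family and genotypes are made up.

```shell
#!/bin/sh
# PED as it might look right after conversion: parents, sex, phenotype unknown.
cat > raw.ped <<'EOF'
father father 0 0 0 -9 A A G T
mother mother 0 0 0 -9 A C G G
child child 0 0 0 -9 A C G T
EOF

# Hand-written lookup (hypothetical layout): IID FID PAT MAT SEX PHENO
cat > pedigree.txt <<'EOF'
father FAM1 0 0 1 1
mother FAM1 0 0 2 1
child FAM1 father mother 2 2
EOF

# First file (NR == FNR): remember the pedigree fields, keyed by individual ID.
# Second file: overwrite the pedigree columns, leave IID and genotypes as they are.
# Individuals missing from the lookup are printed unchanged.
awk '
NR == FNR { fid[$1] = $2; pat[$1] = $3; mat[$1] = $4; sex[$1] = $5; phe[$1] = $6; next }
($2 in fid) {
    id = $2
    $1 = fid[id]; $3 = pat[id]; $4 = mat[id]; $5 = sex[id]; $6 = phe[id]
}
{ print }
' pedigree.txt raw.ped > family.ped

cat family.ped
```

PLINK itself also offers flags for this (--update-ids, --update-parents, --update-sex and --pheno); check the manual for the input format each one expects before relying on them.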

P.S.: Once you have managed to run the whole process, it is worth sharing your steps here to help others with the same problem.

1

Thank you so much! This is incredibly helpful! I will try all that and let you know if I succeeded (and share my steps of course :-)).