Hello,
How can I create a standard linkage format file as input for Haploview, or convert my Excel sheet to this format or to another accepted input format?
My recommendation to you is to get your data converted into PLINK format, from where you can perform many standard analyses, and also convert this further to many other formats (including Haploview).
As you have not explained in detail what data you currently have, there's no specific help that can be offered.
Many thanks, Kevin. I am conducting a case-control study to test the association between two SNPs in the same gene and ischemic stroke (IS). A statistician performed the statistical analysis, and the results showed an association with IS for only one SNP. The reviewers asked me to check whether the two SNPs form a haplotype. As I am new to this field and my raw data are in .xlsx format, I searched for programs that can do this and have installed Haploview and Tassel. I followed the link you sent above and now know the formats Haploview requires, but I still lack the know-how. So far I have only managed to convert the Excel file into (.hmp.txt) format, which I could open in Tassel. Could you give me your further recommendation?
I see, sounds quite interesting. If you can already get the data into Tassel, then all that you need to do is run a linkage disequilibrium (LD) analysis: variants in high LD may be part of the same haploblock. It really is as simple as that, in this case, as a reply to a reviewer. You can search publications to find out which cut-off to use for LD, but Tassel probably already has a default value.
For what it is worth, and for others arriving here, I have some code to identify haploblocks, but coming via the VCF/BCF + PLINK + HaploView route, here: How do I compute ld blocks from the hapmap ld_data?
This seems pretty helpful. Thank you so much.
Hello, Kevin
I've tried the LD analysis in Tassel 5, but I have faced several issues:
1- While loading the data, I was asked how to treat heterozygous calls. The default was "set to missing". Changing this to "ignore (inbred lines only)" gives different results, and I don't know which is correct.
2- I combined cases and controls in the same data file. The results gave me values for D', R^2, p-value and N, which I couldn't fully interpret, despite spending a lot of time studying LD and its statistics. I've read that if D (which is expressed in standardized units as D' or r^2) equals 0, the population is in equilibrium; otherwise it is in LD.
The results were:
R^2=0.003, D'=0.193, pDiseq=0.771, N=92 (for the first, default, choice)
R^2=0, D'=0.038, pDiseq=0.689, N=466 (for the second choice)
Firstly, I deduced from the D' value, which is greater than zero, that the 2 SNPs constitute a haploblock, but I couldn't find out which combination of alleles makes up this haplotype.
Secondly, I don't know what further information can be deduced from R^2, p-value and N. In fact, I don't know what "N" stands for.
Assuming my interpretation is correct, I imagine that I'll be asked to work out which haplotype is associated with the disease. So, following the Haploview tutorial, I also tried to put my data in the linkage format (as non-family-based data: I used a dummy value for the pedigree name (1, 2, 3...) and filled in zeroes for the father and mother IDs), converted (.txt) to (.info), and then tried to open it in Haploview. Every time, it gave me the error message: "0 is his own parent in family 109". I removed that whole row to check whether the error was specific to it, but I got the same message with a different family number (pedigree name).
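For others arriving here, it may help to see how D, D' and r^2 relate to the haplotype frequencies behind them. The following is only a minimal sketch and not what Tassel or Haploview do internally: it assumes two biallelic SNPs, unphased genotypes coded as the count (0, 1 or 2) of one chosen allele at each SNP, no missing calls, and a simple EM step to resolve double heterozygotes. The function names `em_haplotype_freqs` and `ld_stats` are made up for this illustration. As a general reading of such statistics, an r^2 close to 0 together with a non-significant p-value would usually be taken as little or no LD, even when D' is above zero.

```python
from collections import Counter


def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) tuples, each value the count (0, 1, 2)
    of the chosen allele at SNP 1 and SNP 2. Haplotypes are keyed as
    (a, b) with a, b in {0, 1} (1 = chosen allele present).
    """
    counts = Counter(genotypes)
    n_hap = 2 * len(genotypes)
    # Alleles carried on the two chromosomes for each genotype count.
    split = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    base = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for (g1, g2), n in counts.items():
        if (g1, g2) == (1, 1):
            continue  # double heterozygotes are ambiguous, handled by EM
        a, b = split[g1], split[g2]
        base[(a[0], b[0])] += n
        base[(a[1], b[1])] += n
    n_dh = counts[(1, 1)]
    freqs = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: share of double heterozygotes carrying 00/11 vs 01/10.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        freqs = {
            (0, 0): (base[(0, 0)] + n_dh * w) / n_hap,
            (1, 1): (base[(1, 1)] + n_dh * w) / n_hap,
            (0, 1): (base[(0, 1)] + n_dh * (1 - w)) / n_hap,
            (1, 0): (base[(1, 0)] + n_dh * (1 - w)) / n_hap,
        }
    return freqs


def ld_stats(freqs):
    """Return (D, D', r^2) from the four haplotype frequencies."""
    p_a = freqs[(1, 0)] + freqs[(1, 1)]  # allele frequency at SNP 1
    p_b = freqs[(0, 1)] + freqs[(1, 1)]  # allele frequency at SNP 2
    d = freqs[(1, 1)] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2


if __name__ == "__main__":
    # Toy genotypes, invented purely for illustration.
    toy = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 4
    print(ld_stats(em_haplotype_freqs(toy)))
```

The estimated frequencies also show which allele combinations are most common, which is the information needed to name a candidate haplotype.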
I have not actually used Tassel, and I am not entirely sure about these options for handling heterozygous calls. I guess that it relates to the fact that there can be 3 'states' that produce a different phenotype, i.e., homozygous reference, homozygous variant, or heterozygous variant.
I don't recall HaploView being difficult, so, if you get your data imported there, you may find it easier. To solve the error, you likely just have to manually edit the files. I checked and one of my previous Haploview inputs looks like this:
CEU.Haploview.info (tab-delimited)
CEU.Haploview.ped (space-delimited)
The first 6 columns in the PED files relate to pedigree and phenotype info, but they are all dummy 0 values.
Thank you so much, Kevin. I agree with you concerning Tassel; I also recalled that it does not support unphased data (where it is not known which allele falls on which chromosome), and the third option, "treat heterozygous calls as third state", is not working.
Now, I've tried the second Haploview format (phased haplotype format) with a real marker info file and a sample data file, and they open without errors.
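For others arriving here with genotypes in a spreadsheet, the linkage-format layout discussed above can be sketched in a few lines. This is a rough illustration under stated assumptions rather than a tested Haploview recipe: it assumes unrelated individuals, the conventional six leading columns (family ID, individual ID, father ID, mother ID, sex, affection status), alleles coded 1-4 for A, C, G, T with 0 for missing, and a two-column marker info file (name, position). The helper names `ped_line` and `info_line`, the marker names and the positions are all hypothetical. Giving every individual a non-zero individual ID is also an assumption about the cause of the "0 is his own parent" error, since an ID of 0 would collide with the 0 used for unknown parents; the thread does not confirm this.

```python
# Numeric allele coding assumed here: A=1, C=2, G=3, T=4, 0 = missing.
ALLELE_CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}


def ped_line(family_id, sex, is_case, genotypes):
    """Build one space-delimited linkage-format row for an unrelated person.

    family_id: unique dummy pedigree name (1, 2, 3, ...)
    sex:       1 = male, 2 = female
    is_case:   True -> affection status 2, False -> 1
    genotypes: one two-letter string per SNP, e.g. "AG"; "" means missing
    """
    # Individual ID is fixed to 1 (non-zero); parents are 0 = unknown.
    fields = [str(family_id), "1", "0", "0", str(sex), "2" if is_case else "1"]
    for g in genotypes:
        if not g:
            fields += ["0", "0"]
        else:
            fields += [ALLELE_CODES[g[0].upper()], ALLELE_CODES[g[1].upper()]]
    return " ".join(fields)


def info_line(name, position):
    """Build one tab-delimited marker info row: marker name, position."""
    return f"{name}\t{position}"


if __name__ == "__main__":
    # Invented example rows: (sex, is_case, genotypes at two SNPs).
    samples = [(1, True, ["AG", "CC"]), (2, False, ["", "CT"])]
    for i, (sex, is_case, genos) in enumerate(samples, start=1):
        print(ped_line(i, sex, is_case, genos))
    # Hypothetical marker names and positions.
    for name, pos in [("snp1", 1000), ("snp2", 2500)]:
        print(info_line(name, pos))
```

Redirecting the first loop to a .ped file and the second to a .info file would give a pair of inputs in the shape described above, one row per individual and one row per marker.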