Hi all,
I’m going to determine the variants of given genes from whole-genome sequencing data of a human population, and the effects of these variants on the phenotype. One expert suggested that I: 1) obtain a phased VCF file, then haplotypes; 2) extract all variants related to those genes from the coding regions; and 3) finally annotate them with previously known variants.
I’m aware of phasing a VCF file, but I cannot understand why haplotypes should be created. How should the variants of interest be extracted from the haplotypes and annotated with known variants, and which tool can compare a haplotype with a list of variants?
Could you please clarify this issue for me? I would appreciate any comments and suggestions.
Thanks
Be aware that no 'expert' exists anywhere, in any field. If someone calls themselves an 'expert', then you know that they are not. An expert is created when a community or group of people come together with united power and knowledge.
How you process the data will greatly depend on your end goals. Deriving haplotypes may just have been mentioned because the 'expert' was recently working on such a project involving haplotypes, or s/he always derives haplotypes out of some interest in that particular area.
So, what is the goal of your work? Which wet lab protocol did you use for the sequencing? Have you even conducted phased sequencing, e.g., with 10X?
OK. No phased sequencing; the whole genomes of a population were sequenced on an Illumina HiSeq as 100 bp paired-end reads. Now, I want to determine the variants of given genes in the population. I'm new to this field, but I have read a lot and can understand everything except why haplotypes need to be obtained here. I was wondering whether I can get a phased VCF file (not haplotypes), then extract the variants of interest and annotate them. Is that the right way? I really want to know whether getting haplotypes is necessary here; please kindly advise me.
Okay, so, presumably you have FASTQ files right now, right? You also mention HiSeq, which implies Illumina. If there is no bioinformatics expertise available (?), then Illumina's BaseSpace would be an option (for calling variants).
I still don't quite understand the specifics of your work but I would do the following as a general type of analysis:
Hi Kevin,
Thank you very much for your explanation. Yes, I have FASTQ files. I'm aware of the phased VCF format, in which alleles are separated by a pipe sign (|). As far as I have found, there are various tools that take a phased VCF file as input and extract the haplotypes in FASTA format.
Consider gene X and its variants: the variants may all lie on one chromosome copy or be spread across the two copies. To determine how the variants are actually distributed across the chromosomes, could you please tell me whether the required information should be read from the phased VCF file directly, or whether haplotypes should first be derived from the phased VCF file?
Thanks in advance,
Seta
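
On the phased VCF question: in a phased genotype such as 0|1, the position of each allele around the pipe sign says which chromosome copy carries it, so whether two variants sit on the same copy can in principle be read from the GT fields alone. Below is a minimal Python sketch of that idea, not a replacement for a proper VCF library. The function names (phased_carriers, cis_or_trans) and the example records are made up for illustration, and the sketch assumes a single-sample, tab-separated VCF in which all records belong to one phase block (it ignores the PS field, which real phased data may need).

```python
def phased_carriers(vcf_lines, sample_index=0):
    """Collect phased records and note which chromosome copies carry a non-reference allele.

    Returns a list of (chrom, pos, ref, alt, gt, carriers), where carriers is a
    tuple of copy indices (0 = left of the pipe, 1 = right of the pipe).
    Assumption: all records share one phase block (the PS field is ignored).
    """
    records = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        gt = sample_values[format_keys.index("GT")]
        if "|" not in gt:
            continue  # unphased genotype (e.g. 0/1): copy assignment is unknown
        alleles = gt.split("|")
        carriers = tuple(i for i, a in enumerate(alleles) if a not in ("0", "."))
        records.append((chrom, int(pos), ref, alt, gt, carriers))
    return records


def cis_or_trans(rec_a, rec_b):
    """Report whether two phased variants share at least one chromosome copy."""
    copies_a, copies_b = set(rec_a[5]), set(rec_b[5])
    if not copies_a or not copies_b:
        return "not applicable"  # at least one record carries no alternate allele
    return "cis" if copies_a & copies_b else "trans"


# Hypothetical records for one sample; coordinates and alleles are invented.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0|1",
    "chr1\t400\t.\tG\tA\t.\tPASS\t.\tGT\t1|0",
    "chr1\t500\t.\tT\tC\t.\tPASS\t.\tGT\t0/1",
]

recs = phased_carriers(example)
for a, b in [(recs[0], recs[1]), (recs[0], recs[2])]:
    print(a[1], b[1], cis_or_trans(a, b))
```

In this toy input, the variants at positions 100 and 250 are both on the second copy (cis), while the one at 400 is on the first copy (trans relative to 100). Writing the haplotypes out as FASTA mainly becomes useful when the full sequence of each copy is needed, for example to translate it.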