I am very new to NGS data analysis and am struggling with my project. I need help organizing my dataset and the workflow for my analysis. Below is what I plan to do. I'm not sure my understanding is right, so any feedback or suggestions would help.
I'm trying to figure out the parent-of-origin effect on global gene expression. For this, I need a trio dataset (child/mother/father). To begin the analysis, I have downloaded separate VCF files for the child (NA19257) and the father (NA19256) from the 1000 Genomes Project. These data are unphased.
- As a next step, I intend to phase the files with Beagle and then use vcftools to convert the phased VCF files to PLINK format to obtain the ped and map files.
- Then I would run PREMIM and EMIM on the ped and map files to obtain the parent-of-origin information.
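To make the first step concrete, this is roughly how I imagine the phasing and conversion commands would look. It is only a sketch: the file names (`beagle.jar`, `trio.vcf.gz`, `trio.phased`) are placeholders I made up, it assumes the per-sample VCFs have already been merged into one file, and it only builds and prints the commands rather than running them. I have left out PREMIM/EMIM because I'm not sure of their options yet.

```python
import shlex
from typing import List


def beagle_cmd(jar: str, in_vcf: str, out_prefix: str) -> List[str]:
    """Build a Beagle phasing command (gt= input genotypes, out= output prefix)."""
    return ["java", "-jar", jar, f"gt={in_vcf}", f"out={out_prefix}"]


def vcftools_plink_cmd(phased_vcf: str, out_prefix: str) -> List[str]:
    """Build a vcftools command that writes <out_prefix>.ped and <out_prefix>.map."""
    return ["vcftools", "--gzvcf", phased_vcf, "--plink", "--out", out_prefix]


# Placeholder file names, not real paths from my project.
phase = beagle_cmd("beagle.jar", "trio.vcf.gz", "trio.phased")
# Beagle writes its result to <out_prefix>.vcf.gz.
convert = vcftools_plink_cmd("trio.phased.vcf.gz", "trio")

print(shlex.join(phase))
print(shlex.join(convert))
```

If this looks right, I would pass each list to `subprocess.run(..., check=True)` to actually execute the steps.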
My intention is to compare the child's genome against the mother's and the father's to identify which alleles were contributed by each parent.
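To show what I mean by "contribution of each parent", here is a minimal sketch of the per-site logic I have in mind. It assumes biallelic or multiallelic genotypes given as pairs of allele codes (as in a VCF `GT` field), and the function name `assign_parental_alleles` is my own, not from any tool. It uses plain Mendelian reasoning and does not replace proper phasing.

```python
from typing import Optional, Tuple

Genotype = Tuple[int, int]


def assign_parental_alleles(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) for the child at one site.

    Returns None when the origin is ambiguous (e.g. all three heterozygous)
    or when the genotypes are not consistent with Mendelian inheritance.
    """
    c1, c2 = child
    options = set()
    # Possibility 1: first child allele from the mother, second from the father.
    if c1 in mother and c2 in father:
        options.add((c1, c2))
    # Possibility 2: the reverse transmission.
    if c2 in mother and c1 in father:
        options.add((c2, c1))
    # Only a single consistent assignment is informative.
    if len(options) == 1:
        return options.pop()
    return None


# Mother 0/0, father 1/1, child 0/1: allele 0 is maternal, allele 1 is paternal.
print(assign_parental_alleles((0, 1), (0, 0), (1, 1)))
# All three heterozygous: origin cannot be resolved from genotypes alone.
print(assign_parental_alleles((0, 1), (0, 1), (0, 1)))
```

Only sites where the child is heterozygous and the function returns an assignment would be useful for looking at allele-specific expression; homozygous child sites return an assignment too, but they cannot distinguish the two parental copies.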
- Could someone confirm whether this is the right approach?
- Can anyone suggest other trio datasets that could be used for this analysis?
- If my understanding is incorrect, are there other approaches I should look into that would get me results faster?
I would really appreciate any help. Feedback and suggestions are very welcome.