Hello, I have a set of 10 genomic sequences, each corresponding to one sample. My final goal is a genome-wide association analysis between a phenotype that I have for all the samples and their genotypes, which will be obtained from SNP data. I have the phenotypic trait value for each sample, but I also need the genotypic data (SNPs) for those samples. In other words, I need a SNP matrix of size 10 × number of loci (sites). I'm not at all familiar with identifying or analyzing SNPs from genomic sequences, so I would really appreciate guidance on how to convert a set of 10 genomic sequences into this genotype matrix. Is there a good R package that could help with this?
First, put your sequences into individual files in FASTQ format.
Then use the bwa mem command to align these FASTQ files to a reference or consensus sequence (this could just be one of your sequences); the reference file needs to be in FASTA format.
Samtools (together with its companion bcftools) can then be used to call variants for each of your samples, vcftools to convert the resulting VCF to PLINK format, and finally PLINK to perform the phenotypic association.
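To make the steps above concrete, here is a rough sketch in Python that only builds and prints the shell commands, so nothing is executed. All file names (ref.fa, s1.fastq, pheno.txt, the "cohort" prefix) and the helper name build_pipeline are made up for illustration. It also assumes a recent samtools/bcftools install, where the calling step is done with bcftools mpileup and bcftools call, and PLINK 1.x for the association. Check each command against your installed versions before running anything.

```python
import shlex


def build_pipeline(reference, fastq_by_sample, prefix="cohort", pheno="pheno.txt"):
    """Return shell command lines for align -> call -> convert -> associate.

    Hypothetical helper: it only assembles strings, it does not run the tools.
    """
    q = shlex.quote
    # Index the reference once so bwa mem can align against it.
    commands = [f"bwa index {q(reference)}"]
    bams = []
    for sample, fastq in fastq_by_sample.items():
        bam = f"{sample}.sorted.bam"
        # Align one sample and write a coordinate-sorted BAM.
        commands.append(
            f"bwa mem {q(reference)} {q(fastq)} | samtools sort -o {q(bam)} -"
        )
        commands.append(f"samtools index {q(bam)}")
        bams.append(q(bam))
    vcf = prefix + ".vcf.gz"
    # Joint variant calling across all samples (assumes bcftools is installed).
    commands.append(
        f"bcftools mpileup -f {q(reference)} {' '.join(bams)} "
        f"| bcftools call -mv -Oz -o {q(vcf)}"
    )
    # Convert the VCF to PLINK .ped/.map files.
    commands.append(f"vcftools --gzvcf {q(vcf)} --plink --out {q(prefix)}")
    # Association test against the phenotype file (assumed PLINK 1.x options).
    commands.append(
        f"plink --file {q(prefix)} --pheno {q(pheno)} --assoc "
        f"--allow-no-sex --out {q(prefix)}"
    )
    return commands


if __name__ == "__main__":
    for line in build_pipeline("ref.fa", {"s1": "s1.fastq", "s2": "s2.fastq"}):
        print(line)
```

Printing the commands first lets you review them and run them one at a time, which is usually easier to debug than a single script that runs everything.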
I would suggest trying this tool, developed at the Sanger Institute. It identifies SNP sites in a multi-FASTA alignment. https://github.com/sanger-pathogens/snp_sites
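To show what "identifying SNP sites from a multi-FASTA alignment" means, here is a minimal pure-Python sketch. It is not the tool's actual implementation, just the idea: keep only the alignment columns where the samples differ. The function names and the toy alignment are made up, and it assumes the sequences are already aligned to the same length and that gaps (-) and N should be ignored when deciding whether a column is variable.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into {name: sequence} (hypothetical helper)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def snp_matrix(alignment):
    """Return (positions, matrix) for the variable columns of an alignment.

    positions are 1-based alignment columns; matrix maps each sample name
    to its bases at those columns.
    """
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(len(seqs[0])):
        # Ignore gaps and unknown bases when testing for variation.
        bases = {s[i] for s in seqs} - {"-", "N"}
        if len(bases) > 1:
            positions.append(i + 1)
    matrix = {n: [s[p - 1] for p in positions] for n, s in zip(names, seqs)}
    return positions, matrix


toy = """>s1
ACGTAC
>s2
ACGTTC
>s3
ATGTAC
"""

positions, matrix = snp_matrix(parse_fasta(toy))
print(positions)
for sample, bases in matrix.items():
    print(sample, " ".join(bases))
```

The result has one row per sample and one column per variable site, which is the samples × loci shape asked for in the question; the bases would still need to be recoded numerically for most association tools.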