Question: SNP data in genomic sequences
5.9 years ago, tutoring891 (United States) wrote:

Hello, I have a set of 10 genomic sequences, each corresponding to one sample. My final goal is a genome-wide association analysis between a phenotype that I have for all the samples and their genotypes, which will be obtained from SNP data. I have the phenotypic trait value for each sample, but I also need the genotypic data (SNPs) for those samples. In other words, I need a SNP matrix of size 10 × (number of loci or sites).

I am not at all familiar with identifying or analyzing SNPs in genomic sequences, so I would really appreciate help with converting a set of 10 genomic sequences into this genotype matrix. Is there a good R package that could help with this?

5.9 years ago, smilefreak (New Zealand) wrote:

One approach is to put each sample's sequence into its own file in FASTQ format:

http://en.wikipedia.org/wiki/FASTQ_format

Then use the bwa mem command to align each FASTQ file to a reference or consensus sequence (this could simply be one of your sequences). The reference file needs to be in FASTA format.

Samtools can then be used to call variants for each of your samples, vcftools to convert the calls to PLINK format, and finally PLINK to perform the phenotypic association.

http://bio-bwa.sourceforge.net/bwa.shtml (BWA)

http://vcftools.sourceforge.net/ (Vcftools)

http://www.htslib.org/ (Samtools)

http://pngu.mgh.harvard.edu/~purcell/plink/plink2.shtml (Plink)
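
The steps above can be sketched as a dry run that only assembles the command lines and executes nothing. This is a rough sketch rather than a tested pipeline: the helper build_pipeline and all file names (ref.fa, pheno.txt, the sample names) are hypothetical, and it assumes a current samtools/bcftools installation, where variant calling is done with bcftools mpileup and bcftools call rather than by samtools itself. Check each tool's manual before running anything.

```python
def build_pipeline(samples, ref="ref.fa", pheno="pheno.txt", out="calls"):
    """Return shell commands for align -> call -> convert -> associate.

    Nothing is executed; every file name here is a placeholder.
    """
    # Index the FASTA reference once so bwa mem can use it.
    cmds = [f"bwa index {ref}"]
    bams = []
    for name in samples:
        fastq = f"{name}.fastq"
        bam = f"{name}.sorted.bam"
        # Align one sample and sort the alignments into a BAM file.
        cmds.append(f"bwa mem {ref} {fastq} | samtools sort -o {bam} -")
        cmds.append(f"samtools index {bam}")
        bams.append(bam)
    # Call variants jointly across all samples into a single VCF.
    cmds.append(
        f"bcftools mpileup -f {ref} {' '.join(bams)} "
        f"| bcftools call -mv -o {out}.vcf"
    )
    # Convert the VCF to PLINK ped/map files.
    cmds.append(f"vcftools --vcf {out}.vcf --plink --out {out}")
    # Test each SNP for association with the phenotype.
    cmds.append(f"plink --file {out} --pheno {pheno} --assoc --allow-no-sex")
    return cmds


if __name__ == "__main__":
    for cmd in build_pipeline(["sample1", "sample2"]):
        print(cmd)
```

Printing the commands first makes it easy to review them before running them by hand. Depending on the organism and on how the phenotype file is laid out, the vcftools and PLINK steps may need additional options.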

5.9 years ago, Chrispin Chaguza (Wellcome Sanger Institute) wrote:

I would advise trying snp-sites, a tool developed at the Sanger Institute. It identifies SNP sites in a multi-FASTA alignment: https://github.com/sanger-pathogens/snp_sites
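
To illustrate the idea behind this kind of tool, here is a minimal Python sketch that reads an aligned multi-FASTA and keeps only the variable columns, which gives the samples × sites matrix asked for in the question. It is not snp-sites itself and does not claim to reproduce its algorithm; the function names and the toy alignment are made up for illustration, and it assumes every sequence has already been aligned to the same length.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping input order."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sample name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {n: "".join(parts) for n, parts in chunks.items()}


def snp_matrix(alignment, ignore="N-"):
    """Return (positions, matrix) for columns with more than one base.

    positions holds 1-based alignment columns; matrix maps each sample
    name to its list of bases at those columns. Characters in `ignore`
    (gaps, unknown bases) do not count as alleles.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    length = lengths.pop()
    positions = []
    for i in range(length):
        bases = {seq[i] for seq in alignment.values()} - set(ignore)
        if len(bases) > 1:
            positions.append(i + 1)
    matrix = {
        name: [seq[p - 1] for p in positions]
        for name, seq in alignment.items()
    }
    return positions, matrix


# Toy alignment of three hypothetical samples.
toy = """>s1
ACGTAC
>s2
ACGTTC
>s3
ATGTAC
"""
positions, matrix = snp_matrix(read_fasta(toy))
```

In the toy alignment only columns 2 and 5 vary, so each sample ends up with a two-entry row. With 10 real genomes the same structure becomes the 10 × (number of sites) matrix, which can then be recoded numerically for association testing.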
