SNP data in genomic sequences
9.5 years ago

Hello, I have a set of 10 genomic sequences, each corresponding to one sample. My final goal is a genome-wide association analysis between a phenotype that I have for all the samples and their genotypes, which will have to be obtained from SNP data. I have the phenotypic trait value for each sample, but I also need the genotypic data (SNPs) for those samples. In other words, I need a SNP matrix of size 10 × (number of loci or sites). I have no experience identifying or analyzing SNPs in genomic sequences, so I would really appreciate guidance on how to convert a set of 10 genomic sequences into this genotype matrix. Is there a good R package that could help with this?

phylogenetics • 2.4k views
9.5 years ago
smilefreak ▴ 420

First, put each sample's sequence into its own file in FASTQ format:

http://en.wikipedia.org/wiki/FASTQ_format

Then use the bwa mem command to align these FASTQ files to a reference or consensus sequence (this could simply be one of your own sequences); the reference file needs to be in FASTA format.

Samtools can then be used to call variants for each of your samples, vcftools to convert the calls to the PLINK format, and finally PLINK to perform the phenotypic association.

http://bio-bwa.sourceforge.net/bwa.shtml (BWA)

http://vcftools.sourceforge.net/ (Vcftools)

http://www.htslib.org/ (Samtools)

http://pngu.mgh.harvard.edu/~purcell/plink/plink2.shtml (Plink)
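To make the order of the steps concrete, here is a rough Python sketch that only builds the shell commands for the align, call, convert and associate steps; it does not run anything. Several things are my assumptions rather than part of the answer above: the function name `build_pipeline`, the file names (`ref.fa`, `phenotypes.txt`, the `cohort` prefix), the use of `bcftools mpileup` / `bcftools call` for the variant-calling step, and the particular PLINK options. Check each tool's manual before running the commands on real data.

```python
import shlex


def build_pipeline(reference, fastq_by_sample, prefix="cohort"):
    """Return shell command strings for align -> call -> convert -> associate.

    reference: path to the FASTA reference (hypothetical file name).
    fastq_by_sample: dict mapping sample name -> FASTQ path.
    """
    ref = shlex.quote(reference)
    commands = [f"bwa index {ref}"]  # bwa needs an index of the reference
    bams = []
    for sample, fastq in fastq_by_sample.items():
        bam = shlex.quote(f"{sample}.sorted.bam")
        # Align one sample and sort the alignments by coordinate.
        commands.append(
            f"bwa mem {ref} {shlex.quote(fastq)} | samtools sort -o {bam} -"
        )
        commands.append(f"samtools index {bam}")
        bams.append(bam)
    # Joint variant calling over all samples (assumption: bcftools is used).
    commands.append(
        f"bcftools mpileup -f {ref} {' '.join(bams)} "
        f"| bcftools call -mv -Ov -o {prefix}.vcf"
    )
    # Convert the VCF to PLINK ped/map files.
    commands.append(f"vcftools --vcf {prefix}.vcf --plink --out {prefix}")
    # Association test; phenotypes.txt is a hypothetical phenotype file.
    commands.append(
        f"plink --file {prefix} --pheno phenotypes.txt --assoc "
        f"--allow-no-sex --out {prefix}"
    )
    return commands


if __name__ == "__main__":
    samples = {"s1": "s1.fastq", "s2": "s2.fastq"}
    for cmd in build_pipeline("ref.fa", samples):
        print(cmd)
```

Printing the commands first lets you inspect them, and paste them into a terminal one at a time, before committing to a full run.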

9.5 years ago

I would suggest trying this tool developed by people at the Sanger Institute. It identifies SNP sites in a multi-FASTA alignment: https://github.com/sanger-pathogens/snp_sites
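To illustrate what "identifying SNP sites from an alignment" produces, here is a minimal pure-Python sketch that turns already-aligned sequences into the samples × sites matrix the question asks for. It is not the tool's own algorithm: the function name `snp_matrix`, the toy sequences, and the choice to skip any column containing a gap or ambiguous base are all assumptions made for the example.

```python
def snp_matrix(aligned):
    """Build a SNP matrix from aligned sequences.

    aligned: dict mapping sample name -> aligned sequence (equal lengths).
    Returns (positions, matrix): 1-based variable positions and a dict
    mapping each sample name to its bases at those positions.
    """
    names = list(aligned)
    seqs = [aligned[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to the same length")

    bases = set("ACGT")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in seqs}
        # Keep columns that hold only A/C/G/T and more than one distinct base.
        if column <= bases and len(column) > 1:
            positions.append(i + 1)

    matrix = {
        name: [seq[p - 1] for p in positions]
        for name, seq in zip(names, seqs)
    }
    return positions, matrix


if __name__ == "__main__":
    toy = {"s1": "ACGTAC", "s2": "ACGTAT", "s3": "AAGTAC"}
    positions, matrix = snp_matrix(toy)
    print(positions)  # [2, 6]
    for name, row in matrix.items():
        print(name, row)
```

With 10 samples the result has 10 rows and one column per variable site; the bases can then be recoded (for example as 0/1 for reference/alternate allele) before an association test.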
