Dear seqanswers,
I am new to genomics and bioinformatics. In our current study, we have sequenced the genomes of tens of accessions of a plant on an Illumina next-generation sequencer. The short reads of each accession have been aligned to the reference, and the SNPs and short indels of each accession have been called against the reference. We now have a separate SNP file per accession in the following format (a text file with the columns listed below; the accession name stays the same throughout a given accession's file):
<accession names> <chromosome><position><reference base><cons base><quality><support><concordance><avg_hits>
For classical population genetic analysis, however, we usually need all the accessions combined in the following format:
<accessions><SNP_1><SNP_2><SNP_3><SNP_...>
accession_1, a,t,g,,,
accession_2, a,t,c,,,
accession_3, t,a,c,,,
accession_,,,,,,,,,,,,,
I would appreciate help or suggestions on how to do this format conversion, or on any alternative approaches, using R and Bioconductor or other tools. If it requires database operations, how should I do that?
Thanks in advance.
This is the Biostar group, not seqanswers.
Relevant Bioconductor thread
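For what it's worth, the conversion asked about above (one row per SNP per accession, turned into one row per accession with one column per SNP) can be sketched in plain Python rather than R. This is only a sketch under some assumptions that are not stated in the question: the per-accession files are tab-delimited with no header line, and the columns come in the order listed in the question. The function name `snp_matrix` and the `missing_as_reference` flag are made up for illustration. A site that is absent from an accession's file is filled with the reference base, which is only safe if that site was actually covered in that accession; pass `missing_as_reference=False` to write `N` instead.

```python
import csv


def snp_matrix(lines, missing_as_reference=True):
    """Pivot long-format SNP records into one row per accession.

    `lines` is any iterable of tab-delimited text lines with the columns
    accession, chromosome, position, reference base, consensus base, ...
    (assumed layout; extra columns such as quality are ignored here).
    """
    calls = {}  # accession -> {(chromosome, position): consensus base}
    ref = {}    # (chromosome, position) -> reference base

    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        acc, chrom, pos, ref_base, cons_base = fields[:5]
        site = (chrom, int(pos))
        ref[site] = ref_base
        calls.setdefault(acc, {})[site] = cons_base

    # Union of all variable sites, ordered by chromosome name, then position.
    sites = sorted(ref)
    header = ["accession"] + [f"{chrom}:{pos}" for chrom, pos in sites]

    rows = []
    for acc in sorted(calls):
        row = [acc]
        for site in sites:
            # Assumption: no record for this accession means "same as reference".
            fallback = ref[site] if missing_as_reference else "N"
            row.append(calls[acc].get(site, fallback))
        rows.append(row)
    return header, rows


# Made-up demo records, not real data.
demo = [
    "acc_1\tChr1\t100\tA\tT\t40\t12\t1.0\t1.0",
    "acc_1\tChr1\t250\tG\tC\t38\t9\t1.0\t1.0",
    "acc_2\tChr1\t250\tG\tC\t35\t7\t1.0\t1.0",
    "acc_2\tChr2\t7\tT\tA\t41\t15\t1.0\t1.0",
]
header, rows = snp_matrix(demo)
for record in [header] + rows:
    print(",".join(record))
```

To run this on real files, the lines of all per-accession files can be fed in together, for example with `fileinput.input(list_of_paths)` from the standard library. Note that chromosomes are sorted as text, so `Chr10` would come before `Chr2`; the quality, support and concordance columns are not used for filtering in this sketch.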