Question

A Workflow Of Population Genomic Operations/Analysis For Newcomers

8

13.7 years ago

Jianfengmao ▴ 320

Dear BioStars,

I have just moved from classical population genetics to genomics/population genomics, and I need to build up my platform and skills for handling genomic data. I have used R for statistics for 3 years, so Bioconductor is preferable to me.

In my current study, we sequenced the genomes of tens of accessions of a plant on an Illumina next-generation sequencer. The reads have now been aligned to the reference genome.

I have no experience with genomic analysis. To begin with, I checked all the available Bioconductor packages for sequence analysis and read their manuals. I also surveyed the courses on the Bioconductor website. But I still cannot put together a complete and effective workflow for population genomic analysis, though I have seen many excellent genomic tools in Bioconductor.

I need hints, tips, suggestions, and advice on building an explicit and effective workflow for the following analyses, using Bioconductor or other tools:

Mutation types, e.g. CG -> AT, CG -> TA etc., polarized using the genomes of relatives
Polymorphism along chromosomes (or scaffolds)
Polymorphism by type (intergenic, CDS etc.) and polymorphism by metabolic network
LD and recombination
Drastic mutations, e.g. stop codons etc., by gene family and Gene Ontology category
Population structure using STRUCTURE
Fst among groups
Association studies

workflow population comparative • 5.6k views

updated 13.7 years ago by Khader Shameer 18k • written 13.7 years ago by Jianfengmao ▴ 320

1

Relevant bioconductor thread

updated 5.0 years ago by Ram 44k • written 13.7 years ago by Brad Chapman 9.7k

score 3 · Answer 1 · 2010-12-14

I am unaware of a workflow system that meets all of your requirements. However, two widely used systems for high-throughput population genetic analysis (=population genomics) are Variscan and LibSeq. You would need to use your reference genome annotation and wrapper scripts to run either of these systems, but both provide functionality for requirements 1-4.
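For requirements 1 and 2, the underlying calculations are simple enough to sketch by hand. The Python below is only an illustration under stated assumptions, not how Variscan or LibSeq implement them: `mutation_class` and `windowed_pi` are hypothetical helper names, SNPs are assumed to be biallelic and already polarized (ancestral vs. derived), and every site in a window is assumed to be callable.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_class(ancestral, derived):
    """Fold a polarized SNP into one of six strand-symmetric classes.

    Example: both C->T and G->A are reported as 'C:G>T:A'.
    """
    if (ancestral == derived or ancestral not in COMPLEMENT
            or derived not in COMPLEMENT):
        raise ValueError("expected two different bases from ACGT")
    # Describe the change on the strand where the ancestral base is C or T.
    if ancestral in ("A", "G"):
        ancestral, derived = COMPLEMENT[ancestral], COMPLEMENT[derived]
    return (f"{ancestral}:{COMPLEMENT[ancestral]}"
            f">{derived}:{COMPLEMENT[derived]}")


def windowed_pi(snps, chrom_length, window, n_chromosomes):
    """Nucleotide diversity per site in non-overlapping windows.

    snps: iterable of (1-based position, derived allele count).
    Assumes all sites in a window were callable (no missing data).
    Returns a list of (window_start, window_end, pi_per_site).
    """
    n = n_chromosomes
    n_windows = (chrom_length + window - 1) // window
    sums = [0.0] * n_windows
    for pos, k in snps:
        # Mean pairwise difference at a biallelic site: 2k(n-k) / (n(n-1)).
        sums[(pos - 1) // window] += 2.0 * k * (n - k) / (n * (n - 1))
    result = []
    for i, total in enumerate(sums):
        start = i * window
        end = min(start + window, chrom_length)
        result.append((start + 1, end, total / (end - start)))
    return result
```

With real data you would first extract positions and allele counts from your variant calls, and mask sites with insufficient coverage before dividing by the window length.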

Ram · Answer 2 · 2010-12-13

2

13.7 years ago

Rm 8.3k

PoPoolation: I haven't tried it myself, but you can test it and see whether it fits your requirements.

It has a collection of tools to facilitate population genetic studies of next generation sequencing data.

updated 4.8 years ago by Ram 44k • written 13.7 years ago by Rm 8.3k

score 2 · Answer 3 · 2010-12-15

As Casey suggested, I don't know of a published workflow that suits your task. But most of these steps are part of GWAS analysis and are routinely performed in computational genomics labs. IMHO, your requirements fall into two analysis tracks:

1. Genetic / genomic analysis using raw or simulated data

2. Annotation and interpretation of the results from step 1 (genomics studies).

You may use PLINK (directly or via its R plugin) to implement most of your genetics / genomics tasks (for example LD, recombination, Fst, etc.).
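To make two of those statistics concrete, here is a rough Python sketch. It is not PLINK's implementation and not necessarily the estimator any particular tool reports: `genotype_r2` is a plain squared correlation on unphased 0/1/2 genotypes, and `hudson_fst` uses Hudson's estimator combined across SNPs as a ratio of sums. Both function names are hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 counts.

    g1 and g2 hold one genotype per individual, in the same order.
    Returns NaN if either SNP is monomorphic in the sample.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length vectors with >= 2 samples")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")
    return cov * cov / (v1 * v2)


def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's Fst for two groups, as a ratio of sums over SNPs.

    freqs1, freqs2: allele frequencies per SNP in group 1 and group 2.
    n1, n2: number of sampled chromosomes in each group (assumed constant
    across SNPs, i.e. no missing data).
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance.
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")
```

For example, `genotype_r2([0, 1, 2, 0], [0, 1, 2, 0])` describes two SNPs in complete LD, and `hudson_fst([1.0], [0.0], 10, 10)` describes a fixed difference between two groups. Note that the estimator can be slightly negative when the groups have identical frequencies.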

Once you have identified your mutations / polymorphisms, you can move on to annotation. For this you may use a variety of tools that have been discussed here on BioStar in several previous posts (see: SNP effects on amino acids, variation databases, SNPs of unknown significance etc). By integrating PLINK with some of the annotation resources discussed in those questions, you can develop such a workflow.
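As an illustration of the annotation step (for example, flagging the "drastic" stop-codon mutations from the question), here is a minimal Python sketch. It assumes the standard genetic code, a coding sequence given on the coding strand starting at the first codon, and a SNP already mapped to a 0-based CDS coordinate; `classify_cds_snp` is a hypothetical name, and a dedicated annotation tool handles many cases this ignores (introns, strand, multiple transcripts).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_cds_snp(cds, pos0, alt):
    """Classify a SNP inside a coding sequence.

    cds:  coding-strand sequence, length a multiple of 3 (assumption).
    pos0: 0-based position of the SNP within cds.
    alt:  alternative base on the coding strand.
    Returns 'synonymous', 'missense', 'stop_gained' or 'stop_lost'.
    """
    cds = cds.upper()
    alt = alt.upper()
    if alt not in BASES or alt == cds[pos0]:
        raise ValueError("alt must be a base from ACGT different from ref")
    start = pos0 - pos0 % 3                  # first base of affected codon
    ref_codon = cds[start:start + 3]
    offset = pos0 - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

Counting the returned classes per gene, and then per gene family or Gene Ontology category, would give the kind of summary asked for in the question.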