Question: Genomics (DNA) Pipeline - Example
18 months ago, caspase8mach wrote:

Hello all,

Is there a good example of a ready-to-use genomics pipeline for mapping/alignment of NGS data (preferably whole genome), followed by variant calling and annotation, along with generation and visualization of quality metrics? It would be even better if the suggested pipeline is Python-based.

I would like to use publicly available FASTQ and/or BAM files to 'learn' and demonstrate the entire DNA analysis workflow.

Your help and suggestions will be greatly appreciated.

Thanks much.

ADD COMMENTlink written 18 months ago by caspase8mach10

Not a ready-to-use workflow, but if your goal is to learn, you might want to have a look at the tutorial about Creating workflows with snakemake and conda that I wrote some time ago.

ADD REPLYlink written 18 months ago by finswimmer13k

Thanks, finswimmer, for the workflow. It will certainly help me learn.

By any chance do you have links for the .fa and multiple fastq files for me to give this example a try?

Do I also have to provide an index file?

TIA

ADD REPLYlink modified 18 months ago • written 18 months ago by caspase8mach10

Hello caspase8mach,

You can search the European Nucleotide Archive for a suitable public dataset. (This tutorial by ATpoint might be useful for you as well.)

Do I also have to provide an index file?

What index file do you mean?

fin swimmer

ADD REPLYlink written 18 months ago by finswimmer13k

Awesome, thanks a lot for the link to the nice tutorial! It's great!

What index file do you mean?

To map the FASTQ files against a reference genome, do I need to create an index first?

Thanks a lot.

ADD REPLYlink modified 18 months ago • written 18 months ago by caspase8mach10

Yes, you need to create an index for the reference genome. How you create this index depends on the aligner you want to use. For bwa, for example, it's a simple bwa index genome.fa
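
Since the thread asks for something Python-based, here is a minimal sketch of how a script might check for an existing bwa index before building one. The five suffixes are the files bwa index normally writes next to the reference; the function names missing_index_files and ensure_bwa_index are made up for this illustration, and the code assumes bwa is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Suffixes of the files that `bwa index` writes next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference):
    """Return the bwa index files that do not yet exist for `reference`."""
    ref = Path(reference)
    expected = [ref.with_name(ref.name + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.exists()]


def ensure_bwa_index(reference):
    """Run `bwa index` only if at least one index file is missing.

    Returns True if the index was built, False if it was already complete.
    """
    if not missing_index_files(reference):
        return False  # nothing to do
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa was not found on PATH")
    # Same command as typed by hand: bwa index genome.fa
    subprocess.run(["bwa", "index", str(reference)], check=True)
    return True
```

Calling ensure_bwa_index("genome.fa") before the alignment step avoids rebuilding the index on every run, which matters for a human-sized reference where indexing takes a long time.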

ADD REPLYlink written 18 months ago by finswimmer13k

Thanks a lot. As suggested, I created the index using bwa index hg19.fasta and now have the following files: hg19.fasta, hg19.fasta.amb, hg19.fasta.ann, hg19.fasta.bwt, hg19.fasta.pac and hg19.fasta.sa. I did manage to align a pair of FASTQ files using your Snakemake tutorial, hurray ... my first NGS DNA analysis pipeline!

Now my question is: how is the analysis done in production? To analyze several samples, is it possible to run them in parallel, with cloud computing, etc.? Any examples?

Thanks a ton for your help.

ADD REPLYlink modified 17 months ago • written 17 months ago by caspase8mach10

If you start snakemake with the --cores parameter, e.g. --cores 4, it may use up to 4 CPU cores and so can run up to 4 single-threaded jobs in parallel.

snakemake can also be used with cluster and cloud support. See the manual for it. Unfortunately I have no experience with this.
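
To make the several-samples case concrete, here is a rough sketch of how the inputs could be collected in Python before handing them to snakemake. It assumes paired-end files named <sample>_1.fastq.gz and <sample>_2.fastq.gz (the convention commonly seen on ENA downloads) and a hypothetical output layout mapped/<sample>.bam; both have to be adapted to your own Snakefile. The helper names are made up for this example.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.fastq.gz and <sample>_2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def group_paired_fastqs(filenames):
    """Map each sample name to its (read 1, read 2) FASTQ file names."""
    reads = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match:  # silently skip files that do not follow the convention
            reads[match["sample"]][match["read"]] = name
    # Keep only samples for which both mates are present.
    return {
        sample: (mates["1"], mates["2"])
        for sample, mates in sorted(reads.items())
        if "1" in mates and "2" in mates
    }


def snakemake_command(samples, cores):
    """Build a snakemake call that requests one BAM file per sample."""
    # Hypothetical target layout; it must match the rules in your Snakefile.
    targets = [f"mapped/{sample}.bam" for sample in samples]
    return ["snakemake", "--cores", str(cores), *targets]
```

Because every sample is an independent target, snakemake can schedule the per-sample jobs side by side up to the --cores limit; the resulting list can be passed to subprocess.run to launch it.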

ADD REPLYlink written 17 months ago by finswimmer13k

Certainly helpful, I will give it a try and let you know. Does anyone have experience with Apache Spark-based DNA NGS pipelines?

Thanks

ADD REPLYlink written 17 months ago by caspase8mach10

Nextflow pipelines: https://github.com/search?q=bwa+extension%3Anf+HaplotypeCaller (not Python)

ADD REPLYlink written 18 months ago by Pierre Lindenbaum129k

Thanks for the info, but somehow I am not able to access the URL you suggested. Could you please give me the correct URL? Thanks

ADD REPLYlink written 18 months ago by caspase8mach10
18 months ago, Fabio Marroni (Italy) wrote:

A quick Google search gave me these. I hope they are useful!

https://www.nature.com/articles/s41598-018-25022-6

http://bioinformatics.astate.edu/dna-pipeline/

https://genestack-user-tutorials.readthedocs.io/guide/intro-to-ngs.html

ADD COMMENTlink modified 18 months ago • written 18 months ago by Fabio Marroni2.6k

Thanks a lot for the information. I am going through the suggested resources to learn and build a genomics pipeline.

ADD REPLYlink written 18 months ago by caspase8mach10