Question: Workflow or Tutorial for SNP Calling?
21
4.0 years ago by
Matthieu210
Nova Scotia, Canada
Matthieu210 wrote:

I am looking for a good workflow, reading, or tutorial for SNP calling. I read some other posts on this topic, but I would like a more detailed explanation. Population genomics and sequence data are new to me (I have a general CS and biology background). It might just be me, but these tools are not as straightforward or as well documented as I'd like. Any links or explanations would be good!

So far, my situation is as follows:

  • I have Illumina sequence reads for a highly polymorphic species
  • I aligned these reads using BWA against a reference genome with default parameters, but I am not sure if I should change parameters (if so, which ones?) due to the highly polymorphic data
  • I am unsure of the next step. I will probably be using SAMtools or GATK. I tried making an mpileup but got really confused after that.
  • I should also be assessing SNP quality. What tools are used for that? I already see some sequencing errors when browsing the data.

As you can tell, I am totally new to this. It is pretty exciting, so I want to learn and be able to do some of these things! Thanks in advance.

edit: I also get confused by some of the output; more detailed documentation on that would be nice as well!

ADD COMMENTlink modified 21 months ago by rob234king430 • written 4.0 years ago by Matthieu210
1

You referenced several other posts but not http://biostar.stackexchange.com/questions/1269/what-is-the-best-pipeline-for-human-whole-exome-sequencing ; you may find it helpful.

ADD REPLYlink modified 2.5 years ago by Istvan Albert ♦♦ 53k • written 4.0 years ago by David Quigley9.5k

ah, I missed that. It is very helpful. Thank you, David!

ADD REPLYlink written 4.0 years ago by Matthieu210
13
4.0 years ago by
Pablo1.6k
Canada
Pablo1.6k wrote:
  • BWA with default parameters is probably OK.

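  • If you have a SAI file from the previous step (the output of bwa aln), you need to convert it to SAM or BAM. For single-end reads, do something like the following (assuming your reference genome is hg37.fasta; paired-end reads use bwa sampe instead):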

    bwa samse hg37.fasta s.sai s.fastq > s.sam

  • Then create a BAM file (assuming SAMtools is installed):

    samtools view -S -b s.sam > s.bam

  • Sort the BAM file (this creates s_sort.bam):

    samtools sort s.bam s_sort

  • Call variants, i.e. create a VCF file (BCFtools is part of the SAMtools distribution):

    samtools mpileup -uf hg37.fasta s_sort.bam | bcftools view -vcg - > s.vcf

There is a lot more (like local realignment, etc.). But if this is your first time doing it, you should start with the basics.
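
The commands above use the SAMtools 0.1.x syntax. In the 1.x releases, samtools sort takes an -o output file and variant calling moved to bcftools mpileup and bcftools call. As a rough sketch only, the same steps might look like this with newer tools. It assumes single-end reads and bwa mem, and it reuses the file names from the example above. The run helper and the DRY_RUN variable are made up for illustration: by default the script only prints the commands, so it can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: align -> sort -> index -> call, using BWA-MEM and SAMtools/BCFtools 1.x syntax.
# File names follow the example above; run() and DRY_RUN are illustrative helpers.
set -eu

# Print each command by default; set DRY_RUN=0 to actually execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

call_snps() {
    ref="$1"; reads="$2"; prefix="$3"
    # Build the BWA index for the reference (only needed once per reference).
    run "bwa index $ref"
    # Align and sort in one pass; samtools sort reads SAM from stdin ("-").
    run "bwa mem $ref $reads | samtools sort -o ${prefix}_sort.bam -"
    # Index the sorted BAM so it can be browsed and queried by region.
    run "samtools index ${prefix}_sort.bam"
    # Pile up the reads and call variant sites only (-v) with the multiallelic caller (-m).
    run "bcftools mpileup -Ou -f $ref ${prefix}_sort.bam | bcftools call -mv -Ov -o ${prefix}.vcf"
}

call_snps hg37.fasta s.fastq s
```

Running it with DRY_RUN=0 executes the four commands in order instead of printing them.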

ADD COMMENTlink written 4.0 years ago by Pablo1.6k
11
4.0 years ago by
Kansas City, MO
Jim Vallandingham320 wrote:

We are working on a SNP pipeline now. You might find my work-in-progress pipeline useful.

The pipeline currently starts with an alignment from BWA. It uses GATK for SNP calling.

Briefly, the flow involves:

  • Realigning the BAM file using GATK's RealignerTargetCreator and IndelRealigner
  • Optional recalibration step (which we currently don't use)
  • SNP calling using GATK's UnifiedGenotyper
  • Indel calling using the UnifiedGenotyper
  • basic filtering of the resulting VCF files
    • Right now we use some basic metrics to attempt to filter out low-quality SNPs. I'm sure this step could be improved.
  • Annotate called and filtered SNPs
    • Currently we use a custom script to add gene / transcript / exon / intron and other information from Ensembl.
    • Recently (yesterday) I found snpEff from another BioStar discussion. We will be using it in the future for this kind of annotation.
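
For orientation, the realignment and calling steps in this list might look roughly like the sketch below, assuming a GATK 3.x-style command line (RealignerTargetCreator, IndelRealigner and UnifiedGenotyper were removed in GATK 4). This is not the pipeline's actual code. The jar path, reference name, file names, and the gatk and gatk_call helpers are placeholders, and the input BAM is assumed to be sorted, indexed, and tagged with read groups. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of realign -> call with GATK 3.x-style commands.
# GATK_JAR, REF and all file names are placeholders, not taken from the pipeline itself.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"
REF="${REF:-ref.fasta}"
DRY_RUN="${DRY_RUN:-1}"   # print commands by default; set DRY_RUN=0 to execute

gatk() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "java -jar $GATK_JAR $*"
    else
        java -jar "$GATK_JAR" "$@"
    fi
}

gatk_call() {
    bam="$1"; prefix="$2"
    # 1. Find intervals that look like they need local realignment.
    gatk -T RealignerTargetCreator -R "$REF" -I "$bam" -o "$prefix.intervals"
    # 2. Realign reads within those intervals.
    gatk -T IndelRealigner -R "$REF" -I "$bam" -targetIntervals "$prefix.intervals" -o "$prefix.realigned.bam"
    # 3. Call SNPs and indels separately with the UnifiedGenotyper.
    gatk -T UnifiedGenotyper -R "$REF" -I "$prefix.realigned.bam" -glm SNP -o "$prefix.snps.vcf"
    gatk -T UnifiedGenotyper -R "$REF" -I "$prefix.realigned.bam" -glm INDEL -o "$prefix.indels.vcf"
}

gatk_call s_sort.bam s
```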

We are still fleshing out the details on filtering and such, but it might be a good starting point for running the GATK steps in a working order.
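
As an illustration of what a very basic filter can look like, here is a minimal sketch that keeps only records whose QUAL column meets a threshold. It is not the filter used in this pipeline: the function name filter_vcf_by_qual, the cutoff of 30, and the toy records are all made up for the example, and a real filter would usually also look at depth, strand bias, and mapping quality.

```shell
#!/bin/sh
# Keep VCF header lines plus records whose QUAL (column 6) is at least the given minimum.
# Reads a VCF on stdin and writes the filtered VCF to stdout.
filter_vcf_by_qual() {
    awk -v min="$1" -F'\t' '
        /^#/ { print; next }                  # header lines pass through unchanged
        $6 != "." && $6 + 0 >= min { print }  # drop missing or low QUAL values
    '
}

# Toy input (made-up records): only the QUAL=50 site should survive a cutoff of 30.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\t.\tDP=20\nchr1\t200\t.\tC\tT\t10\t.\tDP=4\n' \
    | filter_vcf_by_qual 30
```

On a real file this would be used as: filter_vcf_by_qual 30 < s.vcf > s.q30.vcf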

ADD COMMENTlink modified 3.1 years ago • written 4.0 years ago by Jim Vallandingham320

I've been asked to do something similar and would use the same programs as you, so I will have a look and see if I can get yours working. snpEff is very good for annotation, as I expect you have found. I would, however, look to expand the pipeline with structural variation software such as Pindel or DELLY (EMBL uses this), because GATK does not detect large insertions well. Thanks very much for posting this; hopefully looking through your code will speed up putting together my own pipeline or modifying yours to do what we want.

ADD REPLYlink written 19 months ago by rob234king430
3
3.6 years ago by
Manhattan, NY
Khader Shameer15k wrote:

I strongly recommend this recent article from authors of GATK.

It covers various aspects of SNP calling in detail. At the same time, do refer to the software manual/wiki for the up-to-date options incorporated in the toolkit.


ADD COMMENTlink written 3.6 years ago by Khader Shameer15k
0
3.6 years ago by
Anil0
Anil0 wrote:

Use Galaxy. It is free and user-friendly, and you will get most of the steps in one program.

ADD COMMENTlink written 3.6 years ago by Anil0
0
22 months ago by
tangming2005740
Houston/MD Anderson Cancer Center
tangming2005740 wrote:

Have a look at this course material from UT-Austin: https://wikis.utexas.edu/display/bioiteam/SSC+Intro+to+NGS+Bioinformatics+Course

ADD COMMENTlink written 22 months ago by tangming2005740
0
21 months ago by
rob234king430
UK/Harpenden/Rothamsted Research
rob234king430 wrote:

I have put together a tutorial website with four core tutorials on it (RNA-Seq, ChIP-Seq, genome assembly, and SNP calling) that may be of use to you.

This website was created to share bioinformatics tutorials. 

http://elvis.misc.cranfield.ac.uk/CUBELP2/

ADD COMMENTlink modified 4 weeks ago • written 21 months ago by rob234king430
2

The login/register requirement is a major put-off; maybe make it optional?

ADD REPLYlink written 21 months ago by zx87542.9k

Thanks for the input. I know registering is a bit of a pain, but the website is based on groups that contain tutorials, and I use the login to show only the tutorials from groups you are a member of, to keep it manageable if it gets bigger in the future. You can use false details (email etc.) if you're concerned; they are only used for this purpose. If you register, it will automatically log you in. If you then go to "groups" on the side bar and join the "Cranfield University" group, then click on "group tutorials" in the left side bar, you can access the tutorials I have put up. I think they need a little re-formatting, but they have some useful demonstrations of tools like RSAT etc. Thanks again for the reply, I appreciate it.

ADD REPLYlink written 21 months ago by rob234king430

You got me curious, but the login system is not working…

ADD REPLYlink written 20 months ago by Andre Elias50

Thanks, I'll check it out. It seems the server needs restarting; there is a memory leak in one or more of the deployments on this server. I'll ask for it to be reset and investigate.

ADD REPLYlink written 20 months ago by rob234king430

A number of students are submitting to this server over the next month, which is resulting in PermGen errors: some kind of memory leak somewhere, or a side effect of multiple submissions. The issue is corrected for the moment.

CUBELP2 is a static website and CUBELP a sharing platform; typing their names plus "bioinformatics" into Google should locate them.

ADD REPLYlink modified 20 months ago • written 20 months ago by rob234king430
1

Please fix the link. It doesn't work anymore!

ADD REPLYlink written 4 weeks ago by student-t30

http://elvis.misc.cranfield.ac.uk/CUBELP2/

ADD REPLYlink written 4 weeks ago by rob234king430