Question: Tools to extract per-locus alignments from bam files aligned using bwa
gravatar for betsy.s.collins
2.8 years ago by
betsy.s.collins110 wrote:

Hello everyone,

I am looking for a quick bioinformatics solution to extract an alignment for each locus in 96 different bam files (separate bam file for each of my 96 samples). Each bam files contains up to 96 different loci that I mapped to using BWA to create the per-sample bam files that I currently have. I am going to use alignments for each locus (that contain all samples that aligned to that locus) in population genetics analyses in the PopGenome package in R.

I was planning on importing each bam file into Geneious under a folder for each sample, obtaining the consensus sequence for each locus within each sample folder, and then copying the consensus sequence for each sample into a locus-specific folder. Once all consensus sequences for a particular locus are copied into a folder, I would align them and then export the alignment as a fasta file for use in R.

My workflow in Geneious works just fine, except it is going to take a very long time. I am new to bioinformatics and am sure that I am missing an easy solution to this problem that would be much faster than using Geneious!

Thank you for any help you can offer.

alignment • 1.3k views
ADD COMMENTlink modified 2.8 years ago by Michael Dondrup47k • written 2.8 years ago by betsy.s.collins110
  • use linux
  • use bash
  • use cmd-line tools like samtools
  • use loops
  • don't use commercial GUIs
ADD REPLYlink written 2.8 years ago by Pierre Lindenbaum127k

I don't think my scripting skills are quite up to snuff to be able to figure out the commands for this just using bash. I have been using samtools quite a bit, though. Do you have any suggestions for a particular command in samtools? I read through the manual several times but I can't find anything that would clearly do this task (but I'm sure it's because I'm missing it)

ADD REPLYlink written 2.8 years ago by betsy.s.collins110

It looks like the PopGenome package can also take VCF files, have you considered that instead? You could then use a more standard "align to the genome and call variants with GATK/samtools/freeBayes" protocol.

ADD REPLYlink written 2.8 years ago by Devon Ryan94k
gravatar for Michael Dondrup
2.8 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Not totally sure what you want to achieve and why, but samtools can easily filter bam files by position, say your locus is defined as chr3:1000-2000. If you want to do this for all bam files in a directory, this can be done in bash like so:

 LOCI="chr3:1000-2000 chr2:1-10" # etc. samtools supports multiple loci
  for F in *.bam ; do
      samtools view -bh $F $LOCI > $F.filtered.bam

Remark: If you generate a consensus sequence only, you will lose information on heterozygosity.

ADD COMMENTlink written 2.8 years ago by Michael Dondrup47k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1076 users visited in the last hour