Question: How To Convert 454 Data To Sam Format?
9.8 years ago by
Litali50 wrote:

Many viewers expect SAM format. How can I convert 454 output to this format? Thank you!

EDIT: The OP specifies that: 'It would be ok to have either the ACE file or the sff file in SAM format'.

viewer alignment sam • 7.4k views
modified 6.9 years ago by Biostar ♦♦ 20 • written 9.8 years ago by Litali50

Hi litali, It is rather unclear what you mean by '454 output', mostly since you want to put it in an alignment format. Are you referring to the .sff file that comes out of the Roche sequencer? Or maybe to the sequences once they are assembled, possibly in .ace format? This should help us help you. Cheers.

written 9.8 years ago by Eric Normandeau10k

added the script

written 9.8 years ago by Michael Dondrup47k
9.8 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

It is a totally justified question, though what is required is an alignment process, not only a conversion, and there are several possible pipelines. Also, knowing what data you have would help a lot.

You need data in fasta or fastq format and your reference genome in fasta format.

If your data is in .sff (Standard Flowgram Format), you have to convert it to fasta format using the sffinfo program that comes with the 454 software.

I have a rather old version of the GS FLX manual and there sffinfo didn't write a fastq file, but both a fasta file and a quality file. Another option is sff_extract, but that doesn't give fastq either.

The data can be combined into a fastq file using a simple perl script (I can post one if required), or you can discard the qualities and align the fasta file only.

Then align your 454 reads against the reference sequence/genome using an aligner that can output SAM format and works with "medium length" reads. One tool that directly aligns fasta and gives SAM is lastz, though you have to play with the switches.

  • BWA is another option but requires fastq; depending on read length, use the BWA-SW algorithm.
  • SSAHA2 was mentioned before.
  • SHRiMP supports both fastq and fasta and should also support longer reads.
  • There are many more tools; your mileage may vary.
  • Keep in mind the read lengths of the 454 reads.
  • As read lengths vary with 454, I prefer a percent-wise identity cutoff over an absolute number of mismatches.

Simple as that ;)
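To illustrate the last bullet, here is a minimal Python sketch (Python rather than perl, purely for illustration) of a percent-identity filter applied to SAM lines. It assumes the aligner writes the optional NM:i: edit-distance tag, and it approximates identity as (alignment columns - NM) / alignment columns, with the columns counted from the CIGAR string. The function names and the 90% default are hypothetical choices, not part of any tool mentioned above.

```python
import re

# one CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def percent_identity(sam_line):
    """Percent identity of one SAM alignment line.

    Returns None for unmapped reads or when the NM tag is absent.
    Assumption: NM counts mismatches plus inserted and deleted bases.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    # alignment columns: aligned bases (M, =, X) plus insertions and deletions
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), None)
    if nm is None or columns == 0:
        return None
    return 100.0 * (columns - nm) / columns

def passes_cutoff(sam_line, min_identity=90.0):
    """True if the read is mapped and meets the identity threshold."""
    pid = percent_identity(sam_line)
    return pid is not None and pid >= min_identity
```

With this definition a 100 bp read with 5 differences and a 400 bp read with 20 differences both score 95%, whereas a fixed limit of, say, 5 mismatches would keep the first and reject the second.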

Edit: here is a simple perl script that makes a fastq file out of a fasta file and a quality file. It's not much tested, and if the headers and data in the fasta and qual files do not match exactly, it fails miserably.

#!/usr/bin/env perl

use strict;
use warnings;

die "Usage: fasta2fastq <fasta.file> <qual.file>\n" unless @ARGV == 2;

open my $fasta, '<', $ARGV[0] or die "cannot open fasta: $!\n";
open my $qual,  '<', $ARGV[1] or die "cannot open qual: $!\n";

my $offset = 33; # I think this was 33 for sanger FASTQ, change this if required!
my $count = 0;

local $/ = "\n>"; # read both input files one FASTA-style record at a time
while (my $fastarec = <$fasta>) {
  chomp $fastarec;
  # first line is the header, the remaining lines are the sequence
  my ($fid, @seq) = split "\n", $fastarec;
  my $seq = join "", @seq; $seq =~ s/\s//g;
  my $qualrec = <$qual>;
  die "qual file has fewer records than fasta file\n" unless defined $qualrec;
  chomp $qualrec;
  my ($qid, @qual) = split "\n", $qualrec;
  # split ' ' ignores leading whitespace, so no empty score is produced
  @qual = split ' ', join(" ", @qual);
  # convert each numeric score to its character code:
  my $quals = join "", map { chr($_ + $offset) } @qual;
  die "mismatch of fasta and qual: '$fid' ne '$qid'\n" if $fid ne $qid;
  $fid =~ s/^>//; # only the first record still carries the leading '>'
  print STDOUT join("\n", '@' . $fid, $seq, "+$fid", $quals), "\n";
  $count++;
}
close $fasta;
close $qual;
print STDERR "wrote $count entries\n";
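
Since the script is not much tested, a quick sanity check of its output may be worthwhile. The following is a small Python sketch, not part of the original answer, that assumes unwrapped 4-line FASTQ records (which is what the script writes) and the same offset of 33. The name check_fastq is hypothetical.

```python
def check_fastq(lines, offset=33):
    """Validate 4-line FASTQ records and decode their qualities.

    Returns a list of (read_id, phred_scores) tuples and raises
    ValueError on the first malformed record.
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        # every base needs exactly one quality character
        if len(seq) != len(qual):
            raise ValueError(
                f"{header[1:]}: {len(seq)} bases but {len(qual)} quality values"
            )
        # reverse of the script's chr(score + offset)
        records.append((header[1:], [ord(c) - offset for c in qual]))
    return records
```

Passing it an open file handle, for example check_fastq(open("reads.fastq")) with a file name of your own, checks every record; a length mismatch usually means the fasta and qual files were out of step.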
modified 18 months ago by RamRS28k • written 9.8 years ago by Michael Dondrup47k

Could you please post the script?

written 9.8 years ago by Litali0

Hi, I tried this and received an error: global symbol $count requires explicit package name. Execution aborted due to compilation error.

written 9.8 years ago by Litali0

try again, I forgot to initialize the variable

written 9.8 years ago by Michael Dondrup47k
9.8 years ago by
University of Manchester, UK
Ian5.7k wrote:

This may only partially help, but SSAHA2 reportedly outputs SAM format.

A similar question has also been previously posted on BioStar.

modified 18 months ago by RamRS28k • written 9.8 years ago by Ian5.7k
9.8 years ago by
Casbon3.2k wrote:

I had success with glu genetics, but you might need to fight the installer as noted on the question I asked and answered.

written 9.8 years ago by Casbon3.2k
9.8 years ago by
United States
Lhl730 wrote:

Try the mosaik aligner. It is well designed for working with 454 data and it supports SAM format (although you need to use mosaiktext to convert mosaikalign.dat to SAM format).

written 9.8 years ago by Lhl730