How To Convert 454 Data To Sam Format?
11.1 years ago
Litali ▴ 50

Many viewers are built around the SAM format. How can I convert 454 output to this format? Thank you!

EDIT: The OP specifies that: 'It would be ok to have either the ACE file or the sff file in SAM format'.

viewer sam alignment • 7.9k views

Hi litali, it is rather unclear what you mean by '454 output', especially since you want to put it in an alignment format. Are you referring to the .sff file that comes out of the Roche sequencer? Or maybe to the sequences once they are assembled, possibly in .ace format? Knowing this should help us help you. Cheers.


added the script

11.1 years ago

It is a totally justified question, though what is required is an alignment process, not just a conversion, and there are several possible pipelines. Knowing what data you have would also help a lot.

You need data in fasta or fastq format and your reference genome in fasta format.

If your data is in .sff (Standard Flowgram Format), you have to convert it to fasta format using the sffinfo program that comes with the 454 software.

I have a rather old version of the GS FLX manual, and there sffinfo did not write a fastq file but a fasta file and a separate quality file. Another option is sff_extract, but that doesn't give fastq either.

The fasta and quality files can be combined into a fastq file using a simple Perl script (I can post one if required), or you can discard the qualities and align the fasta file only.
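As a small illustration of what that combining step involves, here is a sketch in Python (not the Perl script offered above) that turns one line of space-separated Phred scores from a quality file into a FASTQ quality string and assembles a single record. It assumes the Sanger Phred+33 encoding; the function names `qual_to_fastq_string` and `format_fastq_record` are made up for this example.

```python
def qual_to_fastq_string(qual_line, offset=33):
    """Encode space-separated Phred scores as a FASTQ quality string.

    offset=33 is the Sanger encoding (an assumption; change it if your
    downstream tool expects a different one).
    """
    scores = [int(token) for token in qual_line.split()]
    return "".join(chr(score + offset) for score in scores)


def format_fastq_record(read_id, seq, qual_line, offset=33):
    """Build one four-line FASTQ record from a sequence and its scores."""
    quals = qual_to_fastq_string(qual_line, offset)
    if len(quals) != len(seq):
        raise ValueError(
            f"{read_id}: {len(seq)} bases but {len(quals)} quality values"
        )
    return f"@{read_id}\n{seq}\n+\n{quals}\n"


if __name__ == "__main__":
    # Hypothetical read with four bases and four Phred scores.
    print(format_fastq_record("read1", "ACGT", "40 30 20 0"), end="")
```

The length check is there because a sequence and its quality line drifting out of step is the most likely way for this kind of merge to go wrong.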

Then align your 454 reads against the reference sequence/genome using alignment software that can output SAM format and works with "medium length" reads. One tool that aligns fasta directly and gives SAM is lastz; you have to play with the switches, though.

  • BWA is another option but requires fastq; depending on read length, use the BWA-SW algorithm.
  • SSAHA2 was mentioned before.
  • SHRiMP supports both fastq and fasta and should also support longer reads.
  • There are many more tools out there; your mileage may vary.
  • Keep in mind the read lengths of the 454 reads.
  • As read lengths vary with 454, I prefer a percentage identity cutoff over an absolute number of mismatches.
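To make the last point concrete, here is a minimal Python sketch of a percentage identity filter applied to SAM records. It assumes the aligner writes the optional `NM:i:` (edit distance) tag, which not every tool does, and it defines identity as (alignment columns − NM) / alignment columns, which is only one of several definitions in use. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(sam_line):
    """Return percent identity for one SAM alignment line, or None.

    None is returned for unmapped reads (CIGAR '*') and for records
    without an NM tag, since identity cannot be derived in those cases.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":
        return None
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
            break
    if nm is None:
        return None
    # Alignment columns: matches/mismatches plus inserted and deleted bases.
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    if columns == 0:
        return None
    return 100.0 * (columns - nm) / columns


def passes_identity(sam_line, min_identity=95.0):
    """True if the record reaches the cutoff (95% is an arbitrary example)."""
    identity = percent_identity(sam_line)
    return identity is not None and identity >= min_identity
```

With a cutoff expressed this way, a 100 bp read and a 400 bp read are held to the same relative standard, whereas a fixed mismatch count would be far stricter on the longer read.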

Simple as that ;)

Edit: here is a simple Perl script that makes a fastq file out of a fasta file and a quality file. It is not much tested, and if the headers and data in the fasta and qual files do not match exactly, it fails miserably.

#!/usr/bin/env perl

use strict;
use warnings;

die "Usage: fasta2fastq <fasta.file> <qual.file>\n" unless @ARGV == 2;

# three-argument open with lexical filehandles
open my $fasta, '<', $ARGV[0] or die "cannot open fasta: $!\n";
open my $qual,  '<', $ARGV[1] or die "cannot open qual: $!\n";

my $offset = 33; # Phred+33 is the Sanger FASTQ encoding; change this if required!
my $count = 0;

local $/ = "\n>"; # read both files one FASTA-style record at a time
while (my $fastarec = <$fasta>) {
  chomp $fastarec;
  # first line of a record is the header, the rest is the sequence
  my ($fid, @seq) = split "\n", $fastarec;
  my $seq = join "", @seq; $seq =~ s/\s//g;
  my $qualrec = <$qual>;
  die "qual file has fewer records than fasta file\n" unless defined $qualrec;
  chomp $qualrec;
  my ($qid, @qual) = split "\n", $qualrec;
  # split ' ' skips leading whitespace, so no empty score is produced
  @qual = split ' ', join(" ", @qual);
  # convert each score to its character code:
  my @qual2 = map { chr($_ + $offset) } @qual;
  my $quals = join "", @qual2;
  die "mismatch of fasta and qual: '$fid' ne '$qid'\n" if $fid ne $qid;
  $fid =~ s/^>//; # only the first record still carries the leading '>'
  print STDOUT (join("\n", "@".$fid, $seq, "+$fid", $quals), "\n");
  $count++;
}
close $fasta;
close $qual;
print STDERR "wrote $count entries\n";

Could you please post the script?


Hi, I tried this and received an error: global symbol $count requires explicit package name. Execution aborted due to compilation error.


Try again; I forgot to declare the variable.

11.1 years ago
Ian 5.8k

This may only partially help, but SSAHA2 reportedly outputs SAM format.

A similar question has also been previously posted on BioStar.

11.0 years ago
Casbon ★ 3.2k

I had success with glu genetics, but you might need to fight the installer as noted on the question I asked and answered.

11.0 years ago
Lhl ▴ 730

Try the Mosaik aligner. It is well designed for working with 454 data and it supports SAM format (although you need to use mosaiktext to convert mosaikalign.dat to SAM format).
