Command Line Tool To Obtain Complementary And/Or Inverse Strand From Fasta Files
11.0 years ago

From a fasta file, I would like to obtain three new fasta files containing:

  1. Inverse
  2. Complement
  3. Antiparallel

I am looking for a command-line tool that does what http://www.fr33.net/seqedit.php provides, but with fasta files as input and output.

Python code for the inverse, complement and antiparallel steps is available: http://zientzilaria.herokuapp.com/blog/2008/03/11/fasta-module-generating-reverse-complement-of-dna-sequences/

But I would like to integrate this so that it can be applied to whole fasta files, transforming only the sequences and leaving the labels (header lines) untouched.

This is related to the question "How to convert a big fasta file with multi-line DNA sequences into a fasta file of reverse complement sequence".

fasta dna • 13k views

It looks like the answer to your question is in the links you provide?


The accepted answer there only covers the reverse complement (i.e. antiparallel). I have tried to get Biopieces working on my system, but without success.


Your question title states you want a CLI tool. Then you mention Python. Which do you prefer? Biopieces would be the easiest way to go for CLI.

11.0 years ago
Ido Tamir 5.2k

Your terminology is very confusing. The common names for these operations are reverse (your "inverse") and complement, which combine to give the reverse complement (your "antiparallel").

In EMBOSS:

revseq test.fasta -reverse -complement -outseq test.revcomp.fasta
revseq test.fasta -noreverse -complement -outseq test.comp.fasta
revseq test.fasta -reverse -nocomplement -outseq test.rev.fasta
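
To make the three operations concrete, here is a minimal Python sketch of what each one does to a single sequence string. The function names are my own, and the complement table is an assumption that only covers unambiguous DNA bases; any other character (N, gaps, IUPAC codes) passes through unchanged.

```python
# Complement table for A/C/G/T in both cases (assumption: plain DNA only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    """Reverse ("inverse" in the question): read the same strand backwards."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement: swap each base for its pairing partner, keeping the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Reverse complement ("antiparallel" in the question)."""
    return complement(reverse(seq))


if __name__ == "__main__":
    seq = "ATGCCG"
    print(reverse(seq))             # GCCGTA
    print(complement(seq))          # TACGGC
    print(reverse_complement(seq))  # CGGCAT
```

Reversing and complementing commute, so the order in which the two steps are applied does not change the reverse complement.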

I don't think this program is doing the right thing. The output is almost identical to the input sequence, with only very minor differences, and the only real change is that the sequence headers now include "Reversed".

11.0 years ago

Reversing and complementing sequences are separate tasks in Biopieces (www.biopieces.org):

read_fasta -i in.fna | reverse_seq | write_fasta -o out.fna -x
read_fasta -i in.fna | complement_seq | write_fasta -o out.fna -x
read_fasta -i in.fna | reverse_seq | complement_seq | write_fasta -o out.fna -x
7.5 years ago

If every DNA sequence in the multifasta file is on a single line, this writes the reverse complement of each sequence and leaves the header lines untouched:

while IFS= read -r L; do if [[ $L == ">"* ]]; then echo "$L"; else echo "$L" | rev | tr "ATGCatgc" "TACGtacg"; fi; done < multifasta_file.txt > output_file.txt

If your multifasta file is not in single-line format, you can convert it to single-line format first, like this:

awk '/^>/ { if (NR > 1) printf("\n"); printf("%s\n", $0); next } { printf("%s", $0) } END { printf("\n") }' < multifasta_file.txt > multifasta_file_SingleLine.txt

Then,

while IFS= read -r L; do if [[ $L == ">"* ]]; then echo "$L"; else echo "$L" | rev | tr "ATGCatgc" "TACGtacg"; fi; done < multifasta_file_SingleLine.txt > output_file.txt

Hope it is useful for someone. It took me some time to build it.
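
For anyone who would rather avoid the single-line conversion step, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the function names and the `.rev`, `.comp` and `.revcomp` output suffixes are made up for this example, the complement table only covers A/C/G/T, and each output sequence is written on one line.

```python
import sys

# Complement table for A/C/G/T in both cases (assumption: plain DNA only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def transform_fasta(in_path):
    """Write reversed, complemented and reverse-complemented copies of in_path.

    Header lines are copied unchanged; only the sequences are transformed.
    """
    with open(in_path) as src, \
            open(in_path + ".rev", "w") as rev, \
            open(in_path + ".comp", "w") as comp, \
            open(in_path + ".revcomp", "w") as revcomp:
        for header, seq in read_fasta(src):
            rev.write(f"{header}\n{seq[::-1]}\n")
            comp.write(f"{header}\n{seq.translate(COMPLEMENT)}\n")
            revcomp.write(f"{header}\n{seq[::-1].translate(COMPLEMENT)}\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    transform_fasta(sys.argv[1])
```

Saved under a name of your choice (say `fasta_strands.py`, a hypothetical name), it would be run as `python fasta_strands.py multifasta_file.txt` and produce the three output files next to the input.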

11.0 years ago
JC 13k

Perl solution:

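#!/usr/bin/perl

use strict;
use warnings;

$ARGV[0] or die "usage: convertFasta.pl FILE\n";
my $file = shift @ARGV;
$/ = "\n>";    # read one FASTA record per iteration
open my $in,   '<', $file                or die "cannot read $file: $!\n";
open my $anti, '>', "$file.antiparallel" or die "cannot write $file.antiparallel: $!\n";
open my $inv,  '>', "$file.inverse"      or die "cannot write $file.inverse: $!\n";
open my $comp, '>', "$file.complement"   or die "cannot write $file.complement: $!\n";

while (<$in>) {
    chomp;      # drop the trailing "\n>" record separator
    s/^>//;     # only the first record still starts with ">"
    my ($id, @seq) = split (/\n/, $_);
    next unless defined $id;
    my $seq = join "", @seq;
    my $seq_complement = $seq;
    $seq_complement =~ tr/ACGTacgt/TGCAtgca/;
    my $seq_inverse = reverse $seq;
    my $seq_antiparallel = $seq_inverse;
    $seq_antiparallel =~ tr/ACGTacgt/TGCAtgca/;

    # restore the ">" consumed by the record separator and end each record with a newline
    print {$inv}  ">$id\n$seq_inverse\n";
    print {$comp} ">$id\n$seq_complement\n";
    print {$anti} ">$id\n$seq_antiparallel\n";
}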