Question: fasta - reverse complement sequence
1
gravatar for mbk0asis
17 months ago by
mbk0asis230
Korea, Republic Of
mbk0asis230 wrote:

Hi.

I have a fasta file with multiple headers and want to get reverse complement sequences. I usually use FASTX-TOOLKIT, but I want to learn how to do with linux commands.

I tried it using 'awk' and 'if' something like below, but I got a tedious result.

awk '{if($1 ~ />/) print $0; else gsub("[ATCG]","[TAGC]");print $0}' TEST.fa

As you can see, the file has the same string in the header.

How can I get reverse complement sequences without altering the headers?

> ATCG_1
agtcgacATCGATtataggta
> ATCG_2
cgactgcagtcgacATCGACT

Thank you!

bash reverse complement fasta • 2.0k views
ADD COMMENTlink modified 17 months ago by chen970 • written 17 months ago by mbk0asis230
2

This is the kind of thing one shouldn't bother doing with awk or similar tools. Sure, you can come up with a solution, but why bother?

ADD REPLYlink written 17 months ago by Devon Ryan70k

I have this script for quick rev comp, but it assumes pure sequence as input (no headers).

#!/bin/bash
if [ ! -z "$1" ]; then
    echo "$1" | tr "[ATGCatgc]" "[TACGtacg]" | rev
else
    echo ""
    echo "usage: rev_comp_seq DNASEQUENCE"
    echo ""
fi
ADD REPLYlink modified 17 months ago • written 17 months ago by 5heikki6.6k
6
gravatar for Pierre Lindenbaum
17 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum98k wrote:

using 'tr', and 'rev', 2 lines per sequence

 cat input.fa | while read L; do  echo $L; read L; echo "$L" | rev | tr "ATGC" "TACG" ; done
ADD COMMENTlink written 17 months ago by Pierre Lindenbaum98k

Thank you, Pierre! It works well. What exactly is the difference between $L and "$L"? How does it work?

ADD REPLYlink written 17 months ago by mbk0asis230

$L and "$L" there is no difference while read L: the programs reads one line (the header), we print the line; we read the very next line into L; we print the last L, we reverse and translate it.

ADD REPLYlink written 17 months ago by Pierre Lindenbaum98k

Thank you! I understood!

ADD REPLYlink written 16 months ago by mbk0asis230

Thank! But I also have problem: that command can't work for sequencs with several lines.

ADD REPLYlink written 7 months ago by yangjie45460

linearize the sequence and make a 'two line' fasta

ADD REPLYlink written 7 months ago by Pierre Lindenbaum98k
2
gravatar for venu
17 months ago by
venu4.3k
Germany
venu4.3k wrote:

Assuming 2 lines per sequence (header and sequence), if not linearize and try something like following or modify more to get how you need

cat fasta.fa | paste - - | awk -F'\t' -vOFS='\t' '{gsub("A", "T", $2); gsub("T", "A", $2); gsub("G", "C", $2); gsub("C", "G", $2); print}' | tr '\t' '\n'
ADD COMMENTlink modified 17 months ago • written 17 months ago by venu4.3k

Thank you for the answer! However, the code didn't work because gsubs serially change the sequences. Firstly. all A's are turned to T's, then second 'gsub' change them back to A's. So finally, all the sequences become a pool of A's and G's. Can I use gsub to change multiple strings at once?

ADD REPLYlink written 17 months ago by mbk0asis230
0
gravatar for iraun
17 months ago by
iraun3.1k
Norway
iraun3.1k wrote:

Perl?

perl -nle'BEGIN {
    @map{ A, a, C, c, G, g, T, t } = ( T, t, G, g, C, c, A, a )
    }
    print /^>/ ?
        $_ :
              join //, map $map{ $_ }, split //, scalar reverse
    ' file.fa
ADD COMMENTlink written 17 months ago by iraun3.1k
0
gravatar for chen
17 months ago by
chen970
Strange Tools: https://github.com/OpenGene
chen970 wrote:

Try OpenGene (https://github.com/OpenGene/OpenGene.jl) to get reverse complement very easily with an operator ~

julia> using OpenGene

julia> seq = dna("AAATTTCCCGGGATCGATCGATCG")
dna:AAATTTCCCGGGATCGATCGATCG

julia> ~seq
dna:CGATCGATCGATCCCGGGAAATTT

ADD COMMENTlink written 17 months ago by chen970
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 655 users visited in the last hour