Question: Output fasta file with some sequences as the reverse complement
6 months ago by
casey0
casey0 wrote:

Hi all,

This is my first post and I am relatively new to bioinformatics, but I am hoping to find a solution to my problem.

I am trying to write an awk script that takes as input a fasta file containing a set of very similar sequences. Some of them are from the negative strand while others are from the positive strand, and I would like to output all of them in the same direction. I know a sequence is from the positive strand if the 9th position is "G"; when that matches, the sequence should be replaced with its reverse complement.

I don't have much yet. I thought I could pipe the output of awk to revseq, but I was unsure how to keep the headers:

awk -F '' '$9 =="G"' | revseq

As a basic example: (note the headers of each sequence do begin with a >)

seq1
ACT
seq2
ATG
seq3
ATT

If the 3rd position is "T", replace the sequence with its reverse complement, so the output would look like:

Output:

seq1
AGT
seq2
ATG
seq3
AAT

Just a side note: It's complement, not compliment.

modified 6 months ago • written 6 months ago by RamRS27k
6 months ago by
yztxwd380
Southern Medical University
yztxwd380 wrote:

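Try this (it assumes each sequence sits on a single line after its header; the leading ^ anchors the pattern so that only the 3rd base is tested):

awk '{if (NR%2) {print} else if (/^[ATCG]{2}T/) {system("echo " $0 " | rev | tr ATCG TAGC")} else {print}}' test.fa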

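To use the 9th position to determine the direction, replace 2 with 8. The question tests for "G" at that position, so the T becomes G as well, and the leading ^ anchors the match to the start of the sequence:

awk '{if (NR%2) {print} else if (/^[ATCG]{8}G/) {system("echo " $0 " | rev | tr ATCG TAGC")} else {print}}' test.fa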
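
If the fasta file has sequences wrapped over several lines, or you would rather not start a shell for every record, something along these lines may work as a starting point. It is only a sketch: orient_fasta is a made-up helper name, the position and base are passed as arguments, and it assumes headers start with ">" and sequences use A/C/G/T/N only. Note that it writes each sequence back out on a single line.

```shell
# orient_fasta POS BASE < in.fa > out.fa
# Hypothetical helper (not part of any toolkit): reverse-complements every
# record whose sequence has BASE at 1-based position POS; all other records
# pass through unchanged. Wrapped sequences are joined onto one line.
orient_fasta() {
    awk -v pos="$1" -v base="$2" '
    # Reverse-complement a string of A/C/G/T/N (upper or lower case);
    # any other character is copied through as-is.
    function revcomp(s,    i, c, k, out) {
        out = ""
        for (i = length(s); i >= 1; i--) {
            c = substr(s, i, 1)
            k = index("ACGTNacgtn", c)
            out = out (k ? substr("TGCANtgcan", k, 1) : c)
        }
        return out
    }
    # Print the buffered record, flipping it if the marker base matches.
    function flush_record() {
        if (header == "") return
        print header
        print (substr(seq, pos, 1) == base ? revcomp(seq) : seq)
    }
    /^>/ { flush_record(); header = $0; seq = ""; next }
         { seq = seq $0 }      # join wrapped sequence lines
    END  { flush_record() }
    '
}

# The toy example from the question: flip records with "T" at position 3.
printf '>seq1\nACT\n>seq2\nATG\n>seq3\nATT\n' | orient_fasta 3 T
```

For the real data the call would be along the lines of orient_fasta 9 G < test.fa > oriented.fa, where oriented.fa is just a placeholder output name.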

This is exactly what I was chasing, Thanks!

written 6 months ago by casey0