Question

How to print two lines from each of several files to a new file in a specific order?

0

6.7 years ago

ThulasiS ▴ 90

I am doing multiple sequence analysis for some genes. I have several files, each containing sequences in the same order. I would like to extract the first sequence of each file into a new file, followed by the second sequence of each file, and so on until the last sequence. With awk I only know how to do this for the first line or for one specific line:

awk 'FNR == 2 {print; nextfile}' *.txt > newfile

I have since learned how to do it for two files with this:

paste File1 File2 | awk '{ p=$2;$2="" }NR%2{ k=p; print }!(NR%2){ v=p; print $1 RS k RS v }'

My input looks like this:

File 1
Saureus081.1
ATCGGCCCTTAA
Saureus081.2
ATGCCTTAAGCTATA
Saureus081.3
ATCCTAAAGGTAAGG

File 2

SaureusRF1.1
ATCGGCCCTTAC
SauruesRF1.2
ATGCCTTAAGCTAGG
SaureusRF1.3
ATCCTAAAGGTAAGC

File 3 
SaureusN305.1 
ATCGGCCCTTACT 
SauruesN305.2 
ATGCCTTAAGCTAGA 
SaureusN305.3 
ATCCTAAAGGTAATG

There are 12 such files in total (File 3, File 4, ..., File 12).

Required output
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SauruesRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG

Thank you

sequence awk • 2.8k views

ADD COMMENT • link updated 6.7 years ago by GenoMax 141k • written 6.7 years ago by ThulasiS ▴ 90

0

Why does the output have

Seq1
Seq1
Seq2

Can you say precisely what the output should be? Do you want to create a separate file for each sequence and write that sequence from all the files? That is, write seq1 from all files to a single file, seq2 from all files to another file, and so on?

ADD REPLY • link 6.7 years ago by GouthamAtla 12k

0

I typed those just as an example; sorry for the laziness. It is nothing but the sequence name from the next file, followed by its sequence.

ADD REPLY • link 6.7 years ago by ThulasiS ▴ 90

0

This works with the bash shell. Install seqkit first. Keep your FASTA files in a separate folder. The output is written to out.fasta; the extension can be customized.

Code that works (on Ubuntu/Mint with the bash shell):

# Count the FASTA records in the first .fa file of the current directory.
n=$(grep -c '^>' "$(ls *.fa | head -1)")

# Two loops: the outer loop runs over the record positions (1..n),
# the inner loop runs over every .fa file in the current directory.
for j in $(seq 1 "$n"); do
    for i in *.fa; do
        # fx2tab puts each record on one line, awk keeps line j, tab2fx restores FASTA
        seqkit fx2tab "$i" | awk -v j="$j" 'NR == j' | seqkit tab2fx >> out.fasta
    done
done

input files (input files are copy/pasted from above):

$ ls 
test1.fa  test2.fa  test3.fa

input:

$ cat test1.fa 
>Saureus081.1
ATCGGCCCTTAA
>Saureus081.2
ATGCCTTAAGCTATA
>Saureus081.3
ATCCTAAAGGTAAGG

$ cat test2.fa 
>SaureusRF1.1
ATCGGCCCTTAC
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC

$ cat test3.fa 
>SaureusN305.1 
ATCGGCCCTTACT 
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>SaureusN305.3 
ATCCTAAAGGTAATG

Directory listing after the run:

$ ls
out.fasta  test1.fa  test2.fa   test3.fa  script.sh

output (from the above command):

$ cat out.fasta 
>Saureus081.1
ATCGGCCCTTAA
>SaureusRF1.1
ATCGGCCCTTAC
>SaureusN305.1 
ATCGGCCCTTACT 
>Saureus081.2
ATGCCTTAAGCTATA
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>Saureus081.3
ATCCTAAAGGTAAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC
>SaureusN305.3 
ATCCTAAAGGTAATG
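
The nested loop in this answer re-reads every file once per record position. As a rough sketch of the same output order in a single pass, here is a plain awk variant (not seqkit, so a different technique from the one above). The scratch directory, the file names x1.fa and x2.fa, and the toy records are made up for illustration, and everything is held in memory, so it is only meant for files of modest size.

```shell
# Hypothetical demo files in a scratch directory (names are illustrative only).
tmp=$(mktemp -d)
printf '>a.1\nAAAA\n>a.2\nCCCC\n' > "$tmp/x1.fa"
printf '>b.1\nGGGG\n>b.2\nTTTT\n' > "$tmp/x2.fa"

# Store each record under (record position, file index), then print
# position by position (outer loop) and file by file (inner loop).
awk '
    FNR == 1 { nf++; pos = 0 }                  # a new input file starts
    /^>/     { pos++; name[pos, nf] = $0
               if (pos > maxpos) maxpos = pos
               next }
             { sq[pos, nf] = sq[pos, nf] $0 }   # append, so wrapped lines are joined
    END {
        for (p = 1; p <= maxpos; p++)
            for (f = 1; f <= nf; f++)
                if ((p, f) in name) print name[p, f] "\n" sq[p, f]
    }
' "$tmp"/x1.fa "$tmp"/x2.fa > "$tmp/single_pass.fasta"

cat "$tmp/single_pass.fasta"
```

Files are taken in the order given on the command line, which plays the role of the inner `for i in *.fa` loop.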

ADD REPLY • link 6.7 years ago by cpad0112 21k

1

6.7 years ago

Pierre Lindenbaum 161k

Using sqlite3: just put your sequences in a database and pull them out.

# Run with bash.
v=1 && \
rm -f db.sqlite3 && \
sqlite3 db.sqlite3 'create table S(name,sequence,num,file);' && \
for F in input1.txt input2.txt input3.txt
do
  # Odd lines are names, even lines are sequences: build one INSERT per record,
  # tagged with its position in the file (num) and the file index (fidx).
  awk -v fidx="$v" '{if(NR%2==1) {printf("insert into S(name,sequence,num,file) values(\"%s\",\"",$0);} else {num++;printf("%s\",%d,%d);\n",$0,num,fidx);}}' "$F" |\
  sqlite3 db.sqlite3
  ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ "$v" -gt 0 ]
do
  # Write record number v from every file, in file order, to out<v>.txt.
  sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > "out${v}.txt"
  ((v--))
done
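
For comparison, the same "one output file per record position" layout can be sketched with plain awk instead of sqlite3. This is a different technique from the one above, shown only as an illustration: the scratch directory and the two shortened input files are made up for the demo, and it assumes each record is exactly two lines (name, then sequence).

```shell
# Hypothetical demo inputs in a scratch directory.
tmp=$(mktemp -d)
printf 'Saureus081.1\nATCGGCCCTTAA\nSaureus081.2\nATGCCTTAAGCTATA\n' > "$tmp/input1.txt"
printf 'SaureusRF1.1\nATCGGCCCTTAC\nSauruesRF1.2\nATGCCTTAAGCTAGG\n' > "$tmp/input2.txt"

# Lines 2k-1 and 2k of every file form record k; append both to out<k>.txt.
# Files are processed in command-line order, which stands in for
# "order by file" in the SQL query.
awk -v dir="$tmp" '{
    k = int((FNR + 1) / 2)
    out = dir "/out" k ".txt"
    print >> out
    close(out)    # keep the number of simultaneously open files small
}' "$tmp/input1.txt" "$tmp/input2.txt"

cat "$tmp/out1.txt" "$tmp/out2.txt"
```

Because the output files are opened in append mode, run it in an empty directory (as here) or delete old out*.txt files first.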

ADD COMMENT • link 6.7 years ago by Pierre Lindenbaum 161k

0

Hi, thanks for the script, but when I run it I get these errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done

I don't know the reason because I have never used sqlite before. Thank you.

ADD REPLY • link updated 6.7 years ago by GenoMax 141k • written 6.7 years ago by ThulasiS ▴ 90

0

Ah yes, I reformatted it on the fly. I've just removed a semicolon; can you please try again?

ADD REPLY • link 6.7 years ago by Pierre Lindenbaum 161k

0

I tried, but I get the same errors again. I also want to edit my question once. Please take a look, and thank you for your help.

Error: near line 1: near "v": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done

ADD REPLY • link updated 6.7 years ago by GenoMax 141k • written 6.7 years ago by ThulasiS ▴ 90

0

okay here is the gist that worked on my machine:

ADD REPLY • link 6.7 years ago by Pierre Lindenbaum 161k

0

Sorry again, I am still getting only syntax errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: near line 8: near "(": syntax error
Error: near line 9: near "done": syntax error
Error: incomplete SQL: do 
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--)); 
done

Maybe the fault is in my data?

Shell is bin/zsh

Sorry, I can't add more replies since I am a new user, so I am editing my previous comment. Thank you.

ADD REPLY • link updated 6.7 years ago by GenoMax 141k • written 6.7 years ago by ThulasiS ▴ 90

0

What is your shell?

ADD REPLY • link 6.7 years ago by Pierre Lindenbaum 161k

1

Shell is bin/zsh

From post above.

ADD REPLY • link 6.7 years ago by GenoMax 141k

0

use /bin/bash please

ADD REPLY • link 6.7 years ago by Pierre Lindenbaum 161k

0

Okay, sure, I'll try with bash. But I found the solution in the answer given by microfuge.

ADD REPLY • link 6.7 years ago by ThulasiS ▴ 90

score 3 · Accepted Answer · 2017-08-09

3

6.7 years ago

microfuge ★ 1.9k

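A paste- and awk-based approach, assuming that (a) each sequence is on one line and not wrapped, and (b) the parent FASTA files contain no tabs or spaces. paste puts the corresponding lines of all files side by side, so rows of headers and rows of sequences alternate. For each header row, the getline function in awk reads the following sequence row into y; because getline consumes that row, awk does not process it again as a header row.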

paste -d "\t"  *.fa |awk '{getline y;split(y,z,"\t");for (i=1;i<=NF;i++){print $i "\n" z[i]}  }'
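
To see the one-liner at work, here is a small self-contained sketch. The scratch directory, the file names demo1.fa and demo2.fa, and the helper name interleave_fasta are made up for illustration; the records are shortened from the question. Unlike the original command, this version passes -F '\t' to awk so that fields are split on tabs only, which is an assumption on my part about what is wanted when a header contains spaces.

```shell
# Hypothetical demo files in a scratch directory.
tmp=$(mktemp -d)
printf '>Saureus081.1\nATCGGCCCTTAA\n>Saureus081.2\nATGCCTTAAGCTATA\n' > "$tmp/demo1.fa"
printf '>SaureusRF1.1\nATCGGCCCTTAC\n>SauruesRF1.2\nATGCCTTAAGCTAGG\n' > "$tmp/demo2.fa"

# paste puts line k of every file on one tab-separated row:
# row 1 = all first headers, row 2 = all first sequences, and so on.
# awk takes a header row, reads the matching sequence row with getline,
# and prints header/sequence pairs in file order.
interleave_fasta() {
    paste -d '\t' "$@" | awk -F '\t' '{
        if ((getline seqs) <= 0) exit    # no sequence row left: stop
        split(seqs, z, "\t")
        for (i = 1; i <= NF; i++) print $i "\n" z[i]
    }'
}

interleave_fasta "$tmp/demo1.fa" "$tmp/demo2.fa" > "$tmp/out.fasta"
cat "$tmp/out.fasta"
```

The input files need the same number of records for the columns to line up; paste pads shorter files with empty fields, which would show up as empty header/sequence pairs in the output.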