Question: How to print two lines from each of several files to a new file in a specific order?
ThulasiS wrote (2.3 years ago):

I am doing a multiple sequence analysis for some genes. I have several files, each containing sequences in the same order. I would like to write the first sequence of every file into a new file, then the second sequence of every file, and so on until the last sequence. I only know how to extract the first (or any one specific) line with awk:

awk 'FNR == 2 {print; nextfile}' *.txt > newfile
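The FNR test above can be generalized to any record position. A minimal sketch, assuming every record is exactly two lines (header, then an unwrapped sequence); `kth_record` is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the k-th two-line record (header + sequence)
# from every file given, in argument order.
# Assumes every record is exactly two lines (sequence not wrapped).
kth_record() {
    local k=$1
    shift
    # record k occupies lines 2k-1 (header) and 2k (sequence) of each file
    awk -v k="$k" 'FNR == 2*k - 1 || FNR == 2*k' "$@"
}
```

Calling it once per record number, e.g. `for k in 1 2 3; do kth_record "$k" *.txt; done > newfile`, gives the requested order (record 1 of every file, then record 2, ...), at the cost of re-reading every file once per record.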

For two files, I have since learned this command:

paste File1 File2 | awk '{ p=$2;$2="" }NR%2{ k=p; print }!(NR%2){ v=p; print $1 RS k RS v }'
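The two-file one-liner above splits the pasted line on any whitespace, so a header containing a space is split wrongly, and the header from File1 is printed with a trailing space (because `$2=""` rebuilds the record with OFS). A hedged rewrite of the same paste + awk idea, assuming exactly two files of two-line records and no tabs inside the data; `interleave_two` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: interleave two-line records from exactly two files.
# paste joins line i of both files with a tab; awk splits on that tab only,
# so headers containing spaces stay intact. Assumes no tabs in the input.
interleave_two() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 2 == 1 { h1 = $1; h2 = $2; next }       # odd line: the two headers
        { print h1; print $1; print h2; print $2 }   # even line: the two sequences
    '
}
```

Usage: `interleave_two File1 File2 > newfile`.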

My input looks like this:

File 1
Saureus081.1
ATCGGCCCTTAA
Saureus081.2
ATGCCTTAAGCTATA
Saureus081.3
ATCCTAAAGGTAAGG

File 2

SaureusRF1.1
ATCGGCCCTTAC
SauruesRF1.2
ATGCCTTAAGCTAGG
SaureusRF1.3
ATCCTAAAGGTAAGC

File 3 
SaureusN305.1 
ATCGGCCCTTACT 
SauruesN305.2 
ATGCCTTAAGCTAGA 
SaureusN305.3 
ATCCTAAAGGTAATG

There are 12 similar files in total (File 3, File 4, ... File 12).

Required output:
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SaureusRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG

Thank you

Tags: awk, sequence
(edited by genomax)

Why does the output have

Seq1
Seq1
Seq2

Can you state precisely what the output should be? Do you want to create a separate file for each sequence position and write that sequence from all the files, i.e. seq1 from all files to one file, seq2 from all files to another file, and so on?

(reply by geek_y, 2.3 years ago)

I typed those just as an example; sorry for the laziness. The second "Seq1" is simply the first sequence of the next file, and so on.

(reply by ThulasiS, 2.3 years ago)

This works with the bash shell. Install seqkit from here. Keep your FASTA files (extension .fa) in a separate folder. The output is written to out.fasta; the extension can be customized.

Code that works (on Ubuntu/Mint with the bash shell):

# collect the FASTA files in the current directory and count the records in the first one
# (assumes every file has the same number of records)
files=(*.fa)
n=$(grep -c '^>' "${files[0]}")
: > out.fasta   # start from an empty output file so reruns do not append duplicates

# two loops: the outer loop runs over record numbers, the inner loop over the FASTA files
for j in $(seq 1 "$n"); do
    for i in "${files[@]}"; do
        # fx2tab puts one record per line; keep line j and convert it back to FASTA
        seqkit fx2tab "$i" | awk -v j="$j" 'NR == j' | seqkit tab2fx >> out.fasta
    done
done

Input files (copied from the question above, with ">" added in front of each header):

$ ls 
test1.fa  test2.fa  test3.fa

input:

$ cat test1.fa 
>Saureus081.1
ATCGGCCCTTAA
>Saureus081.2
ATGCCTTAAGCTATA
>Saureus081.3
ATCCTAAAGGTAAGG

$ cat test2.fa 
>SaureusRF1.1
ATCGGCCCTTAC
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC

$ cat test3.fa 
>SaureusN305.1 
ATCGGCCCTTACT 
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>SaureusN305.3 
ATCCTAAAGGTAATG

After running the script:

$ ls
out.fasta  test1.fa  test2.fa   test3.fa  script.sh

Output (from the above command):

$ cat out.fasta 
>Saureus081.1
ATCGGCCCTTAA
>SaureusRF1.1
ATCGGCCCTTAC
>SaureusN305.1 
ATCGGCCCTTACT 
>Saureus081.2
ATGCCTTAAGCTATA
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>Saureus081.3
ATCCTAAAGGTAAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC
>SaureusN305.3 
ATCCTAAAGGTAATG
(reply by cpad0112, 2.3 years ago)
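The seqkit loop above starts one `seqkit fx2tab` per record per file, so every file is re-read n times. A single-pass sketch in plain awk (no seqkit), which also tolerates wrapped sequences, assuming every input is FASTA with ">" header lines; `interleave_fasta` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: interleave FASTA records from several files in one pass.
# Output order: record 1 of every file (in argument order), then record 2, ...
# Assumes '>' header lines; sequence lines may be wrapped. Holds all input in memory.
interleave_fasta() {
    awk '
        FNR == 1 { f++; r = 0 }                  # first line of a new file
        /^>/ {                                   # a header starts a new record
            r++
            rec[f, r] = $0
            if (r > maxr) maxr = r
            next
        }
        r > 0 { rec[f, r] = rec[f, r] "\n" $0 }  # append sequence line(s) to the current record
        END {
            for (j = 1; j <= maxr; j++)
                for (i = 1; i <= f; i++)
                    if ((i, j) in rec) print rec[i, j]   # skip files with fewer records
        }
    ' "$@"
}
```

Usage: `interleave_fasta *.fa > out.fasta`. Keeping everything in memory is the trade-off for reading each file only once.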
microfuge wrote (2.3 years ago):

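A simple paste- and awk-based approach, assuming that (a) each sequence is on a single line (not wrapped) and (b) the input FASTA files contain no tabs or spaces. paste puts line i of every file on one tab-separated line; for each header line, awk's getline reads the following (sequence) line into y, so that line is consumed and not processed again as a header line.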

paste -d "\t"  *.fa |awk '{getline y;split(y,z,"\t");for (i=1;i<=NF;i++){print $i "\n" z[i]}  }'

Thank you so much! It produced exactly the output I needed.

(reply by ThulasiS, 2.3 years ago)
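In microfuge's one-liner, `$i` is split on any whitespace while `z` is split on tabs, so a header with a space, or a file with fewer records (paste then emits empty fields), misaligns headers and sequences. A hedged variant of the same paste + getline idea that splits both lines on tabs and skips empty fields; it still assumes unwrapped sequences and no tabs inside the files, and `interleave_paste` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical variant of the paste + getline one-liner.
# -F '\t' splits only on the tabs inserted by paste, so headers may contain spaces;
# empty fields (a file that ran out of records) are skipped.
# Still assumes unwrapped sequences and no tabs inside the files.
interleave_paste() {
    paste "$@" | awk -F '\t' '{
        if ((getline seqline) <= 0) exit        # header line with no sequence line after it
        split(seqline, seqs, "\t")              # the matching sequences, one per file
        for (i = 1; i <= NF; i++)
            if ($i != "") print $i "\n" seqs[i]
    }'
}
```

Usage: `interleave_paste *.fa > out.fasta`.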
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (2.3 years ago):

Using sqlite3: put your sequences into a database and pull them out again (run this with bash):

v=1 && \
rm -f db.sqlite3 && \
sqlite3 db.sqlite3 'create table S(name, sequence, num, file);' && \
for F in input1.txt input2.txt input3.txt
do
    # odd lines are headers, even lines are sequences; \047 is a single quote,
    # so the values become standard SQL string literals
    awk -v fidx="$v" '{if (NR % 2 == 1) {printf("insert into S(name,sequence,num,file) values(\047%s\047,\047", $0);} else {num++; printf("%s\047,%d,%d);\n", $0, num, fidx);}}' "$F" | \
    sqlite3 db.sqlite3
    ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ "$v" -gt 0 ]
do
    # one output file per record number: out1.txt gets record 1 of every input file, etc.
    sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > "out${v}.txt"
    ((v--))
done

Hi, thanks for the script, but when I run it I get these errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done

I don't know the reason because I have never used sqlite before. Thank you.

(reply by ThulasiS, 2.3 years ago; edited by genomax)

Ah yes, I reformatted it on the fly. I've just removed a semicolon; can you please try again?

(reply by Pierre Lindenbaum, 2.3 years ago)

I tried, but I get the same errors. I have edited my question once; please take a look, and thank you for your help:

Error: near line 1: near "v": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done
(reply by ThulasiS, 2.3 years ago; edited by genomax)

Okay, here is the gist that worked on my machine:

(reply by Pierre Lindenbaum, 2.3 years ago)

Sorry, again I only get syntax errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: near line 8: near "(": syntax error
Error: near line 9: near "done": syntax error
Error: incomplete SQL: do 
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--)); 
done

Maybe something is wrong with my data?

Shell is bin/zsh

Sorry, I can't add more replies because I am a new user, so I am editing my previous comment. Thank you.

(reply by ThulasiS, 2.3 years ago; edited by genomax)

What is your shell?

(reply by Pierre Lindenbaum, 2.3 years ago)

Shell is bin/zsh

From post above.

(reply by genomax, 2.3 years ago)

Please use /bin/bash.

(reply by Pierre Lindenbaum, 2.3 years ago)

Okay, sure, I'll try with bash. But I have found a solution in this thread, the one given by microfuge.

(reply by ThulasiS, 2.3 years ago)
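The sqlite3 answer above writes one file per record number (out1.txt holds the first record of every input file, out2.txt the second, and so on), which matches the alternative reading geek_y asked about in the comments. The same layout can be sketched without a database, assuming two-line records with unwrapped sequences; `split_by_record` and the output prefix are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the k-th two-line record of every input file
# into <prefix>k.txt, files in argument order (same layout as the sqlite3 script).
# Assumes each record is exactly two lines (header, unwrapped sequence).
split_by_record() {
    local prefix=$1
    shift
    awk -v prefix="$prefix" '
        {
            k = int((FNR + 1) / 2)   # lines 1-2 -> record 1, lines 3-4 -> record 2, ...
            out = prefix k ".txt"
            print > out              # awk truncates each file once on first open, then appends
        }
    ' "$@"
}
```

Usage: `split_by_record out input1.txt input2.txt input3.txt` creates out1.txt, out2.txt, .... With a very large number of records, some awk implementations may run out of open file descriptors.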
Powered by Biostar version 2.3.0