Convert a two-column file to a single file with interleaved rows
4.7 years ago
JGuVa

Hello,

I am struggling to convert a file of nucleotide sequences into a proper FASTA file (i.e., a ">seq_name" header line followed by the sequence on the next line). I am following the strategy below, but any other suggestion is warmly welcome. I am trying to convert the following type of file:

>kmer_1   AAAAAAAAAAAAAAAAAAAAAAAACCCACCCA
>kmer_2   AAAAAAAAAAAAAAAAAAAAAAACAGAGATGT
>kmer_3   AAAAAAAAAAAAAAAAAAAAAAACCCACCCAC
>kmer_4   AAAAAAAAAAAAAAAAAAAAAACAGAGATGTA
>kmer_5   AAAAAAAAAAAAAAAAAAAAAACCCACCCACA
>kmer_6   AAAAAAAAAAAAAAAAAAAAAAGAAGAGAAAA
>kmer_7   AAAAAAAAAAAAAAAAAAAAACCCACCCACAT
>kmer_8   AAAAAAAAAAAAAAAAAAAAAGAAGAGAAAAA
>kmer_9   AAAAAAAAAAAAAAAAAAAAAGAGAACGACAC
>kmer_10  AAAAAAAAAAAAAAAAAAAACCCACCCACATG

Into something like this:

>kmer_1
AAAAAAAAAAAAAAAAAAAAAAAACCCACCCA
>kmer_2
AAAAAAAAAAAAAAAAAAAAAAACAGAGATGT
>kmer_3
AAAAAAAAAAAAAAAAAAAAAAACCCACCCAC
>kmer_4
AAAAAAAAAAAAAAAAAAAAAACAGAGATGTA
>kmer_5
AAAAAAAAAAAAAAAAAAAAAACCCACCCACA
>kmer_6
AAAAAAAAAAAAAAAAAAAAAAGAAGAGAAAA
>kmer_7
AAAAAAAAAAAAAAAAAAAAACCCACCCACAT
>kmer_8
AAAAAAAAAAAAAAAAAAAAAGAAGAGAAAAA
>kmer_9
AAAAAAAAAAAAAAAAAAAAAGAGAACGACAC
>kmer_10
AAAAAAAAAAAAAAAAAAAACCCACCCACATG

Any ideas on how to do this using paste, awk, or similar tools? Thanks in advance.


Related threads: "fasta file to tab delimited file" and "tab to fasta file conversion".

"I am following the strategy below"

What strategy? You didn't show anything you tried.

4.7 years ago
ATpoint

Assuming the two columns are tab-delimited, replace each tab with a newline:

tr "\t" "\n" < your.file
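If the columns are separated by runs of spaces rather than a single tab, tr "\t" "\n" leaves those lines unchanged. A minimal awk sketch that splits on any whitespace instead, assuming the sequence is the last field on each line (the function name two_col_to_fasta is made up for illustration):

```shell
# Sketch: turn "name<whitespace>sequence" lines into two-line FASTA records.
# Assumes the sequence is the last whitespace-separated field on each line.
# two_col_to_fasta is a hypothetical helper name, not an existing tool.
two_col_to_fasta() {
  awk 'NF >= 2 {
    seq = $NF              # save the sequence (last field)
    $NF = ""               # blank it; awk rebuilds $0 joined by single spaces
    sub(/[ \t]+$/, "")     # trim the trailing separator left behind
    sub(/^>[ \t]+/, ">")   # normalise "> name" to ">name"
    print                  # header line
    print seq              # sequence line
  }' "$@"
}

# Usage: two_col_to_fasta your.file > your.fasta
```

Because awk rebuilds the header from its fields, any internal runs of whitespace inside a header are collapsed to single spaces; lines with fewer than two fields are skipped.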
4.7 years ago
The

For this kind of simple conversion, I would use the regex-based find-and-replace feature of a text editor such as Notepad++, provided the file is not too big for the editor to open.

Notepad++ Regex
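The same regex find-and-replace idea can also be run from the command line with sed, which avoids any editor file-size limit. A minimal sketch, assuming each line is a name with no internal whitespace, followed by spaces and/or tabs, followed by the sequence (split_at_first_blank is a hypothetical helper name):

```shell
# Sketch: command-line version of the regex find-and-replace approach.
# split_at_first_blank is a hypothetical helper name.
split_at_first_blank() {
  # Replace the first run of spaces/tabs on each line with a newline.
  # In POSIX sed, a backslash followed by a real newline in the
  # replacement inserts a newline, so this works in GNU and BSD sed.
  sed 's/[[:blank:]][[:blank:]]*/\
/' "$@"
}

# Usage: split_at_first_blank your.file > your.fasta
```

Only the first whitespace run on each line is replaced (there is no g flag), so whitespace later in the line is left alone.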


GUI apps are nice, but please consider carefully the use of Windows apps (or Microsoft apps on a Mac) to edit text files. These tools tend to add CR (carriage return) characters that are easy to remove (if you know about them) but otherwise can cause hidden misery for some open-source bioinformatics tools when used on Linux or OS X (or other UNIXes).
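Carriage returns are easy to check for and remove from the command line with standard POSIX tools. A minimal sketch (the helper names has_cr and strip_cr are hypothetical):

```shell
# Sketch: detect and remove Windows carriage returns (CR, \r).
# has_cr and strip_cr are hypothetical helper names.
has_cr() {
  # Succeeds (exit status 0) if the file contains at least one CR character.
  grep -q "$(printf '\r')" "$1"
}

strip_cr() {
  # tr -d deletes every \r, turning CRLF line endings into plain LF.
  tr -d '\r' < "$1"
}

# Usage: has_cr windows.txt && strip_cr windows.txt > unix.txt
```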


That's not a problem in Notepad++: you can convert line endings between Unix style ('\n') and Windows style ('\r\n'), in either direction, via the Edit > EOL Conversion menu.


I'd still recommend against using a GUI for such minor, low-context edits. GUI apps make sense when editing scripts (where there is a lot of back-and-forth navigation and the context of each edit matters), but delimited files are seldom complicated enough to warrant GUI editing.

4.7 years ago
JGuVa

Hi, thanks to those who replied to this question. I ended up doing it differently:

cat kmer_name.txt

>kmer_1
>kmer_2
>kmer_3
>kmer_4
>kmer_5
>kmer_6
>kmer_7
>kmer_8
>kmer_9
>kmer_10

cat kmer_seq.txt

AAAAAAAAAAAAAAAAAAAAAAAACCCACCCA
AAAAAAAAAAAAAAAAAAAAAAACAGAGATGT
AAAAAAAAAAAAAAAAAAAAAAACCCACCCAC
AAAAAAAAAAAAAAAAAAAAAACAGAGATGTA
AAAAAAAAAAAAAAAAAAAAAACCCACCCACA
AAAAAAAAAAAAAAAAAAAAAAGAAGAGAAAA
AAAAAAAAAAAAAAAAAAAAACCCACCCACAT
AAAAAAAAAAAAAAAAAAAAAGAAGAGAAAAA
AAAAAAAAAAAAAAAAAAAAAGAGAACGACAC
AAAAAAAAAAAAAAAAAAAACCCACCCACATG

And then simply changed the way I pasted both files:

paste -d '\n' kmer_name.txt kmer_seq.txt > kmer_name_seq.txt

cat kmer_name_seq.txt

>kmer_1
AAAAAAAAAAAAAAAAAAAAAAAACCCACCCA
>kmer_2
AAAAAAAAAAAAAAAAAAAAAAACAGAGATGT
>kmer_3
AAAAAAAAAAAAAAAAAAAAAAACCCACCCAC
>kmer_4
AAAAAAAAAAAAAAAAAAAAAACAGAGATGTA
>kmer_5
AAAAAAAAAAAAAAAAAAAAAACCCACCCACA
>kmer_6
AAAAAAAAAAAAAAAAAAAAAAGAAGAGAAAA
>kmer_7
AAAAAAAAAAAAAAAAAAAAACCCACCCACAT
>kmer_8
AAAAAAAAAAAAAAAAAAAAAGAAGAGAAAAA
>kmer_9
AAAAAAAAAAAAAAAAAAAAAGAGAACGACAC
>kmer_10
AAAAAAAAAAAAAAAAAAAACCCACCCACATG
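One caveat with paste -d '\n': if one file has fewer lines than the other, paste behaves as though the shorter file supplied empty lines, so the extra lines end up paired with empty lines (records with an empty header or an empty sequence). A minimal sketch that checks the line counts first (interleave_fasta is a hypothetical helper name):

```shell
# Sketch: interleave a file of FASTA headers with a file of sequences,
# refusing to run if the two files have different line counts.
# interleave_fasta is a hypothetical helper name.
interleave_fasta() {
  # wc -l pads its output with spaces on some systems, so strip them
  n_names=$(wc -l < "$1" | tr -d ' ')
  n_seqs=$(wc -l < "$2" | tr -d ' ')
  if [ "$n_names" -ne "$n_seqs" ]; then
    echo "line count mismatch: $n_names headers vs $n_seqs sequences" >&2
    return 1
  fi
  # '\n' as the paste delimiter puts each sequence on the line after its header
  paste -d '\n' "$1" "$2"
}

# Usage: interleave_fasta kmer_name.txt kmer_seq.txt > kmer_name_seq.txt
```

A matching count does not prove the two files are in the same order, so this only guards against the most obvious mismatch.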

Your question describes a single file with tab-delimited content, whereas your solution uses two separate files. The question should have made this clear; otherwise contributors' time is wasted. Please be more careful in the future.
