Question: paste columns into one line in command line
3.9 years ago by
newbiebio80 wrote:

I have a very large text file about genes. It contains the columns gene name, Entrez gene ID, chromosome, start, end, and sequence. I want to join the gene name, Entrez gene ID, chromosome, start, and end columns into a single header line, with the sequence on a new line. Basically, I want to convert the text file to FASTA format with a Linux command. Example output: >SAMD11_148398_chr1_879534_879961 on one line, then GGTTGC on the next.

I tried an online converter, but my file is too large for it, so I would like to do the conversion on the command line.

linux • 960 views
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by newbiebio80

Did you try anything? If so, please show us; people here can help correct your code. If not, try awk.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by venu6.7k

I tried some online tools, but they failed. I was thinking of using awk, and Pierre's answer gave me exactly what I wanted.

ADD REPLYlink written 3.9 years ago by newbiebio80

I would suggest adding an example of the input and output lines, so that people can more easily figure out what you are expecting :)

ADD REPLYlink written 3.9 years ago by iraun3.8k

Thanks to both of you. Very efficient.

ADD REPLYlink written 3.9 years ago by newbiebio80

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

Remember to accept one (or more) answers as correct (use the check-mark symbol against the answer).

ADD REPLYlink written 3.9 years ago by genomax91k
3.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:
awk '{printf(">%s_%s_%s_%s_%s\n%s\n",$1,$2,$3,$4,$5,$6);}' input.txt > out.txt
ADD COMMENTlink written 3.9 years ago by Pierre Lindenbaum131k
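
To see what the command above does, here is a minimal sketch on a two-line sample. The file names sample.txt and sample.fa and the second record (GENE2) are made up for illustration; the column order (name, Entrez ID, chromosome, start, end, sequence) and whitespace-separated fields are assumed from the question.

```shell
# Build a small whitespace-separated sample. Only the first record comes
# from the question; the GENE2 record is hypothetical.
printf '%s\n' \
  'SAMD11 148398 chr1 879534 879961 GGTTGC' \
  'GENE2 12345 chr2 100 200 ACGTAC' > sample.txt

# Join columns 1-5 with underscores into a FASTA header line and put
# column 6 (the sequence) on the following line.
awk '{ printf(">%s_%s_%s_%s_%s\n%s\n", $1, $2, $3, $4, $5, $6) }' sample.txt > sample.fa

cat sample.fa
```

Since awk processes the input one line at a time, the same command should also cope with a very large file without loading it into memory. If the real file is tab-delimited or starts with a header row, the command would need adjusting (for example with -F'\t' or by skipping the first line).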

Thank you, Pierre. I used the code. It works.

ADD REPLYlink written 3.9 years ago by newbiebio80

Pierre, isn't the closing } required?

I wonder how the command as originally posted worked for the OP without closing the {.

ADD REPLYlink written 3.9 years ago by venu6.7k
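
On the brace question: awk implementations should reject an action block whose { is never closed, so the OP presumably ran the command with the closing } in place. A quick way to check on your own system is sketched below. The exact error text differs between gawk, mawk, and other implementations, so it is discarded here and only the exit status is examined.

```shell
# An action block without its closing brace: awk should refuse to run it.
if awk '{ printf("%s\n", $1);' /dev/null 2>/dev/null; then
  result="accepted"
else
  result="rejected"
fi
echo "unterminated block: $result"

# The same action with the brace closed parses and runs normally.
fixed=$(printf 'SAMD11 148398\n' | awk '{ printf("%s\n", $1); }')
echo "closed block prints: $fixed"
```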
3.9 years ago by
Tao410 wrote:

If your input format is like ">SAMD11_148398_chr1_879534_879961 GGTTGC", then use the following command:

awk '{print $1"\n"$2}' input_file > output_file

But if your input is like "SAMD11 148398 chr1 879534 879961 GGTTGC" and you want to convert it to two lines:

>SAMD11_148398_chr1_879534_879961
GGTTGC

Use Pierre's answer!

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Tao410
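
For the first case in the answer above, where each line already holds the header and the sequence separated by whitespace, a minimal sketch looks like this. The file names joined.txt and split.fa are placeholders, and the sketch assumes the header itself contains no spaces.

```shell
# One record per line: FASTA header, whitespace, sequence.
printf '%s\n' '>SAMD11_148398_chr1_879534_879961 GGTTGC' > joined.txt

# Print field 1 (the header) and field 2 (the sequence) on separate lines.
awk '{ print $1 "\n" $2 }' joined.txt > split.fa

cat split.fa
```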

I used Pierre's answer. It works. Thank you also.

ADD REPLYlink written 3.9 years ago by newbiebio80