Question

Extract each column from a txt file and save into separate files, with the first line as file names

0

Entering edit mode

4.2 years ago

Cecelia ▴ 30

I have a dataframe that looks like this:

CHROM   POS p3_150  p3_166  p3_128 ......
scaffold1   834 4,1 5,7 6,0
scaffold1   950 5,0 11,0 5,0

What I wish to do is to extract every column starts from 3rd column and save them to separate files. The output files looks like:

p3_150.txt

p3_150
4,1
5,0

p3_166.txt

p3_166
5,7
11,0

p3_128.txt

p3_128
6,0
5,0

Is there a simple way to do the task without manually cut -f each column and save to files? Any suggestion will be very appreciated.

awk shell • 1.3k views

ADD COMMENT • link updated 4.2 years ago by Arup Ghosh 3.2k • written 4.2 years ago by Cecelia ▴ 30

0

Entering edit mode

I have a dataframe

As in R? Or you are just referring to a data matrix that way.

ADD REPLY • link 4.2 years ago by GenoMax 141k

2

Entering edit mode

4.2 years ago

Pierre Lindenbaum 161k

not tested

awk -F '\t' '(NR==1) {split($0,header);next;} {for(i=3;i<=NF;i++) {f=sprintf("out.%s.txt",header[i]); print $i >> f;}}' input.tsv

ADD COMMENT • link 4.2 years ago by Pierre Lindenbaum 161k

2

Entering edit mode

4.2 years ago

Arup Ghosh 3.2k

You can try this Python script.

#!/usr/bin/python3
import sys,csv
fname = ''
fdata = ()
df = open(sys.argv[1],'r')
dfr = csv.reader(df, delimiter='\t')
dfcol = zip(*dfr)
for i in range(2,len(dfcol)):
    fname = str(dfcol[i][0])+'.txt'
    fdata = dfcol[i]
    print('Writing '+fname)
    fw = open(fname, 'w')
    fw.write('\n'.join(fdata))
    fw.close()
df.close()

test:

python script.py input.tsv

The script will only work for tab-delimited files, to make it work with csv files change csv.reader delimiter.

ADD COMMENT • link 4.2 years ago by Arup Ghosh 3.2k

score 4 · Accepted Answer · 2020-03-02

4

Entering edit mode

4.2 years ago

venu 7.1k

Since you mentioned cut, here is one way, if you know the number of columns to extract

seq 3 10 | while read -r LINENUM; do c_dat=$(cat file.txt | cut -f "$LINENUM"); c_name=$(cat file.txt | cut -f "$LINENUM" | head -1); echo "$c_dat" > "$c_name".txt; done

ADD COMMENT • link 4.2 years ago by venu 7.1k

0

Entering edit mode

Worked! Many thanks!

ADD REPLY • link 4.2 years ago by Cecelia ▴ 30

0

Entering edit mode

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
Upvote|Bookmark|Accept

ADD REPLY • link 4.2 years ago by GenoMax 141k