Question

How to retrieve rows from OTU table

0

6.5 years ago

mollysil ▴ 40

I have a text file with OTU names in the first column and their occurrence in each treatment in the following columns (34 columns in total). A sample of the table is below. The file contains ~3000 OTUs (so ~3000 rows).

CM2_9   0   0   0

AF141_14    22  25  23

AF171_13    13  0   0

LIPB162_1   0   0   0

I have a separate text file with all the OTU names of interest (~500 OTUs), which looks something like this:

WSF3_2

WSF1_2

AF172_15

IO2_57

Is there a simple way to retrieve just the rows of my table that match the OTUs of interest? As output, I want a new table containing only the rows for my OTUs of interest. Help please! I'm working on Linux through PuTTY. Also, does either file need to be converted to a comma-delimited format? Both are tab-delimited .txt files.

OTU Rows • 2.0k views

ADD COMMENT • link updated 6.5 years ago by 5heikki 11k • written 6.5 years ago by mollysil ▴ 40

1

Take a look at the Unix join command if you do not want to use external programs.

ADD REPLY • link 6.5 years ago by GenoMax 141k

0

Could you try the following solution:

$ grep -f test2.txt test1.txt

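Here test2.txt contains the OTU names of interest (~500 OTUs) and test1.txt is the complete OTU table (~3000 OTUs). Note that grep -f treats each name as a regular expression and matches it anywhere in a line, so CM2_9 would also match a row named CM2_99; adding -w and -F (grep -wFf test2.txt test1.txt) restricts this to literal whole-word matches.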

Input:

$ cat test1.txt 
CM2_9   0   0   0
AF141_14    22  25  23
AF171_13    13  0   0
LIPB162_1   0   0   0


$ cat test2.txt 
LIPB162_1
CM2_9

Output:

$ grep -f test2.txt test1.txt 
CM2_9   0   0   0
LIPB162_1   0   0   0

ADD REPLY • link 6.5 years ago by cpad0112 21k
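Because grep compares each name against the whole line, an alternative is to compare only the first column, exactly. The following is a minimal Python sketch of that idea, not a definitive implementation; the function name filter_rows is hypothetical, and it assumes both inputs are tab-delimited with the OTU name in the first field.

```python
def filter_rows(table_lines, wanted_ids):
    """Yield table rows whose first tab-separated field is a wanted OTU name.

    table_lines: iterable of lines from the OTU table (tab-delimited).
    wanted_ids:  iterable of lines, one OTU name per line.
    """
    # A set gives constant-time exact-match lookups; blank lines are ignored
    wanted = {name.strip() for name in wanted_ids if name.strip()}
    for line in table_lines:
        row = line.rstrip("\n")
        # Compare only the first column, never the count columns
        if row.split("\t", 1)[0] in wanted:
            yield row
```

To use it with the files from the example, open test2.txt and test1.txt, pass the two file objects as wanted_ids and table_lines respectively, and print each row that is yielded. Rows come out in the order of the table, not the order of the name list.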

score 2 · Answer 1 · 2017-10-31

2

6.5 years ago

5heikki 11k

Assuming tab-separated files and a shell that supports process substitution (e.g. bash):

join -1 1 -2 1 -t $'\t' <(sort -t $'\t' -k1,1 otutable) <(sort -t $'\t' -k1,1 listfile)
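By default, join prints only lines whose key appears in both inputs, so any OTU name in the list that is absent from the table is dropped without a message. If it helps to know which names those are, here is a small Python sketch; the function name split_by_presence is hypothetical, and it assumes tab-delimited input with the OTU name in the first column.

```python
def split_by_presence(table_lines, wanted_ids):
    """Return (found, missing) lists of wanted OTU names.

    found:   names that occur in the first column of the table.
    missing: names that do not occur there.
    Both lists keep the order of wanted_ids.
    """
    # Collect every OTU name present in column 1 of the table
    in_table = {
        line.split("\t", 1)[0].strip() for line in table_lines if line.strip()
    }
    found, missing = [], []
    for name in (n.strip() for n in wanted_ids):
        if not name:
            continue  # skip blank lines in the list file
        (found if name in in_table else missing).append(name)
    return found, missing
```

Calling it with the open otutable and listfile file objects returns the two lists; an empty missing list means the join output covers every OTU of interest.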

score 1 · Answer 2 · 2017-10-31

1

6.5 years ago

st.ph.n ★ 2.7k

Here's a quick Python solution, where ids.txt contains the OTUs of interest and otus.txt is your original table.

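#!/usr/bin/env python

# Read the OTU names of interest, skipping blank lines
with open('ids.txt', 'r') as f:
    ids = [line.strip() for line in f if line.strip()]

# Index every row of the OTU table by its first column (the OTU name)
otu = {}
with open('otus.txt', 'r') as f2:
    for line in f2:
        fields = line.rstrip('\n').split('\t')
        otu[fields[0]] = fields

# Print matching rows in the order of ids.txt; names not in the table are skipped
for i in ids:
    if i in otu:
        print('\t'.join(otu[i]))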

Save as get_otus.py, run as python get_otus.py > my_otus.txt

0

Magical! Thanks so much!!!

ADD REPLY • link 6.5 years ago by mollysil ▴ 40