Question

How can I extract the rows of a big file whose names are listed in another file?

0

6.1 years ago

hosin • 0

I have a big file like this (6 columns):

Name                                   Chr                Position                GType   LRR      BAF
250506CS3900140500001_312.1  23 26298017    BB  0.004256991    -0.0254199                      1
250506CS3900176800001_906.1  7  81648528    BB  0.05812091  0.996112
250506CS3900211600001_1041.1 16 41355381    BB     -0.1070691   0.9926475
250506CS3900218700001_1294.1    2   148802744   BB      -0.06002647 0.9837347
250506CS3900283200001_442.1 1   62646307    AB  0.0280207   0.4966125
250506CS3900371000001_1255.1    11  35339124    BB  0.05070077  1
250506CS3900386000001_696.1 16  62646307    AB  0.0280207   0.4966125
250506CS3900487100001_1521.1    14  1110363         AB  0.0893564   0.5164082
250506CS3901300500001_1084.1    7   89431547    BB  0.008588651 1
OAR3_7444330.1                  3   26298017    BB  0.004256991    -0.0254199     
OAR3_74471615.1                 3   41355381    BB     -0.1070691   0.9926475
OAR3_74485418_X.1           5       1110363         AB  0.0893564   0.5164082
OAR3_74546684.1                 3   89431547    BB  0.008588651 1
OAR3_74587791.1                 3   26298017    BB  0.004256991    -0.0254199 
OAR3_74604120.1                 3   62646307    AB  0.0280207   0.4966125
OAR3_74642696.1                 3   62646307    AB  0.0280207   0.4966125
OAR3_74703774.1                 3   148802744   BB      -0.06002647 0.9837347
OAR3_74732440.1                 3   81648528    BB  0.05812091  0.996112

I also have a list file like this (one column):

250506CS3900283200001_442.1
250506CS3900386000001_696.1
250506CS3900371000001_1255.1
250506CS3900487100001_1521.1
OAR3_74546684.1
OAR3_74604120.1 
OAR3_74703774.1

How can I extract the rows of the big file whose names appear in the list file above? Please help me. I'd be really grateful for commands or ...

genome • 1.7k views

ADD COMMENT • link updated 6.1 years ago by CS ▴ 10 • written 6.1 years ago by hosin • 0

score 2 · Answer 1 · 2018-04-04

2

6.1 years ago

Pierre Lindenbaum 161k

this is basic linux: https://linux.die.net/man/1/join

join -t $'\t' -1 1 -2 1 <(sort   -t $'\t' -k1,1 file1.txt) <(sort   -t $'\t' -k1,1 file2.txt)

ADD COMMENT • link 6.1 years ago by Pierre Lindenbaum 161k

0

Yes, I tried many times, like this: join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 440s.txt) <(sort -t $'\t' -k1,1 440.txt) > Output. But the output file is empty.

ADD REPLY • link 6.1 years ago by hosin • 0

0

you're doing something wrong, or you're not using bash, or the delimiter is not a tab
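One way to check the delimiter is to count the lines that contain a tab and to dump the first line with control characters visible. This is only a sketch: the one-line `big.txt` below is a made-up stand-in for the real file, and `printf '\t'` is used instead of `$'\t'` so the check does not depend on bash.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-in for the big file: the fields here are separated by spaces, not tabs.
printf 'OAR3_74546684.1   3   89431547    BB  0.008588651 1\n' > "$workdir/big.txt"

tab=$(printf '\t')   # a literal tab, built portably
cr=$(printf '\r')    # a literal carriage return (Windows line endings)

# grep -c prints the number of matching lines; "|| true" hides its non-zero exit on 0 matches.
tab_lines=$(grep -c "$tab" "$workdir/big.txt" || true)
cr_lines=$(grep -c "$cr" "$workdir/big.txt" || true)
echo "lines containing a tab: $tab_lines"
echo "lines containing a carriage return: $cr_lines"

# Show the first line byte by byte; tabs appear as \t and carriage returns as \r.
head -n 1 "$workdir/big.txt" | od -c | head -n 5
```

If the tab count is 0, `join -t $'\t'` sees each line as a single field, so the names in the two files can never be equal.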

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 161k

score 1 · Answer 2 · 2018-04-04

1

6.1 years ago

CS ▴ 10

you can try sorting your files on the first column and then run

grep -f smallFile BigFile2 > output.txt

ADD COMMENT • link 6.1 years ago by CS ▴ 10

1

why would you need to sort? what would happen if the key is present in another column?

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 161k

0

Sorting can speed things up:

time grep -f sorted_T1 ../BWA/ERR1094807.sam > Output

real 1m12.685s user 0m6.970s sys 0m7.441s

time grep -f unsorted_T1 ../BWA/ERR1094807.sam > Output

real 1m16.928s user 0m7.176s sys 0m7.914s

And yes, you are right: if the key is present in any other column, grep -f would pick up that line too. I thought that was not the case in this example.
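If matches in other columns are a concern, one alternative (not proposed in the thread, so treat it as a suggestion) is to let awk compare only the first field. The sketch below assumes whitespace-separated columns; the two small files are made-up miniatures of the files in the question.

```shell
workdir=$(mktemp -d)
# Hypothetical miniature of the big file (whitespace-separated, with a header).
cat > "$workdir/big.txt" <<'EOF'
Name Chr Position GType LRR BAF
250506CS3900283200001_442.1 1 62646307 AB 0.0280207 0.4966125
OAR3_7444330.1 3 26298017 BB 0.004256991 -0.0254199
OAR3_74546684.1 3 89431547 BB 0.008588651 1
EOF
# Hypothetical miniature of the list file (one name per line).
printf '250506CS3900283200001_442.1\nOAR3_74546684.1\n' > "$workdir/list.txt"

# While reading the first file (NR==FNR), store each name as an array key.
# While reading the second file, print a row only if its first field is one of those keys.
awk 'NR==FNR { wanted[$1]; next } $1 in wanted' \
    "$workdir/list.txt" "$workdir/big.txt" > "$workdir/output.txt"
cat "$workdir/output.txt"
```

No sorting is needed, and the rows come out in the order they have in the big file.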

ADD REPLY • link 6.1 years ago by CS ▴ 10

0

Thank you very much for your attention. I'm working in the shell. Actually, my system does not respond with this method, and each file has too many rows (about 600K). I also have 500 files like the first file (with 6 columns and 600 rows). These commands take a lot of my time, so do you have another suggestion?
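For many input files, a plain loop that runs one lookup per file is a possible starting point. Everything about the layout here is an assumption for illustration: the `samples/` and `extracted/` directories, the `.txt` file names, and whitespace-separated columns. The lookup itself uses awk on the first column rather than grep.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/samples" "$workdir/extracted"
# Two hypothetical sample files standing in for the ~500 real ones.
printf 'A.1 1 100 AB 0.1 0.5\nB.1 2 200 BB 0.2 1\n' > "$workdir/samples/s1.txt"
printf 'B.1 2 200 BB 0.3 1\nC.1 3 300 AB 0.4 0.5\n' > "$workdir/samples/s2.txt"
# Hypothetical list of wanted names.
printf 'B.1\nC.1\n' > "$workdir/list.txt"

for big in "$workdir"/samples/*.txt; do
    # Write each result next to the others, under the same file name as its input.
    out="$workdir/extracted/$(basename "$big")"
    # Load the names from the list, then keep rows whose first field is a listed name.
    awk 'NR==FNR { wanted[$1]; next } $1 in wanted' "$workdir/list.txt" "$big" > "$out"
done
ls "$workdir/extracted"
```

Each file is read once and only the list is held in memory, so memory use depends on the size of the list, not on the size of the big files.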

ADD REPLY • link 6.1 years ago by hosin • 0

0

did you try the join method ?

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 161k

0

Yes, I did. After that I get an empty file:

  join -t $'\t' -1 1 -2 1 <(sort   -t $'\t' -k1,1 440s.txt) <(sort   -t $'\t' -k1,1 440.txt)> Output

 440s.txt is file 1 (the big file with 6 columns).

 440.txt is file 2 (the small file with 1 column).
 So the output is empty.
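If the columns turn out to be separated by spaces and not tabs, a possible variant is to drop `-t` so that `sort` and `join` both split on runs of blanks. The sketch below assumes exactly that. The file contents are made-up miniatures, and temporary sorted files replace the `<( )` process substitution so it also runs in shells other than bash.

```shell
workdir=$(mktemp -d)
# Hypothetical miniature of the big file, with irregular runs of spaces between fields.
cat > "$workdir/big.txt" <<'EOF'
OAR3_74546684.1   3   89431547    BB  0.008588651 1
250506CS3900283200001_442.1 1   62646307    AB  0.0280207   0.4966125
OAR3_7444330.1    3   26298017    BB  0.004256991    -0.0254199
EOF
# Hypothetical miniature of the list file.
printf 'OAR3_74546684.1\n250506CS3900283200001_442.1\n' > "$workdir/list.txt"

# Use the same byte-wise collation for sort and join so they agree on the order.
export LC_ALL=C
sort -k1,1 "$workdir/big.txt"  > "$workdir/big.sorted"
sort -k1,1 "$workdir/list.txt" > "$workdir/list.sorted"

# Without -t, join separates fields on blanks and compares the first field of each file.
join "$workdir/list.sorted" "$workdir/big.sorted" > "$workdir/output.txt"
cat "$workdir/output.txt"
```

The output is sorted by name and its fields are separated by single spaces, so the original spacing and row order are not preserved.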

ADD REPLY • link 6.1 years ago by hosin • 0

0

If you are concerned about speed, you should add -F to the grep command.

-F
--fixed-strings
Interpret the pattern as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
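Besides speed, -F also changes what counts as a match, because the "." in these names is a wildcard when the names are read as regular expressions. A small sketch with made-up data (the second row is invented only to show the difference):

```shell
workdir=$(mktemp -d)
# Hypothetical data: the second name differs from the first only where the "." is.
printf 'OAR3_1.1 3 100 BB\nOAR3_1x1 3 200 AB\n' > "$workdir/big.txt"
printf 'OAR3_1.1\n' > "$workdir/list.txt"

# Patterns read as regular expressions: "." matches any character, so both rows match.
grep -f "$workdir/list.txt" "$workdir/big.txt" > "$workdir/regex.txt"

# Patterns read as fixed strings: only the literal name matches.
grep -F -f "$workdir/list.txt" "$workdir/big.txt" > "$workdir/fixed.txt"

echo "regex matches: $(grep -c . "$workdir/regex.txt")"
echo "fixed matches: $(grep -c . "$workdir/fixed.txt")"
```

Note that -F still matches the name anywhere on the line, so a name that appears in another column, or as part of a longer name, is still reported.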

ADD REPLY • link 6.1 years ago by igor 13k