Question: How can I extract the rows of a big file whose names are listed in another file?
0
2.0 years ago, hosein_salehi6 wrote:

I have a big file like this (6 columns):

Name                                   Chr                Position                GType   LRR      BAF
250506CS3900140500001_312.1  23 26298017    BB  0.004256991    -0.0254199                      1
250506CS3900176800001_906.1  7  81648528    BB  0.05812091  0.996112
250506CS3900211600001_1041.1 16 41355381    BB     -0.1070691   0.9926475
250506CS3900218700001_1294.1    2   148802744   BB      -0.06002647 0.9837347
250506CS3900283200001_442.1 1   62646307    AB  0.0280207   0.4966125
250506CS3900371000001_1255.1    11  35339124    BB  0.05070077  1
250506CS3900386000001_696.1 16  62646307    AB  0.0280207   0.4966125
250506CS3900487100001_1521.1    14  1110363         AB  0.0893564   0.5164082
250506CS3901300500001_1084.1    7   89431547    BB  0.008588651 1
OAR3_7444330.1                  3   26298017    BB  0.004256991    -0.0254199     
OAR3_74471615.1                 3   41355381    BB     -0.1070691   0.9926475
OAR3_74485418_X.1           5       1110363         AB  0.0893564   0.5164082
OAR3_74546684.1                 3   89431547    BB  0.008588651 1
OAR3_74587791.1                 3   26298017    BB  0.004256991    -0.0254199 
OAR3_74604120.1                 3   62646307    AB  0.0280207   0.4966125
OAR3_74642696.1                 3   62646307    AB  0.0280207   0.4966125
OAR3_74703774.1                 3   148802744   BB      -0.06002647 0.9837347
OAR3_74732440.1                 3   81648528    BB  0.05812091  0.996112

I also have a list file like this (one column):

250506CS3900283200001_442.1
250506CS3900386000001_696.1
250506CS3900371000001_1255.1
250506CS3900487100001_1521.1
OAR3_74546684.1
OAR3_74604120.1 
OAR3_74703774.1

How can I extract the rows of the big file whose names appear in the list file above? Please help me. I'd be really grateful for commands or ...

genome • 564 views
2
2.0 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

this is basic linux: https://linux.die.net/man/1/join

join -t $'\t' -1 1 -2 1 <(sort   -t $'\t' -k1,1 file1.txt) <(sort   -t $'\t' -k1,1 file2.txt)

Yes, I tried many times, for example:

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 440s.txt) <(sort -t $'\t' -k1,1 440.txt) > Output

But the output file is empty.

written 24 months ago by hosein_salehi6

You're doing something wrong, or you're not using bash, or the delimiter is not a tab.

written 24 months ago by Pierre Lindenbaum
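
The two causes named in the reply above (a shell other than bash, or columns not separated by tabs) can be checked and worked around roughly as follows. This is a minimal sketch, not the method used in the thread: the demo_join directory, the file names, and the sample rows are hypothetical stand-ins, and the sample deliberately uses runs of spaces and one trailing space. It avoids `<( )` and `$'\t'`, so it should also run under a plain POSIX sh.

```shell
# Sketch only: demo_join/big.txt and demo_join/list.txt are hypothetical stand-ins.
mkdir -p demo_join

# Sample rows separated by runs of spaces (not tabs).
printf '%s\n' \
  'Name   Chr  Position  GType  LRR  BAF' \
  '250506CS3900283200001_442.1 1   62646307    AB  0.0280207   0.4966125' \
  'OAR3_7444330.1   3   26298017    BB  0.004256991    -0.0254199' \
  'OAR3_74546684.1  3   89431547    BB  0.008588651 1' > demo_join/big.txt
# The second name carries a trailing space on purpose.
printf '%s\n' '250506CS3900283200001_442.1' 'OAR3_74546684.1 ' > demo_join/list.txt

# A literal tab character, obtained without the bash-only $'\t' syntax.
tab=$(printf '\t')

# Count the lines that contain no tab at all; if this is non-zero,
# join -t TAB sees each such line as a single field and matches nothing.
no_tab=$(grep -vc "$tab" demo_join/big.txt)
echo "lines without a tab: $no_tab"

# Normalise both files: drop carriage returns, split on any whitespace,
# re-join the fields with tabs ($1 = $1 makes awk rebuild the line), then sort.
for name in big list; do
  tr -d '\r' < "demo_join/$name.txt" \
    | awk -v OFS='\t' 'NF { $1 = $1; print }' \
    | LC_ALL=C sort -t "$tab" -k1,1 > "demo_join/$name.sorted.tsv"
done

# The same join as in the answer, using temporary files instead of <( ).
LC_ALL=C join -t "$tab" -1 1 -2 1 \
  demo_join/big.sorted.tsv demo_join/list.sorted.tsv > demo_join/joined.tsv
cat demo_join/joined.tsv
```

Sorting and joining under the same locale (LC_ALL=C here) matters, because join expects its inputs in the collation order it uses itself.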
1
2.0 years ago, CS (United Kingdom) wrote:

you can try sorting your files on first column and do

grep -f smallFile BigFile2 > output.txt

Why would you need to sort? What would happen if the key is present in another column?

modified 2.0 years ago • written 2.0 years ago by Pierre Lindenbaum

Sorting can speed things up a little:

time grep -f sorted_T1 ../BWA/ERR1094807.sam > Output

real 1m12.685s
user 0m6.970s
sys 0m7.441s

time grep -f unsorted_T1 ../BWA/ERR1094807.sam > Output

real 1m16.928s
user 0m7.176s
sys 0m7.914s

And yes, you are right: if the key is present in any other column, grep -f would pick up that line too. I thought that was not the case in this example.

written 2.0 years ago by CS
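
To illustrate the point about keys showing up elsewhere on a line, here is a small sketch that compares grep -f with an exact match on the first column only. The awk lookup is a swapped-in technique, not something proposed in the thread, and the demo_grep directory, file names, and rows are made up for the demonstration. The last row is contrived so that the key appears in column 6.

```shell
# Sketch only: all names and rows below are hypothetical.
mkdir -p demo_grep
printf '%s\n' \
  'snp_1.1 3 100 AB 0.1 0.5' \
  'snp_1.10 3 200 BB 0.2 1' \
  'snp_1x1 4 300 BB 0.3 1' \
  'snp_2.1 5 400 AB 0.4 snp_1.1' > demo_grep/big.txt
printf '%s\n' 'snp_1.1' > demo_grep/list.txt

# grep -f treats each list line as a regular expression and matches anywhere:
# "." matches any character, "snp_1.1" is a substring of "snp_1.10",
# and the key also occurs in the last column of the fourth row.
grep -f demo_grep/list.txt demo_grep/big.txt > demo_grep/grep_hits.txt

# awk: load the keys from the first file into an array, then print only the
# rows of the second file whose first field is exactly one of those keys.
awk 'NR == FNR { keep[$1]; next } $1 in keep' \
  demo_grep/list.txt demo_grep/big.txt > demo_grep/awk_hits.txt

wc -l demo_grep/grep_hits.txt demo_grep/awk_hits.txt
```

The awk version needs no sorting and splits on any whitespace, so mixed tabs and spaces or a trailing space in the list do not matter. One caveat: the NR == FNR idiom assumes the list file is not empty.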

Thank you very much for your attention. I'm working in the shell. Actually, my system stops responding with this method, and each file has too many rows (about 600K). I also have 500 files like the first one (with 6 columns and 600 rows). These commands take a lot of time, so do you have another suggestion?

written 24 months ago by hosein_salehi6
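
For many input files, one possible approach is a loop that filters each file in a single pass with an in-memory lookup, which avoids sorting altogether. The sketch below is only an illustration under assumptions: the demo_batch directory layout, the file names, and the three-column sample rows are hypothetical, and the awk lookup is not a method proposed by the posters.

```shell
# Sketch only: directory layout, file names and rows are hypothetical.
mkdir -p demo_batch/in demo_batch/out
printf '%s\t%s\t%s\n' \
  'snp_a.1' 1 10 \
  'snp_b.1' 1 20 > demo_batch/in/sample1.txt
printf '%s\t%s\t%s\n' \
  'snp_b.1' 2 30 \
  'snp_c.1' 2 40 \
  'snp_d.1' 2 50 > demo_batch/in/sample2.txt
printf '%s\n' 'snp_b.1' 'snp_c.1' > demo_batch/list.txt

# For every input file: read the list into an array first, then keep the rows
# whose first field is in that array. Each big file is read exactly once.
for f in demo_batch/in/*.txt; do
  out="demo_batch/out/$(basename "$f")"
  awk 'NR == FNR { keep[$1]; next } $1 in keep' demo_batch/list.txt "$f" > "$out"
done

ls demo_batch/out
```

Each output file keeps the name of its input file, so results can be traced back to their source.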

Did you try the join method?

written 24 months ago by Pierre Lindenbaum

Yes, I did. After running it I get an empty file:

  join -t $'\t' -1 1 -2 1 <(sort   -t $'\t' -k1,1 440s.txt) <(sort   -t $'\t' -k1,1 440.txt)> Output

440s.txt is file 1 (the big file with 6 columns).

440.txt is file 2 (the small file with 1 column).

So the output is empty.

modified 24 months ago • written 24 months ago by hosein_salehi6

If you are concerned about speed, you should add -F.

-F
--fixed-strings
Interpret the pattern as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.

written 24 months ago by igor
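
As a rough illustration of the -F suggestion, the sketch below runs grep with fixed strings and then adds -w (whole-word matching, available in GNU and BSD grep but not required by POSIX). The demo_fixed directory, file names, and rows are hypothetical. Note that neither variant restricts the match to the first column.

```shell
# Sketch only: all names and rows below are hypothetical.
mkdir -p demo_fixed
printf '%s\n' \
  'snp_1.1 3 100 AB 0.1 0.5' \
  'snp_1.10 3 200 BB 0.2 1' \
  'snp_1x1 4 300 BB 0.3 1' \
  'snp_2.1 5 400 AB 0.4 snp_1.1' > demo_fixed/big.txt
printf '%s\n' 'snp_1.1' > demo_fixed/list.txt

# -F: patterns are literal strings, so "." no longer matches "x",
# but "snp_1.1" is still found inside "snp_1.10".
grep -F -f demo_fixed/list.txt demo_fixed/big.txt > demo_fixed/fixed.txt

# -F -w: the match must also be delimited by non-word characters,
# which drops "snp_1.10" but still accepts the key in the last column.
grep -F -w -f demo_fixed/list.txt demo_fixed/big.txt > demo_fixed/fixed_word.txt

wc -l demo_fixed/fixed.txt demo_fixed/fixed_word.txt
```

With -F, a trailing space in the list file becomes part of the literal pattern, so it may be worth trimming the list before using it.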