Value extraction from large file with bash
4.3 years ago
mel22

Hi, I have many large files with association results. Each file contains 8 columns (the 3rd one is the p value). From each file I need to create a new one containing only the observations where the p value is < 10^-5 (1e-5). How can I do this with bash? Here is a small example from these files:

         SNP      N         P        p2        or1     or2    q        q1           
 c10_pos5974849   2      0.1881      0.1881  1.1931  1.1931  0.5707    0.00
 c10_pos5975482   2      0.3225      0.3225  0.8670  0.8670  0.8840    0.00
 c11_pos68438345   2      0.6537        0.66  0.9705  0.9690  0.2856   12.29
 c11_pos107693921   2      0.8938      0.8558  1.0133  1.0250  0.1755   45.52
 c12_pos67499221   2      0.8351      0.8351  1.0236  1.0236  0.6413    0.00
 c14_pos67844869   2      0.1103      0.1915  0.7334  0.7229  0.2039   38.05
 c14_pos68073026   2     0.09954      0.1298  0.6383  0.6215  0.2662   19.11
 c14_pos68087872   2      0.3704      0.3704  1.2500  1.2500  0.7319    0.00

Thank you

One word: awk! ;)

In general, if you're working with column-like data, always consider awk for processing it.
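As a small illustration of that advice, here is a sketch that applies the filter to the sample rows from the question. The helper name filter_p and the -v max threshold variable are hypothetical choices. awk's default field splitting copes with the leading blanks and space-aligned columns, so no separator option is needed. None of the sample rows has P < 1e-5, so with that threshold only the header survives; a looser threshold of 0.15 is used just to show rows being kept.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the header plus rows whose 3rd column (P)
# is below the threshold given in $2. "+ 0" forces a numeric comparison.
filter_p() {
  awk -v max="$2" 'NR == 1 || $3 < max + 0' "$1"
}

# Sample rows copied from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
         SNP      N         P        p2        or1     or2    q        q1
 c10_pos5974849   2      0.1881      0.1881  1.1931  1.1931  0.5707    0.00
 c10_pos5975482   2      0.3225      0.3225  0.8670  0.8670  0.8840    0.00
 c11_pos68438345   2      0.6537        0.66  0.9705  0.9690  0.2856   12.29
 c11_pos107693921   2      0.8938      0.8558  1.0133  1.0250  0.1755   45.52
 c12_pos67499221   2      0.8351      0.8351  1.0236  1.0236  0.6413    0.00
 c14_pos67844869   2      0.1103      0.1915  0.7334  0.7229  0.2039   38.05
 c14_pos68073026   2     0.09954      0.1298  0.6383  0.6215  0.2662   19.11
 c14_pos68087872   2      0.3704      0.3704  1.2500  1.2500  0.7319    0.00
EOF

filter_p "$tmp" 1e-5   # header only: no sample row has P < 1e-5
filter_p "$tmp" 0.15   # header + c14_pos67844869 (0.1103) and c14_pos68073026 (0.09954)
```

The NR == 1 test keeps the header regardless of its content, so the same helper works whatever the header's 3rd column is called.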

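With GNU parallel and awk:

$ parallel --dry-run "awk 'NR==1 {print}; \$3<10^-5 {print}' {} > out/{.}.filter.txt" ::: *.txt

Create a folder named "out" in the current directory (e.g. with mkdir -p out) and run the command from that directory. Remove --dry-run to actually execute the commands. awk's default field splitting handles both tab-separated files and the space-aligned columns shown in the question, so no -F option is needed.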
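If GNU parallel isn't available, a plain bash loop does the same job one file at a time. This is a sketch, not a drop-in replacement: the function name filter_dir is hypothetical, and it assumes the inputs are *.txt files in the directory passed as $1, writing results to an out/ subfolder of that directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: filter every *.txt file in directory $1 into $1/out/,
# keeping the header row and rows whose 3rd column (P) is below 1e-5.
filter_dir() {
  local dir=$1 f name
  mkdir -p "$dir/out"
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue              # the glob matched nothing
    name=$(basename "$f" .txt)           # strip directory and .txt suffix
    awk 'NR == 1 || $3 < 1e-5' "$f" > "$dir/out/$name.filter.txt"
  done
}

# Usage: filter_dir .   (writes ./out/<name>.filter.txt for each ./<name>.txt)
```

Running it sequentially is slower than parallel on many large files, but it has no dependency beyond bash and awk.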

That's great, thanks cpad0112!

4.3 years ago
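# keep the header (3rd column is "P") and rows whose p value (3rd column) is below 1e-5
find . -type f -name "*.common.suffix" | while IFS= read -r F ; do awk '($3=="P" || $3 < 1E-5)' "$F" > "${F}.subset.txt" ; done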
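A slightly more defensive variant of the find loop uses null-delimited file names, so names containing spaces or newlines don't break it. Again this is a sketch: subset_all is a hypothetical helper name, and '*.common.suffix' is the placeholder pattern from the answer, to be replaced with your real file suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for every file under $1 whose name matches $2,
# write <file>.subset.txt next to it.
subset_all() {
  local root=$1 pattern=$2 F
  find "$root" -type f -name "$pattern" -print0 |
    while IFS= read -r -d '' F; do
      # keep the header (3rd column is "P") and rows with P < 1e-5
      awk '$3 == "P" || $3 < 1e-5' "$F" > "${F}.subset.txt"
    done
}

# Usage: subset_all . '*.common.suffix'
```

Because the output names end in .subset.txt, they no longer match the input pattern, so rerunning the loop does not pick up its own output.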