Question

fetch particular columns from csv files

0

Entering edit mode

7.5 years ago

anithanagaraj93 ▴ 10

i have file of 10000 rows, consist some values like below.. i need to fetch the columns only consist of rs, value(0-9), but the problem is rs is not always in first column, column will be differ. so how solve this problem. please help me out

rs1 0.5(0.3-1.5) 0.08
rs2 1.2(1.5-5.0) 0.05
snp1 rs6 12 54 1.8(0.8-8.1) 0.02
ght gt rs7 1.2(1.0-2.8) 0.04

Thank you, Anitha

perl python • 2.4k views

ADD COMMENT • link 7.4 years ago by anithanagaraj93 ▴ 10

1

Entering edit mode

what have you tried ?

ADD REPLY • link 7.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thank you for your patience.

I tried with this regex format to match column that has value(value), "\d.\d+[([]\d.\d+\–\d.\d+[)]]". From this query its fetching whole rows, but i need a columns that only match with my query.

ADD REPLY • link 7.5 years ago by anithanagaraj93 ▴ 10

0

Entering edit mode

You should check out grep's options. -o is what you need here. Also, please double check your regex.

ADD REPLY • link 7.5 years ago by Ram 43k

1

Entering edit mode

To me it's not clear what you want to obtain, could you (based on your small example) post the desired result?

& (as Pierre asks) it is good practice to show what you tried and the effort you put into it when asking questions on online fora

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0

Entering edit mode

You say

rs is not always in first column

and number of 'columns' in a row may not be the same. Then it is not a column data. The data format is not optimal. Try to change that if possible.

If the data can only be like that, checkout regular expression and use it.

ADD REPLY • link 7.5 years ago by biocyberman ▴ 860

Ram · Answer 1 · 2016-10-05

3

Entering edit mode

7.5 years ago

Persistent LABS ▴ 750

Suppose your data is stored in a file called temp.txt Use grep to fetch the column with rs entries

grep -oP 'rs\d+' temp.txt

where

-P : using perl regex where \d corresponds to digits.
-o: returns only the matched string.

Hope this solves your problem.

Thanks, Persistent LABS, SanGeniX

ADD COMMENT • link updated 7.5 years ago by Ram 43k • written 7.5 years ago by Persistent LABS ▴ 750

0

Entering edit mode

I'm removing the link to your company. Please do not use posts to advertise your service. If you would like to do that, I'd recommend creating a new thread instead, and the mods and admins can decide if it's of value to the site. Thank you.

ADD REPLY • link 7.5 years ago by Ram 43k

1

Entering edit mode

Thank you Ram :)

ADD REPLY • link 7.5 years ago by Persistent LABS ▴ 750

0

Entering edit mode

This answer does not in fact ensure that the values matched are the complete contents of a "column"; however, I expect that it _probably_ achieves the intent of the original poster, so, +1 from me.

ADD REPLY • link 7.5 years ago by Malcolm.Cook ★ 1.5k

1

Entering edit mode

Word/delimiter boundaries. That's what this needs.

ADD REPLY • link 7.5 years ago by Ram 43k

1

Entering edit mode

Agreed. It is not a perfect solution, but it will solve what he expects :) Thanks

ADD REPLY • link 7.5 years ago by Persistent LABS ▴ 750

score 2 · Answer 2 · 2016-10-04

+1 to Pierre and Wouter. I'd recommend grep coupled with word boundaries or separator-based boundaries in your case. You have delimited data that is not necessarily structured, and the one thing you have going for you is the consistent format of dbSNP rs numbers. I'd recommend cleaning the dataset if you wish to get anything more out of it.

score 1 · Answer 3 · 2016-10-04

1

Entering edit mode

7.5 years ago

Eric T. ★ 2.8k

If the values of interest are not all in the same column, you don't need a tool for extracting individual columns (e.g. the Unix command cut). Instead, you need to search every line for matching strings. You can use the Unix commands grep and sed for this, and/or awk if you're familiar with it.

ADD COMMENT • link 7.5 years ago by Eric T. ★ 2.8k

score 0 · Answer 4 · 2016-10-04

0

Entering edit mode

7.5 years ago

natasha.sernova ★ 4.0k

Try this link:

csv-columns

or this one if you use Python:

the_same_with_Python

ADD COMMENT • link 7.5 years ago by natasha.sernova ★ 4.0k

0

Entering edit mode

awk might work in this case, but I never recommend it for parsing CSV to people. It is insensitive to quoted separators, which means you'll end up with an inconsistent number of fields across rows. The csv modules in python/R are the best options for CSV parsing in general.

ADD REPLY • link 7.5 years ago by Ram 43k

score 0 · Answer 5 · 2016-11-14

0

Entering edit mode

7.4 years ago

anithanagaraj93 ▴ 10

Thank you for your answers

ADD COMMENT • link 7.4 years ago by anithanagaraj93 ▴ 10