sorting a csv file with 3 columns
2
0
Entering edit mode
3.6 years ago
dimitrischat ▴ 210

Hello all,

I am trying to sort a csv file containing 3 values. The first one is the one that matters. It contains many times the number 1, 2, 3, 4 etc etc. But each number doesnt appear the same number of times as the other number. So, i wanna sort the csv file from its 1st column, from less times of appearance of each number to the highest number of appearces.

for example

1
1
1
2
2
2
2
2
3
4
4

->

3
4
4
1
1
1
2
2
2
2

using mac but i can get access to a linux

next-gen • 1.7k views
ADD COMMENT
0
Entering edit mode

how is it related to bioinformatics ?

ADD REPLY
1
Entering edit mode

we could give him the chance that the numbers are autosomal chromosomes ;)

ADD REPLY
0
Entering edit mode

yes they are, i just changed the chr"x" to number for convenience.

ADD REPLY
2
Entering edit mode
3.6 years ago

There are multiple ways you could do it. Here's my suggestion:

perl -ne 'BEGIN { open IN, "input.csv" or die $!;
while (<IN>) { /^(\S+)/ and $d{$1}++ } close IN }
/^(\S+)/ and print "$d{$1}\t$_"' input.csv \
| sort -k1,1n -k2,2V | cut -f2-

The idea behind this code is to read the file once to count the number of times the first column appears, and to read the file twice to create a new first column with those previous counts. You just then need to sort numerically by that new first column and remove it at the end.

ADD COMMENT
0
Entering edit mode

is there any way to do this without perl? thanks a lot for your help!!

ADD REPLY
1
Entering edit mode

What's your problem with using Perl? (Not being rude or anything.)

ADD REPLY
1
Entering edit mode

I see. Would you find yourself more confortable by implementing that same rationale on Excel perhaps?

How to sort a column by occurrence count in Excel?

ADD REPLY
0
Entering edit mode

i ll try in excel. but the command above doesnt work for my file : http://www.mediafire.com/file/v0azlrb3sx9r5x7/example.csv/file

ADD REPLY
0
Entering edit mode

Do you have R available? If yes, I can offer you a small R script that'll do the sorting for you.

ADD REPLY
0
Entering edit mode

yea i do, i use R studio (if that helps)

ADD REPLY
1
Entering edit mode
3.6 years ago
Dunois ★ 2.5k

Here this script should help then. (Let me know if you can't access it.)

Looks like I've also reached my post limit for the next six hours, so if you need help with the script, just reply to this comment, and I'll edit my replies in here.

ADD COMMENT
1
Entering edit mode

yes, finally!!! worked! thank you!!

ADD REPLY

Login before adding your answer.

Traffic: 2223 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6