Sorting Chrs In [R]
9.6 years ago

I should start by saying I have solved this problem, but I feel like my code is ugly and overkill.

I am trying to sort a data frame on a column containing human chromosome names:

chr1, chr2, ..., chrX, chrY

The problem is that a plain alphabetical sort puts chr10 right after chr1.

What tricks do you use to deal with this problem in R?

r parsing sort • 7.0k views
9.6 years ago

mixedsort is pretty good, but it will put chrM before chrX and chrY.

If you have your own arbitrary order, you should just use a factor with explicit levels:

> df<-data.frame("chr"=c("chr1","chrM","chr10","chr2","chrX","chr2"),"val"=c(1,2,3,4,5,6))
> df
    chr val
1  chr1   1
2  chrM   2
3 chr10   3
4  chr2   4
5  chrX   5
6  chr2   6
> chrOrder<-c(paste("chr",1:22,sep=""),"chrX","chrY","chrM")
> df$chr<-factor(df$chr, levels=chrOrder)
> df$chr
[1] chr1  chrM  chr10 chr2  chrX  chr2 
Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
> df[order(df$chr),]
    chr val
1  chr1   1
4  chr2   4
6  chr2   6
3 chr10   3
5  chrX   5
2  chrM   2

Thank you for pointing this out.


+1 for being in base R, and in the spirit of the R language. This also solves @zev.kronenberg's issue of sorting on two columns: df[order(df$chr, df$pos), ] works as expected, because a factor is ordered by its underlying integer codes rather than by its labels. The levels argument to factor() is where the magic happens.
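The two-column sort mentioned in this comment can be sketched as follows. This is only an illustration: the pos column and its values are made up here, since the data frame in the answer has no position column.

```r
# Hypothetical data: the 'pos' column and all values are invented for illustration.
df <- data.frame(
  chr = c("chr10", "chr2", "chrX", "chr2", "chr1", "chrM"),
  pos = c(500, 300, 50, 100, 900, 10),
  stringsAsFactors = FALSE
)

# Explicit chromosome order, as in the answer above.
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")
df$chr <- factor(df$chr, levels = chrOrder)

# order() compares the factor's integer codes first, then breaks ties on pos.
sorted <- df[order(df$chr, df$pos), ]
print(sorted)
```

One thing to watch for: any chromosome name that is missing from chrOrder becomes NA when converted with factor(), so the levels should cover every name in the data.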

9.6 years ago
Gjain 5.6k

Hi Zev,

What you are looking for is mixedsort() from the gtools package:

Order or Sort strings with embedded numbers so that the numbers are in the correct order

> library(gtools)
package 'gtools' was built under R version 2.13.2
> n <- c('chr1','chr21','chr13','chr4','chr10')
> n
[1] "chr1"  "chr21" "chr13" "chr4"  "chr10"
> mixedsort(n)
[1] "chr1"  "chr4"  "chr10" "chr13" "chr21"

I hope this helps.


Thanks! dat[mixedorder(dat$chromosomes), ]

Too bad it won't take two vectors: chromosome + position.


Can you try it this way?

> x <- c('chr2:1-4','chr1:10-15','chr10:2-5','chr5:4-8','chr21:6-23')
> mixedsort(x)
[1] "chr1:10-15" "chr2:1-4"   "chr5:4-8"   "chr10:2-5"  "chr21:6-23"
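If chromosome and position need to stay separate sort keys, a base-R alternative is to split the region strings and pass the pieces to order(). The sketch below assumes every string looks like chr:start-end; the example values are hypothetical, and chrOrder is borrowed from the factor answer in this thread.

```r
# Hypothetical region strings in the form "chr:start-end".
x <- c("chr2:10-12", "chr1:10-15", "chr10:2-5", "chr2:1-4", "chrX:3-9")

# Split each string into chromosome, start and end.
chr   <- sub(":.*$", "", x)
start <- as.integer(sub("^[^:]*:([0-9]+)-.*$", "\\1", x))
end   <- as.integer(sub("^.*-([0-9]+)$", "\\1", x))

# match() gives each chromosome its rank in the desired order;
# order() then sorts by that rank, then by start, then by end.
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")
ord <- order(match(chr, chrOrder), start, end)
print(x[ord])
```

Unlike sorting the combined string, this keeps start and end numeric, so ties within a chromosome are broken by coordinate.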

9.6 years ago

You could replace chr1 with chr01 (i.e., zero-pad the single-digit chromosomes), but only in that column.


That's not going to help with X, Y or M.


I use this solution frequently, and it does account for X and Y (but not M). Alphabetical sort yields: chr01, chr02, chr10, chr22, chrX, chrY.
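A minimal sketch of the zero-padding trick in base R, assuming names follow the chr<number> or chr<letter> pattern; the vector below is made up for illustration.

```r
# Hypothetical chromosome names.
chr <- c("chr1", "chrX", "chr10", "chr2", "chrY", "chr22")

# Pad single-digit chromosomes only: chr1 -> chr01, chr2 -> chr02.
# Two-digit and lettered chromosomes do not match the pattern and are left alone.
padded <- sub("^chr([0-9])$", "chr0\\1", chr)

# method = "radix" sorts in the C locale, so the result does not depend
# on the session's collation settings.
ord <- order(padded, method = "radix")
print(chr[ord])
```

Sorting on the padded key while returning the original names avoids changing the column itself. As noted above, chrM would still land before chrX with this approach.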
