Sorting Chrs In [R]
3
6
10.8 years ago

I should start by saying I have solved this problem, but I feel like my code is ugly and overkill.

I am trying to sort a dataframe on a column containing human chromosomes:

chr1,chr2...chrY, chrX.

The problem is that chr1 is followed by chr10.

What tricks do you use to deal with this problem IN the [R] environment?

r parsing sort • 8.8k views
20
10.8 years ago

mixedsort is pretty good, but it will put chrM before chrX and chrY.

If you have your own arbitrary order, you should just use factors:

> df<-data.frame("chr"=c("chr1","chrM","chr10","chr2","chrX","chr2"),"val"=c(1,2,3,4,5,6))
> df
    chr val
1  chr1   1
2  chrM   2
3 chr10   3
4  chr2   4
5  chrX   5
6  chr2   6
> chrOrder<-c(paste("chr",1:22,sep=""),"chrX","chrY","chrM")
> df$chr<-factor(df$chr, levels=chrOrder)
> df$chr
[1] chr1  chrM  chr10 chr2  chrX  chr2
Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
> df[order(df$chr),]
    chr val
1  chr1   1
4  chr2   4
6  chr2   6
3 chr10   3
5  chrX   5
2  chrM   2

0

thank you for pointing this out.

0

+1 for being in base R and in the spirit of the R language. This also solves @zev.kronenberg's issue of sorting on two columns: df[order(df$chr, df$pos), ] works as expected, because a factor is sorted by its underlying integer codes, which follow the order of the levels. The levels argument to factor() is where the magic happens.
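To make the two-column case concrete, here is a minimal sketch of the factor approach with a position column. The `pos` values are made up for illustration (the original example only has `val`), and the level order is the one proposed in the answer above.

```r
# Hypothetical data: 'pos' is an assumed position column, not in the original example
df <- data.frame(
  chr = c("chr1", "chrM", "chr10", "chr2", "chrX", "chr2"),
  pos = c(500, 10, 42, 900, 7, 100),
  stringsAsFactors = FALSE
)

# Explicit chromosome order: autosomes first, then X, Y, M
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")
df$chr <- factor(df$chr, levels = chrOrder)

# order() sorts on the factor's integer codes, then breaks ties by position
sorted <- df[order(df$chr, df$pos), ]
print(sorted)
```

One thing to watch for: any chromosome name that is not in chrOrder (for example an unplaced contig) becomes NA in the factor, and order() puts NA rows last by default.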

13
10.8 years ago
Gjain 5.7k

Hi Zev,

what you are looking for is: mixedsort {gtools}

Order or Sort strings with embedded numbers so that the numbers are in the correct order

> library(gtools)
package 'gtools' was built under R version 2.13.2

> n <- c('chr1','chr21','chr13','chr4','chr10')
> n
[1] "chr1"  "chr21" "chr13" "chr4"  "chr10"

> mixedsort(n)
[1] "chr1"  "chr4"  "chr10" "chr13" "chr21"


I hope this helps.

1

Thanks! dat[mixedorder(dat$chromosomes), ]

Too bad it won't take two vectors: chromosome + position.

0

Can you try it this way:

> x <- c('chr2:1-4','chr1:10-15','chr10:2-5','chr5:4-8','chr21:6-23')
> mixedsort(x)
[1] "chr1:10-15" "chr2:1-4"   "chr5:4-8"   "chr10:2-5"  "chr21:6-23"
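For the chromosome + position case raised in this thread, one base R workaround is to turn each chromosome into a numeric key and pass that key to order() together with the position. This is only a sketch: chr_key() is a hypothetical helper (not part of base R or gtools), and the X = 23, Y = 24, M = 25 mapping is an assumed ordering for human data.

```r
# chr_key() is a hypothetical helper: maps "chr1".."chr22", "chrX", "chrY", "chrM"
# to integers so that order() can combine it with a second sort key
chr_key <- function(chr) {
  label <- sub("^chr", "", chr)
  special <- c(X = 23L, Y = 24L, M = 25L)     # assumed human ordering
  key <- suppressWarnings(as.integer(label))  # NA for non-numeric labels
  is_special <- label %in% names(special)
  key[is_special] <- unname(special[label[is_special]])
  key
}

# Made-up example vectors
chr <- c("chr2", "chr1", "chr10", "chrX", "chr2", "chrM")
pos <- c(300, 10, 2, 5, 100, 1)

# Sort by chromosome key first, then by position within a chromosome
idx <- order(chr_key(chr), pos)
print(data.frame(chr = chr[idx], pos = pos[idx]))
```

Labels the helper does not recognise keep an NA key and therefore end up at the bottom, since order() places NA last by default.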

1
10.8 years ago

You could replace chr1 with chr01 (i.e., zero-pad the single-digit chromosomes) globally, but only in that column.

0

That's not going to help with X, Y or M.

0

I use this solution frequently, and it does account for X and Y (but not M). Alphabetical sort yields: chr01, chr02, chr10, chr22, chrX, chrY.
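The zero-padding idea can be sketched in base R roughly as follows. The regular expression and the example vector are my own illustration, and method = "radix" is used only so that the sort does not depend on the locale's collation rules.

```r
# Made-up example vector
chr <- c("chr1", "chr10", "chr2", "chr22", "chrX", "chrY", "chrM")

# Pad single-digit chromosomes: chr1 -> chr01, chr2 -> chr02; others untouched
padded <- sub("^chr([0-9])$", "chr0\\1", chr)

# Plain alphabetical sort now gives the numeric order for the autosomes;
# method = "radix" sorts character vectors in the C locale
sorted <- sort(padded, method = "radix")
print(sorted)
```

As noted above, chrM lands before chrX and chrY with this approach, so it only fits if that placement is acceptable.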