Question: Sorting chromosome names in R
Zev.Kronenberg (United States), 9.1 years ago:

I should start by saying I have solved this problem, but I feel like my code is ugly and overkill.

I am trying to sort a data frame on a column containing human chromosome names:

chr1, chr2, ..., chrX, chrY.

The problem is that an alphabetical sort puts chr10 right after chr1 instead of chr2.
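A minimal illustration of the problem in a default R session (assuming a C or English locale, where digits collate before letters): character vectors sort lexicographically, character by character, so chr10 lands between chr1 and chr2.

```r
# Plain character sort: compares strings character by character,
# so "chr10" is placed before "chr2"
chroms <- c("chr2", "chr10", "chr1", "chrX")
sorted <- sort(chroms)
print(sorted)
```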

What tricks do you use to deal with this problem in R?

Tags: R, parsing, sort
Jeremy Leipzig (Philadelphia, PA), 9.1 years ago:

mixedsort() is pretty good, but it will put chrM before chrX and chrY.

If you have your own arbitrary order, you should just use a factor with explicit levels:

> df<-data.frame("chr"=c("chr1","chrM","chr10","chr2","chrX","chr2"),"val"=c(1,2,3,4,5,6))
> df
    chr val
1  chr1   1
2  chrM   2
3 chr10   3
4  chr2   4
5  chrX   5
6  chr2   6
> chrOrder<-c(paste("chr",1:22,sep=""),"chrX","chrY","chrM")
> df$chr<-factor(df$chr, levels=chrOrder)
> df$chr
[1] chr1  chrM  chr10 chr2  chrX  chr2 
Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
> df[order(df$chr),]
    chr val
1  chr1   1
4  chr2   4
6  chr2   6
3 chr10   3
5  chrX   5
2  chrM   2
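One caveat with the factor approach: any value not listed in levels silently becomes NA, and order() puts NA rows last. The sketch below checks for that before sorting; the unplaced contig name chrUn_gl000220 is just an example input, not something from the thread.

```r
# Desired order: autosomes, then sex chromosomes, then mitochondrial DNA
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")

df <- data.frame(chr = c("chr1", "chrM", "chr10", "chrUn_gl000220", "chr2"),
                 val = 1:5,
                 stringsAsFactors = FALSE)

# factor() maps values that are not in `levels` to NA without an error
chrFactor <- factor(df$chr, levels = chrOrder)
unknown <- unique(df$chr[is.na(chrFactor)])
if (length(unknown) > 0) {
  message("Not in chrOrder (will sort last as NA): ",
          paste(unknown, collapse = ", "))
}

df$chr <- chrFactor
sorted <- df[order(df$chr), ]
print(sorted)
```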

Thank you for pointing this out.

(Reply by Gjain, 9.1 years ago)

+1 for being in base R and in the spirit of the R language. This also solves @zev.kronenberg's problem of sorting on two columns: df[order(df$chr, df$pos), ] works as expected, because a factor is sorted by its underlying integer codes, i.e. by the order of its levels. The levels argument to factor() is where the magic happens.
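A sketch of the two-key sort described in this reply, assuming a numeric pos column (the data here are made up for illustration):

```r
chrOrder <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")

df <- data.frame(chr = c("chr2", "chr10", "chr2", "chrX", "chr1"),
                 pos = c(500, 20, 100, 7, 300),
                 stringsAsFactors = FALSE)

# order() compares factors by their integer codes (position in chrOrder)
# and breaks ties on the second key, pos
df$chr <- factor(df$chr, levels = chrOrder)
sorted <- df[order(df$chr, df$pos), ]
print(sorted)
```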

(Reply by bdemarest, 8.9 years ago)
Gjain (Bengaluru, India), 9.1 years ago:

Hi Zev,

What you are looking for is mixedsort() from the gtools package (load it with library(gtools)):

Order or Sort strings with embedded numbers so that the numbers are in the correct order

package 'gtools' was built under R version 2.13.2

> n <- c('chr1','chr21','chr13','chr4','chr10')
> n
[1] "chr1"  "chr21" "chr13" "chr4"  "chr10"
> mixedsort(n)
[1] "chr1"  "chr4"  "chr10" "chr13" "chr21"

I hope this helps.
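If you would rather avoid the gtools dependency, a base-R sketch of a similar ordering for plain chromosome names is to sort on the numeric part first and break ties on the name. This is a different technique from mixedsort, and chrSort is a hypothetical helper name; like mixedsort, it sends chrM, chrX and chrY to the end in alphabetical order.

```r
# Hypothetical helper: numeric chromosomes first, then non-numeric ones by name
chrSort <- function(x) {
  # Strip the "chr" prefix; X, Y, M are not numbers and become NA
  num <- suppressWarnings(as.integer(sub("^chr", "", x)))
  # order() puts NA keys last by default; the second key breaks ties by name
  x[order(num, x)]
}

n <- c("chr1", "chr21", "chr13", "chr4", "chr10", "chrY", "chrX", "chrM")
print(chrSort(n))
```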


Thanks! data[mixedorder(data$chromosomes), ]

Too bad it won't take two vectors (chromosome + position).

(Reply by Zev.Kronenberg, 9.1 years ago)
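On the chromosome + position problem: base R's order() does accept several keys, so one workaround (a sketch, not a gtools feature; chrKey is a hypothetical helper) is to map each chromosome name to a numeric key and pass the position as the second key. The values 23, 24 and 25 for X, Y and M are an assumed convention chosen to match the usual karyotype order.

```r
chrs <- c("chr2", "chrX", "chr10", "chr2", "chrM", "chr1")
pos  <- c(500, 10, 20, 100, 5, 300)

# Hypothetical helper: autosomes keep their number; X, Y, M get 23, 24, 25
chrKey <- function(chr) {
  key <- suppressWarnings(as.integer(sub("^chr", "", chr)))
  key[chr == "chrX"] <- 23L
  key[chr == "chrY"] <- 24L
  key[chr == "chrM"] <- 25L
  key
}

# Sort by chromosome key first, then by position within each chromosome
idx <- order(chrKey(chrs), pos)
print(data.frame(chr = chrs[idx], pos = pos[idx]))
```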

Can you try this way:

> x <- c('chr2:1-4','chr1:10-15','chr10:2-5','chr5:4-8','chr21:6-23')
> mixedsort(x)
[1] "chr1:10-15" "chr2:1-4" "chr5:4-8" "chr10:2-5" "chr21:6-23"

(Reply by Gjain, 9.1 years ago)
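Another option for region strings like these is to split them into chromosome and start columns and order on those directly. This sketch assumes every string has exactly the form chr:start-end with a numeric chromosome; the extra region chr1:2-9 is made up for illustration.

```r
x <- c("chr2:1-4", "chr1:10-15", "chr10:2-5", "chr5:4-8", "chr21:6-23", "chr1:2-9")

# Split "chr:start-end" into its three parts (assumes exactly that format)
parts <- do.call(rbind, strsplit(x, ":|-"))
chr   <- parts[, 1]
start <- as.integer(parts[, 2])

# Numeric chromosomes only in this example; X, Y, M would need an extra key
chrNum <- as.integer(sub("^chr", "", chr))
sortedRegions <- x[order(chrNum, start)]
print(sortedRegions)
```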
Larry_Parnell (Boston, MA USA), 9.1 years ago:

You could replace chr1 with chr01 (i.e., zero-one), and likewise for the other single-digit chromosomes, globally but only in that column.
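A base-R sketch of the zero-padding idea, using sub() to pad only the single-digit chromosome names (the input vector is illustrative):

```r
chroms <- c("chr1", "chr10", "chr2", "chr22", "chrY", "chrX", "chrM")

# Turn chr1..chr9 into chr01..chr09 so a plain alphabetical sort orders them
padded <- sub("^chr([0-9])$", "chr0\\1", chroms)
print(sort(padded))
```

X and Y end up after the numbered chromosomes, but chrM sorts before them, which is the limitation raised in the replies.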


That's not going to help with X, Y or M.

(Reply by Neilfws, 9.1 years ago)

I use this solution frequently, and it does account for X and Y (but not M). Alphabetical sort yields: chr01, chr02, chr10, chr22, chrX, chrY.

(Reply by bdemarest, 8.9 years ago)