Question: R "For I In" Plot In Sorted Order
0
7.0 years ago by
PoGibas4.8k
Vilnius
PoGibas4.8k wrote:

My files (>100) are:

AT.BP.50.txt
AT.BP.200.txt
AT.BP.500.txt 
SP.BP.50.txt
SP.BP.200.txt 
SP.BP.500.txt 
....

I want to plot them with R.

Usually I do it like this:

files <- list.files()
par(mfrow=c(3,3))
for (i in 1:length(files)) {
    b <- read.table(files[i])
    barplot(table(b), main=files[i])
....

But R plots them in this order:

"AT.BP.200.txt" "AT.BP.500.txt" "AT.BP.50.txt"

"SP.BP.200.txt" "SP.BP.500.txt" "SP.BP.50.txt"

........

And I want them to be plotted in sorted order:

"AT.BP.50.txt" "AT.BP.200.txt" "AT.BP.500.txt"

"SP.BP.50.txt" "SP.BP.200.txt" "SP.BP.500.txt"

........

How can I do that?

plot • 2.0k views
ADD COMMENTlink modified 4.6 years ago by Biostar ♦♦ 20 • written 7.0 years ago by PoGibas4.8k
3

What is the point of closing a question if it already has an answer? I sort files and stuff all the time when I'm doing bioinformatics, and I'm rarely confident I'm doing it the best way. I depend on serendipitous information like this to stumble on better methods. Given that there's already an answer, closing the question simply precludes the possibility of a better answer. How does that make life better?

ADD REPLYlink written 7.0 years ago by seidel6.8k

I think the original poster would actually get the best answer on Stack Overflow; there are a lot of R gurus over there. I recommend that anyone here interested in R questions like this follow the RSS feed for the R tag on Stack Overflow. Biostars is supposed to be limited to bioinformatics-specific questions. Just trying to keep the internet organized.

ADD REPLYlink written 7.0 years ago by Madelaine Gogol5.0k
3

"Biostars is supposed to be limited to bioinformatics specific questions." I guess that explains why a question about decreasing sequencing costs remains open and gets 29 votes. :) I agree that this question could have been geared more explicitly towards bioinformatics with perhaps two words "My sequence score files..." But consider that a programmer doing bioinformatics might see this as simply a sorting question to be asked in another expert forum, whereas a biologist trying to do bioinformatics thinks that engaging in programming in a biological context is doing bioinformatics and would ask it where they see people doing bioinformatics, and people like me find it related, and useful, and welcome, even thought the word "bioinformatics" was not used in the question. Just trying to keep the internet open and collegial.

ADD REPLYlink written 7.0 years ago by seidel6.8k
1

Okay, go for it.

ADD REPLYlink written 7.0 years ago by Madelaine Gogol5.0k

I would post this on stackoverflow.com, it's not exactly bioinformatics-centric.

ADD REPLYlink written 7.0 years ago by Madelaine Gogol5.0k
3
7.0 years ago by
Madelaine Gogol5.0k
Kansas City
Madelaine Gogol5.0k wrote:

Here's my somewhat hideous attempt, but in the event that you had too many to want to use grep...

files<-c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt", "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt", "SP.BP.80.txt")

# Split each name on "." and keep the prefix (part 1) and the size (part 3).
filename_parts<-data.frame(t(sapply(strsplit(files,"\\."),function(x){x[c(1,3)]})),stringsAsFactors=FALSE)
# Convert the size to numeric so that 50 sorts before 200.
filename_parts[,2]<-as.numeric(filename_parts[,2])
colnames(filename_parts)<-c("a","b")

# Order by prefix first, then by numeric size.
ord.iv<-with(filename_parts,order(a,b))

files.reorder<-files[ord.iv]

for(i in seq_along(files.reorder))
{
    b <- read.table(files.reorder[i])
    ...
}
ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by Madelaine Gogol5.0k

In practice, I usually create a file that lists file names and aliases in whatever order I want and then read that in.
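
A minimal sketch of that manifest approach, assuming a hypothetical tab-separated file `plot_order.tsv` with columns `file` and `alias` (the file name, column names, and aliases are illustrative, not from this thread). Rows are read back in the order they were written, so the plotting order is whatever the manifest says.

```r
# Hypothetical manifest: one row per file, listed in the desired plotting order.
manifest_path <- file.path(tempdir(), "plot_order.tsv")
writeLines(c("file\talias",
             "AT.BP.50.txt\tAT 50 bp",
             "AT.BP.200.txt\tAT 200 bp",
             "AT.BP.500.txt\tAT 500 bp"),
           manifest_path)

# read.delim() keeps the row order of the file, so no sorting is needed afterwards.
manifest <- read.delim(manifest_path, stringsAsFactors = FALSE)

for (i in seq_len(nrow(manifest))) {
    # In real use, something like:
    #   b <- read.table(manifest$file[i])
    #   barplot(table(b), main = manifest$alias[i])
    message(manifest$file[i], " -> ", manifest$alias[i])
}
```

The trade-off is that the manifest has to be maintained by hand, which may not scale well to more than 100 files.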

ADD REPLYlink written 7.0 years ago by Madelaine Gogol5.0k
1
7.0 years ago by
Marseille, France
Julien Textoris430 wrote:

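There may be a better way, but what about using something like this:

# Anchor the patterns so that "50.txt" cannot also match e.g. "150.txt".
ind.50  = grep("\\.50\\.txt$",  files)
ind.200 = grep("\\.200\\.txt$", files)
ind.500 = grep("\\.500\\.txt$", files)

# Assumes every prefix has exactly one file for each of the three sizes.
# seq_len() avoids the precedence trap in 1:length(files)/3, which
# evaluates as (1:length(files))/3.
for(i in seq_len(length(files) / 3)) {
  for(j in c(ind.50[i], ind.200[i], ind.500[i])) {
    b = read.table(files[j])
    ...
  }
}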
ADD COMMENTlink written 7.0 years ago by Julien Textoris430
1
7.0 years ago by
bdemarest460
Salt Lake City, UT, USA
bdemarest460 wrote:

Your file names are not sorted in the order you want, because list.files() returns an alphabetically sorted list.

# Reproducible code showing the problem.

# files = list.files() will return file names in alphabetical order:
files = c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt",
          "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt")

par(mfrow=c(3, 3))

for (fname in files) {
    plot(1:10, main=fname)
}

# Proposed solution: Rename files so that alphabetical sorting works.

sorted_files = sort(gsub("\\.(\\d{2})\\.", ".0\\1.", files))
sorted_files
# [1] "AT.BP.050.txt" "AT.BP.200.txt" "AT.BP.500.txt"
# [4] "SP.BP.050.txt" "SP.BP.200.txt" "SP.BP.500.txt"

par(mfrow=c(3, 3))

for (fname in sorted_files) {
    plot(1:10, main=fname)
}
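
# A possible variation, not part of the original answer: the padded names
# above do not exist on disk unless the files are actually renamed, so
# read.table() on them would fail. As a sketch, the padded names can serve
# only as sort keys while the real names are kept. The variable names `keys`
# and `files_in_order` are illustrative. The padding only covers two-digit
# sizes in a set whose largest size has three digits.

```r
files <- c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt",
           "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt")

# Build zero-padded sort keys, but leave the real file names untouched.
keys <- gsub("\\.(\\d{2})\\.", ".0\\1.", files)

# order() returns the permutation that sorts the keys; apply it to the originals.
files_in_order <- files[order(keys)]
files_in_order
# [1] "AT.BP.50.txt"  "AT.BP.200.txt" "AT.BP.500.txt"
# [4] "SP.BP.50.txt"  "SP.BP.200.txt" "SP.BP.500.txt"
```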
ADD COMMENTlink written 7.0 years ago by bdemarest460
1
7.0 years ago by
Sukhdeep Singh9.7k
Netherlands
Sukhdeep Singh9.7k wrote:

Let me tell you about a very clean tool called mixedsort, part of the gtools package, which does alphanumeric sorting. To make your file names work with it, we first strip the periods and then get the order with its companion function mixedorder. With just a single command you can achieve what you need.

library(gtools)  # provides mixedsort() and mixedorder()
files<-c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt", "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt", "SP.BP.80.txt")
wanted <- files[mixedorder(gsub('[.]','',files))]

So, it's done. I removed the dots using gsub, passed the result to mixedorder to get the order, and indexed files in that order. Go ahead and plot it now.
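
For the plotting step, here is a rough sketch under some assumptions: the sorted vector `wanted` is written out literally (so the example does not depend on gtools), the input files are small stand-ins with made-up values created in a temporary directory, and the plots go to a PDF rather than the screen. The names `tmp_dir` and `plotted` are illustrative.

```r
# Sorted order, written out literally so the sketch has no gtools dependency.
wanted <- c("AT.BP.50.txt", "AT.BP.200.txt", "AT.BP.500.txt")

# Create small stand-in files in a temporary directory (made-up values).
tmp_dir <- tempdir()
for (f in wanted) {
    writeLines(as.character(c(1, 1, 2, 3, 3, 3)), file.path(tmp_dir, f))
}

pdf(file.path(tmp_dir, "plots.pdf"))  # draw to a file instead of the screen
par(mfrow = c(3, 3))
plotted <- character(0)
for (f in wanted) {
    b <- read.table(file.path(tmp_dir, f))
    barplot(table(b), main = f)
    plotted <- c(plotted, f)  # record the order in which panels were drawn
}
dev.off()
```

Iterating over the file names directly, rather than over `1:length(files)`, also sidesteps the case where the vector is empty.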

Cheers

ADD COMMENTlink written 7.0 years ago by Sukhdeep Singh9.7k