Question: Using RStudio to bulk split CSV tabs into columns.
pawhitesell30 wrote:

Hi,

I have output from blastn in tabular format 6 (-outfmt "6 qseqid sseqid stitle pident qcovs"), which I have saved as a csv file.

So I have multiple csv blast output files in a directory. I want to clean up all the csv files at once using RStudio. Each csv file has the same number of columns (1 column) and a different number of rows.

To read all the csv files at once, I used:

fnames <- list.files()
csv <- lapply(fnames, read.csv)

This creates a list of data frames. Now I want to split the one column into multiple ones based on tab spaces. I tried to use:

strings <- str_split_fixed(csv$col1, " ", 5)


However, this is not working, as it creates no data at all.

Is there another way to split the column in all the csv files at once?

P.S. I am posting on behalf of someone, so they may respond here to any questions for clarification.

Thanks!

blastn csv R • 147 views
modified 20 days ago by pramach10 • written 20 days ago by pawhitesell30

Can you post some of your data using dput(head(csv[[1]]))?

modified 20 days ago • written 20 days ago by rpolicastro1.9k

Thank you. Since I have a differing number of rows, this is not working. The error I get is:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 2652, 0
modified 20 days ago by RamRS30k • written 20 days ago by pramach10

Add some example data to your post, and an example of the desired output.

written 20 days ago by rpolicastro1.9k

"1932:@M05996:27:000000000-CFP53:1:1101:19611:2120  gb|AF144880|+|3541-3979|ARO:3002569|AAC(6')-Iy  gb|AF144880|+|3541-3979|ARO:3002569|AAC(6')-Iy [Salmonella enterica subsp. enterica serovar Enteritidis]    98.261  46"
"1932:@M05996:27:000000000-CFP53:1:1101:19611:2120  gb|AE006468.2|+|1707351-1707789|ARO:3002571|AAC(6')-Iaa gb|AE006468.2|+|1707351-1707789|ARO:3002571|AAC(6')-Iaa [Salmonella enterica subsp. enterica serovar Typhimurium str. LT2]  96.522  46"
"1932:@M05996:27:000000000-CFP53:1:1101:14997:2171  gb|AY769962|+|2434-5611|ARO:3000781|adeJ    gb|AY769962|+|2434-5611|ARO:3000781|adeJ [Acinetobacter baumannii]  87.273  22"
"1928:@M05996:27:000000000-CFP53:1:1101:15032:4757  gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK    gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK [Salmonella enterica subsp. enterica serovar Typhimurium]  98.387  100"

This is how the single column looks in each of the csv files.

I would like to separate the column on the tab spaces and make it into 5 columns.

gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK    gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK [Salmonella enterica subsp. enterica serovar Typhimurium]  98.387  100
modified 16 days ago by zx87549.7k • written 19 days ago by pramach10

I don't see a tab separator, but I do potentially see a pipe separator |.

When you refer to this as the single column in your files, are you saying that this is the only column in the data.frame after importing it, or are you rather saying that this is the one column out of many that you have that you want to split?

It's always better to post an example of your data using dput(head(csv[[1]])) to avoid confusion like this.

written 19 days ago by rpolicastro1.9k

Even if I have to separate using a pipe separator, how would I do that in a list of files?

I have tried

strings <- str_split_fixed(csv$col1, "|", 8)

It doesn't work.

I am unable to attach an image or run dput(head(csv[[1]])). I apologize. My computer's security settings are not letting me do it.

written 19 days ago by pramach10
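
One possible reason the pipe attempt fails: stringr treats the pattern as a regular expression, where | means alternation rather than a literal pipe character. A minimal sketch of a literal split, using base R's strsplit() with fixed = TRUE instead of stringr (the two sample strings are copied from the rows posted above; the variable names are made up for illustration):

```r
# Two sseqid values copied from the example rows in this thread.
ids <- c("gb|AY769962|+|2434-5611|ARO:3000781|adeJ",
         "gb|AF144880|+|3541-3979|ARO:3002569|AAC(6')-Iy")

# fixed = TRUE makes strsplit() treat "|" as a literal character,
# not as the regex alternation operator.
parts <- strsplit(ids[1], "|", fixed = TRUE)[[1]]

# strsplit() returns one character vector per input string;
# rbind them into a matrix with one row per string.
pieces <- do.call(rbind, strsplit(ids, "|", fixed = TRUE))
print(pieces)
```

With stringr, the same idea would be to wrap the pattern in fixed("|") or to escape it as "\\|".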

Copy and paste the output from it into a comment. Afterwards, select all the code and press the button with 0's and 1's just above the post to format it as code.

written 19 days ago by rpolicastro1.9k

Thank you. Actually it worked.

strings <- str_split_fixed(csv$col1, " ", 5)

Even though the strings said "no data available", the actual csv list has separated into 5 columns. I apologize for not noticing this earlier.

written 19 days ago by pramach10

Provide example data as plain text. Your links to images do not work.

written 19 days ago by zx87549.7k

zx87549.7k wrote:

Row-bind the list of csvs into one data frame, then we can work on the column.

csv <- do.call(rbind, lapply(fnames, read.csv))

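Or if you wish to keep them as a list:

library(stringr)  # provides str_split_fixed()

csv <- lapply(fnames, function(i) {
  d <- read.csv(i)
  # split the single column into 5 pieces
  x <- str_split_fixed(d$col1, " ", 5)
  # return the original column alongside the split pieces
  cbind(d, x)
})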
modified 20 days ago • written 20 days ago by zx87549.7k

Thank you. Since I have a differing number of rows, this is not working. The error I get is:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2652, 0
modified 20 days ago by RamRS30k • written 20 days ago by pramach10
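
The numbers in this error hint at the cause: cbind() is being given a data frame with 2652 rows and a split result with 0 rows. That would happen if d$col1 is NULL, for example because read.csv() used the first line of the file as the header, so no column is actually called col1. A hedged sketch that refers to the column by position instead (the data frame d below is a made-up stand-in for one imported file, and the column names are taken from the -outfmt string in the question):

```r
# Hypothetical stand-in for one imported file: a single column
# that is NOT named "col1".
d <- data.frame(V1 = c("q1\ts1\ttitle one\t98.261\t46",
                       "q2\ts2\ttitle two\t87.273\t22"),
                stringsAsFactors = FALSE)

# There is no column called col1, so this is NULL and any split
# of it has zero rows.
is.null(d$col1)

# Refer to the column by position and split on a literal tab.
x <- do.call(rbind, strsplit(d[[1]], "\t", fixed = TRUE))
colnames(x) <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")

# Both pieces now have the same number of rows, so cbind() works.
out <- cbind(d, x)
print(out)
```

Checking names(d) on one of the real files would confirm or rule out this explanation.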

If I use

csv <- do.call(rbind, lapply(fnames, read.csv))

This is the error I get.

Error in match.names(clabs, names(xi)) : 
  names do not match previous names
modified 20 days ago by RamRS30k • written 20 days ago by pramach10
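
rbind() on data frames requires matching column names. If each file is read with the default header = TRUE, its first data row becomes the column name, and that differs from file to file, which would explain this error. A minimal sketch, assuming the files are the raw tab-delimited blastn output: read them with read.delim(header = FALSE) so the tabs are split on import and every table gets the same names. The two temporary files and their contents are invented for the demonstration.

```r
# Two hypothetical tab-delimited files in a temporary directory.
tmp <- tempfile("blast_demo")
dir.create(tmp)
writeLines(c("q1\ts1\ttitle one\t98.261\t46",
             "q2\ts2\ttitle two\t96.522\t46"),
           file.path(tmp, "a.csv"))
writeLines("q3\ts3\ttitle three\t87.273\t22",
           file.path(tmp, "b.csv"))

# Only pick up the csv files, with their full paths.
fnames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Column names taken from the -outfmt string in the question.
cols <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")

# header = FALSE keeps the first row as data; quote = "" stops
# apostrophes or quotes in stitle from being read as quoting.
tables <- lapply(fnames, read.delim, header = FALSE, quote = "",
                 col.names = cols, stringsAsFactors = FALSE)

# All tables share the same names, so rbind() no longer complains.
blast <- do.call(rbind, tables)
print(blast)
```

If the files really contain each line wrapped in double quotes, as in the pasted sample, the lines would need to be read as a single column first and split afterwards.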