3.6 years ago
pramach1
library(purrr)
library(tidyverse)

# all csv files in the working directory are collected here
fnames <- list.files(pattern = "\\.csv$")
myfiles <- lapply(fnames, read.delim)

# column 1 of each file is split into 5 fields on the separator
strings <- lapply(myfiles, function(x) str_split_fixed(x$col1, " ", 5))

# all 5 columns are now named
col_names <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")
out <- lapply(myfiles, setNames, col_names)
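Since purrr is already loaded, the naming step can also be written with `map()` and `set_names()`. This is only a sketch on made-up toy data (the two `data.frame`s below are invented stand-ins for the imported files, not the poster's actual data):

```r
library(purrr)

# toy stand-ins for the imported files (made-up values)
myfiles <- list(
  data.frame(V1 = "a", V2 = "b", V3 = "c", V4 = 96.5, V5 = 46),
  data.frame(V1 = "d", V2 = "e", V3 = "f", V4 = 87.2, V5 = 22)
)

col_names <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")

# set_names() applies the same column names to every data.frame in the list
out <- map(myfiles, set_names, col_names)

names(out[[1]])  # "qseqid" "sseqid" "stitle" "pident" "qcovs"
```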
This is how the list looks now (RStudio environment pane):

data : list of 3
  [[1]] data.frame: 2652 rows x 3 columns
  [[2]] data.frame: 2646 rows x 3 columns
  [[3]] data.frame: 1460 rows x 3 columns
# retain only columns 3 to 5, so every csv file in the list
# now has just the 3 columns stitle, pident and qcovs
data <- lapply(out, "[", 3:5)
Each csv file in that list looks like this:

gb|AE006468.2|+|1707351-1707789|ARO:3002571|AAC(6')-Iaa [Salmonella enterica subsp. enterica serovar Typhimurium str. LT2] 96.522 46
gb|AY769962|+|2434-5611|ARO:3000781|adeJ [Acinetobacter baumannii] 87.273 22
gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK [Salmonella enterica subsp. enterica serovar Typhimurium] 98.387 100
mydata1 <- lapply[data, function (x) x[data$pident > 90]]
This is not working. I want to filter each file to rows where the percent identity (pident) is above 90.

After that I want to filter all the csv files in the list to query coverage (qcovs) above 90, and then remove duplicated rows in each csv file based on the column stitle. One thing at a time: how do I filter the csv files in the list based on qcovs?
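For what it's worth, all three steps can be done with `lapply()` over the list. This is only a sketch on made-up rows, and it assumes `pident` and `qcovs` are already numeric; since `str_split_fixed()` returns character columns, they may need `as.numeric()` first:

```r
# made-up example standing in for one data.frame in the list
d <- data.frame(
  stitle = c("AAC(6')-Iaa", "adeJ", "mdtK", "mdtK"),
  pident = c(96.5, 87.3, 98.4, 98.4),
  qcovs  = c(46, 22, 100, 100),
  stringsAsFactors = FALSE
)
data <- list(d)

# keep, in every file, only rows with identity AND coverage above 90
filtered <- lapply(data, function(x) x[x$pident > 90 & x$qcovs > 90, ])

# then drop rows whose stitle has already been seen in that file
deduped <- lapply(filtered, function(x) x[!duplicated(x$stitle), ])

deduped[[1]]  # a single mdtK row remains
```

`x[condition, ]` needs the trailing comma (rows, not columns), which is what the `lapply[...]` attempt above was missing, along with using `(` rather than `[` to call `lapply`.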