Closed: Using RStudio to process multiple blastn outputs (csv files) and filter based on query coverage and percent identity.
3.6 years ago
pramach1 ▴ 40
library(tidyverse)   # stringr (str_split_fixed) and purrr both load with the tidyverse

# collect all the csv files in the working directory
fnames <- list.files(pattern = "\\.csv$")
myfiles <- lapply(fnames, read.delim, header = FALSE)

# split column 1 of every file into 5 tab-separated fields
strings <- lapply(myfiles, function(x) as.data.frame(str_split_fixed(x[[1]], "\t", 5)))

# name all 5 columns
colnames <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")
out <- lapply(strings, setNames, colnames)
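If the files really are tab-separated, read.delim can split and name the columns in a single pass, which makes the separate str_split_fixed step unnecessary. A base-R sketch, assuming the five-column layout described above:

```r
# sketch: read each tab-separated file and name its columns at read time
cols   <- c("qseqid", "sseqid", "stitle", "pident", "qcovs")
fnames <- list.files(pattern = "\\.csv$")
out    <- lapply(fnames, read.delim, header = FALSE, sep = "\t", col.names = cols)
```

A side benefit is that read.delim also type-converts, so pident and qcovs come back numeric instead of character.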

This is how the list looks now (RStudio environment pane):

data         list [3]                           List of length 3
 [[1]]       list [2652 x 3] (S3: data.frame)   A data.frame with 2652 rows and 3 columns
 [[2]]       list [2646 x 3] (S3: data.frame)   A data.frame with 2646 rows and 3 columns
 [[3]]       list [1460 x 3] (S3: data.frame)   A data.frame with 1460 rows and 3 columns

data <- lapply(out, "[", 3:5)

(I want to retain only columns 3 to 5, so every csv file in the list now has only the 3 columns stitle, pident and qcovs.)
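Selecting the columns by name instead of position is a bit safer in case the column order ever differs between files. A base-R sketch, assuming the column names assigned above:

```r
# sketch: keep only the three wanted columns, selected by name
wanted <- c("stitle", "pident", "qcovs")
data   <- lapply(out, function(x) x[wanted])
```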

Each csv file in that list now looks like this:

gb|AE006468.2|+|1707351-1707789|ARO:3002571|AAC(6')-Iaa [Salmonella enterica subsp. enterica serovar Typhimurium str. LT2] 96.522  46
gb|AY769962|+|2434-5611|ARO:3000781|adeJ [Acinetobacter baumannii] 87.273 22
gb|CP014358.1|-|2161325-2162750|ARO:3001327|mdtK [Salmonella enterica subsp. enterica serovar Typhimurium] 98.387 100

mydata1 <- lapply[data, function (x) x[data$pident > 90]]

This is not working. I want to filter each data frame to keep only rows with percent identity (pident) above 90.

After that I want to filter all the csv files in the list to query coverage (qcovs) above 90, and then remove duplicated rows in each csv file based on the stitle column. One thing at a time: how do I filter the csv files in the list based on query coverage?
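One way to do all three steps at once. This is a sketch, not from the thread; it assumes `data` is the list built above with columns stitle, pident and qcovs, which may still be stored as character after str_split_fixed:

```r
# sketch: filter every data frame in the list, then drop duplicate subject titles
filter_hits <- function(x, min_pident = 90, min_qcovs = 90) {
  x <- type.convert(x, as.is = TRUE)                    # pident/qcovs to numeric if needed
  x <- x[x$pident > min_pident & x$qcovs > min_qcovs, ] # row filter: note the trailing comma
  x[!duplicated(x$stitle), ]                            # keep the first hit per stitle
}

mydata1 <- lapply(data, filter_hits)
```

The original attempt fails for two reasons: `lapply[...]` uses square brackets where `lapply(...)` needs parentheses, and inside the function the subset has to index each element's own rows, `x[x$pident > 90, ]` (with the comma), not `x[data$pident > 90]`.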
