16 months ago
Samet
I'm trying to look up each ID from a .csv file, for example "NP_000005.3", in the .faa file and combine the matches into a table with the relevant information.
The .faa file has no column names or titles, so I tried converting it into a .csv and into a data frame, but without success.
The data are from NCBI:
library(seqinr)
data2_asfaa <- read.fasta("GCF_000001405.40_GRCh38.p14_protein.faa")
> summary(data2_asfaa)
Length Class Mode
NP_000005.3 1474 SeqFastadna character
NP_000006.2 290 SeqFastadna character
NP_000007.1 421 SeqFastadna character
NP_000008.1 412 SeqFastadna character
NP_000009.1 655 SeqFastadna character
NP_000010.1 427 SeqFastadna character
NP_000011.2 503 SeqFastadna character
NP_000012.1 467 SeqFastadna character
NP_000013.2 363 SeqFastadna character
NP_000014.1 387 SeqFastadna character
> head(data2_asfaa)
$NP_000005.3
[1] "m" "g" "k" "n" "k" "l" "l" "h" "p" "s" "l" "v" "l" "l" "l" "l" "v" "l" "l" "p" "t" "d" "a" "s" "v" "s" "g" "k"
[29] "p" "q" "y" "m" "v" "l" "v" "p" "s" "l" "l" "h" "t" "e" "t" "t" "e" "k" "g" "c" "v" "l" "l" "s" "y" "l" "n" "e"
[57] "t" "v" "t" "v" "s" "a" "s" "l" "e" "s" "v" "r" "g" "n" "r" "s" "l" "f" "t" "d" "l" "e" "a" "e" "n" "d" "v" "l"
[85] ### shortened it while entering the post
[ reached getOption("max.print") -- omitted 474 entries ]
attr(,"name")
[1] "NP_000005.3"
attr(,"Annot")
[1] ">NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]"
attr(,"class")
[1] "SeqFastadna"
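This data format is not intuitive when you are new to it: read.fasta() returns a named list with one element per sequence, and each element carries its header line in the "Annot" attribute.
You can access the annotation by name, if you have it: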
attr(data2_asfaa[['NP_000005.3']],'Annot')
or by position in the list:
attr(data2_asfaa[[1]],'Annot')
To collapse the sequence into a single string:
paste0(data2_asfaa[[1]],collapse = '')
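To get from there to the table asked for in the question, one possible approach is to turn the list into a data frame (ID, annotation, sequence) and merge it with the IDs from the .csv. The sketch below is only an illustration under some assumptions: it uses two small temporary files in place of the real ones, the column name protein_id is made up, the second FASTA record and the ID NP_999999.9 are placeholders, and only the first 28 residues of NP_000005.3 from the post are used. It also passes seqtype = "AA", since the SeqFastadna class in the output above suggests the file was read with the default seqtype = "DNA", which is probably not what you want for a protein file.

```r
library(seqinr)

# Toy stand-in for the real .faa file (second record is a placeholder)
faa_path <- tempfile(fileext = ".faa")
writeLines(c(
  ">NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]",
  "MGKNKLLHPSLVLLLLVLLPTDASVSGK",
  ">NP_000006.2 placeholder description",
  "MKTAYIAKQR"
), faa_path)

# Toy stand-in for the .csv file; 'protein_id' is an assumed column name
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(protein_id = c("NP_000005.3", "NP_999999.9")),
          csv_path, row.names = FALSE)

# Read proteins as amino acids, one string per sequence
fa <- read.fasta(faa_path, seqtype = "AA", as.string = TRUE)

# One row per FASTA record: ID, header line, sequence
fasta_df <- data.frame(
  protein_id = names(fa),
  annotation = vapply(fa, function(s) attr(s, "Annot"), character(1)),
  sequence   = vapply(fa, function(s) paste(s, collapse = ""), character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)

# Drop the leading ">ID " so only the description remains
fasta_df$description <- sub("^>\\S+\\s+", "", fasta_df$annotation)

# Keep every ID from the .csv; IDs missing from the .faa get NA
ids <- read.csv(csv_path, stringsAsFactors = FALSE)
result <- merge(ids, fasta_df, by = "protein_id", all.x = TRUE)
print(result)
```

With the real data you would replace the two temporary files with your own paths and change by = "protein_id" to whatever the ID column in your .csv is called. Using all.x = TRUE keeps IDs that have no match in the .faa file, so you can see which ones were not found.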