Converting .faa file into a data frame
16 months ago
Samet • 0

I'm trying to look up identifiers from a .csv file (for example "NP_000005.3") in the .faa file and combine the matches into a table with the relevant information.

The .faa file does not have column names or titles, so I tried converting it into a .csv file and a data frame, but I was not successful.

The data is from NCBI:

library(seqinr)
data2_asfaa <- read.fasta("GCF_000001405.40_GRCh38.p14_protein.faa")
> summary(data2_asfaa)
               Length Class       Mode     
NP_000005.3     1474  SeqFastadna character
NP_000006.2      290  SeqFastadna character
NP_000007.1      421  SeqFastadna character
NP_000008.1      412  SeqFastadna character
NP_000009.1      655  SeqFastadna character
NP_000010.1      427  SeqFastadna character
NP_000011.2      503  SeqFastadna character
NP_000012.1      467  SeqFastadna character
NP_000013.2      363  SeqFastadna character
NP_000014.1      387  SeqFastadna character

> head(data2_asfaa)
$NP_000005.3
   [1] "m" "g" "k" "n" "k" "l" "l" "h" "p" "s" "l" "v" "l" "l" "l" "l" "v" "l" "l" "p" "t" "d" "a" "s" "v" "s" "g" "k"
  [29] "p" "q" "y" "m" "v" "l" "v" "p" "s" "l" "l" "h" "t" "e" "t" "t" "e" "k" "g" "c" "v" "l" "l" "s" "y" "l" "n" "e"
  [57] "t" "v" "t" "v" "s" "a" "s" "l" "e" "s" "v" "r" "g" "n" "r" "s" "l" "f" "t" "d" "l" "e" "a" "e" "n" "d" "v" "l"
  [85] ### shortened it while entering the post
 [ reached getOption("max.print") -- omitted 474 entries ]
attr(,"name")
[1] "NP_000005.3"
attr(,"Annot")
[1] ">NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]"
attr(,"class")
[1] "SeqFastadna"
fasta r • 646 views

This data format is not intuitive when you are new to it.

You can access the annotation in the following ways:

by name: attr(data2_asfaa[['NP_000005.3']],'Annot')

or by position: attr(data2_asfaa[[1]],'Annot')

To collapse the sequence into a single string: paste0(data2_asfaa[[1]],collapse = '')
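
Putting those pieces together, a minimal sketch of turning the whole list into a data frame might look like the following. It uses base R only and a small hand-built stand-in for the list that read.fasta() returns, so fasta_list, make_entry and the second entry (NP_999999.9) are hypothetical; the first sequence is just the first ten residues shown in the question.

```r
# Hypothetical helper: builds one element shaped like a seqinr entry,
# i.e. a character vector of residues with "name" and "Annot" attributes.
make_entry <- function(id, annot, residues) {
  x <- strsplit(residues, "")[[1]]
  attr(x, "name") <- id
  attr(x, "Annot") <- annot
  x
}

# Toy stand-in for data2_asfaa (assumption: same structure as in the post).
fasta_list <- list(
  "NP_000005.3" = make_entry(
    "NP_000005.3",
    ">NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]",
    "mgknkllhps"  # truncated to the first ten residues
  ),
  "NP_999999.9" = make_entry(
    "NP_999999.9",
    ">NP_999999.9 hypothetical example protein [Homo sapiens]",
    "mktay"
  )
)

# One row per sequence: ID, annotation without the leading ">",
# the residues collapsed into one string, and the sequence length.
fasta_df <- data.frame(
  id         = names(fasta_list),
  annotation = vapply(fasta_list,
                      function(x) sub("^>", "", attr(x, "Annot")),
                      character(1)),
  sequence   = vapply(fasta_list, paste0, character(1), collapse = ""),
  length     = lengths(fasta_list),
  row.names  = NULL,
  stringsAsFactors = FALSE
)

print(fasta_df)
```

With the real file, the same data.frame() call should work on data2_asfaa directly. Note also that read.fasta() has a seqtype argument; passing seqtype = "AA" reads the file as protein rather than the default DNA, which is why the summary in the question shows the class SeqFastadna.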

16 months ago
ATpoint 81k

Biostrings:

library(Biostrings)

fa <- Biostrings::readAAStringSet("~/Downloads/GCF_000001405.40_GRCh38.p14_protein.faa.gz")

head(data.frame(names=names(fa), sequence=as.character(fa), width=width(fa)))
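
To get from that data frame to the table the question asks for, one possible approach is to pull the accession out of the header and merge on it. The sketch below is base R on toy data: fa_df stands in for the data frame built above (the names column holds the full FASTA header line, so the accession is taken as everything before the first space), and query stands in for the .csv, with a hypothetical column name protein_id and a hypothetical unmatched ID.

```r
# Toy stand-in for data.frame(names = names(fa), sequence = as.character(fa),
# width = width(fa)); sequences are shortened, the second entry is hypothetical.
fa_df <- data.frame(
  names = c(
    "NP_000005.3 alpha-2-macroglobulin isoform a precursor [Homo sapiens]",
    "NP_999999.9 hypothetical example protein [Homo sapiens]"
  ),
  sequence = c("MGKNKLLHPS", "MKTAY"),
  width = c(10L, 5L),
  stringsAsFactors = FALSE
)

# The accession is the part of the header before the first space.
fa_df$accession <- sub(" .*$", "", fa_df$names)

# Toy stand-in for the .csv, e.g. query <- read.csv("ids.csv");
# the file name and the column name protein_id are assumptions.
query <- data.frame(
  protein_id = c("NP_000005.3", "NP_123456.1"),
  stringsAsFactors = FALSE
)

# Left join: keep every ID from the .csv, with NA where the .faa has no match.
matched <- merge(query, fa_df,
                 by.x = "protein_id", by.y = "accession",
                 all.x = TRUE)

print(matched)
```

Dropping all.x = TRUE would keep only the IDs that are actually found in the .faa file.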