How To Read A Fasta File With The Bio3D Package
2
1
Entering edit mode
10.9 years ago
laemtao ▴ 40

Problem: I'm using the bio3d package in R to read a fasta file with about 4 sequences. When I try to read the fasta file, I get the error message that the '>' character cannot be found. This would mean two things to me, either the fasta file is corrupted with invisible ascii characters or the permissions on the file are wrong. I checked both conditions and I am still not able to read my file.

> foo <-system.file("~/Homology/seq_temp.fa", package="bio3d")
1: attributes(aln)
2:
read.fasta: no '>' id lines found, check file format


The example xray.fa is working correctly:

> foo <-system.file("examples/hivp_xray.fa", package="bio3d")
> attributes(aln)
$names [1] "id" "ali"$class
[1] "fasta"


I'm not too sure what the problem can be. Writing a fasta file (unaligned or aligned) is pretty fool proof, I have no idea what is causing this error.

r bioconductor fasta • 5.6k views
0
Entering edit mode

Please paste the first few lines of your fasta file here, e.g. use "head -5 ~Homology/seq_temp.fa". Somebody here might spot something obviously wrong with it.

0
Entering edit mode

Also, can you please rename the question to something like a question - the title looks awful and is giving me a headache :)

0
Entering edit mode

I am not receiving any output, I know I should be:

head -5 ~Homology/seq_temp.fa

0
Entering edit mode

Regardless, here is the first sequence from my fasta file:

>gi|86159715|ref|YP_466500.1| FAD-dependent pyridine nucleotide-disulphide oxidoreductase [Anaeromyxobacter dehalogenans 2CP-C]
MRVAIIGSGPAGFYAAEALLKRTDTAVDVDMFDRLPTPFGLVRGGVAPDHQRIKAVTRVFASTAARPTFR
FLGNVRLGRDVTVDDLRRHYHQIVYATGSESDRRLGIPGEGIERCTPATVFVGWYNGHPDYRHARFDLSV
RRAAVVGNGNVAVDVARILLRTRAELERTDIAAHALEALRESQVREVYLLGRRGPAQAAFSPAELRELGT
[abridged]

0
Entering edit mode

0
Entering edit mode

this is not the point, check my answer again!

0
Entering edit mode

Thanks, it works! I can start plotting now!

3
Entering edit mode
10.9 years ago

Use the Biostrings package functions read.DNAStringset, read.AAStringset or readFASTA instead.

Edit: Nope, you simply copy pasted from the example: try

foo <- file("~/Homology/seq_temp.fa")


system.file is only for files in the R-installation!

0
Entering edit mode

Thank you, I was not familiar with Biostrings. I've installed it and I am receiving the following output:

0
Entering edit mode
>moo = system.file("/Homology/seq_temp.fa", package="Biostrings")
Error in readFASTA(moo) : no FASTA sequences found
use 'strip.descs=FALSE' for compatibility with old version
of readFASTA(), or 'strip.descs=TRUE' to remove the "&gt;"
at the beginning of the description lines and to get
rid of this warning (see '?readFASTA' for more details)
2: In file(file, "r") :
file("") only supports open = "w+" and open = "w+b": using the former

0
Entering edit mode

0
Entering edit mode

read ?system.file

0
Entering edit mode

As Michael mentioned stop using the system.file() command, use file() instead.

0
Entering edit mode

Thanks, it works! I can start plotting now!

0
Entering edit mode

That's great. Laemtao, if you are happy with the answer, please accept it by clicking the check-mark on the side. Thanks.

0
Entering edit mode
10.8 years ago

The R system.file() command is the problem here. That is used in the bio3d documentation to indicate that the file ("examples/hivp_xray.fa") is somewhere in the bio3d package directory, wherever that may be in your system.

When reading your own files, all you need is the actual path:

>foo <-"~/Homology/seq_temp.fa"