How To Read A Fasta File With The bio3d Package
10.9 years ago
laemtao

Problem: I'm using the bio3d package in R to read a fasta file with about 4 sequences. When I try to read it, I get an error saying that no '>' character can be found. To me this suggests one of two things: either the fasta file contains invisible non-ASCII characters, or the permissions on the file are wrong. I checked both and I am still not able to read my file.
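Both suspicions (hidden characters, wrong permissions), plus the simpler case of the path not pointing at a file at all, can be checked from R before calling read.fasta(). A minimal base-R sketch; `check_fasta_file` is a hypothetical helper, not part of bio3d, and "non-ASCII" here means any byte outside printable ASCII and tab.

```r
# Hypothetical helper (not part of bio3d): sanity checks on a FASTA path
# before handing it to read.fasta().
check_fasta_file <- function(path) {
  path <- path.expand(path)                               # resolve a leading "~"
  found <- file.exists(path) && !dir.exists(path)         # a real file, not a folder
  readable <- found && file.access(path, mode = 4) == 0   # 0 means readable
  if (!readable) {
    return(list(path = path, found = found, readable = FALSE,
                n_headers = NA_integer_, non_ascii_lines = integer(0)))
  }
  lines <- readLines(path, warn = FALSE)
  # read.fasta() needs header lines whose very first character is '>'
  n_headers <- sum(grepl("^>", lines, useBytes = TRUE))
  # Indices of lines holding bytes outside printable ASCII (0x20-0x7E) or tab,
  # e.g. a UTF-8 byte-order mark or stray control characters
  non_ascii_lines <- which(grepl("[^\\x20-\\x7E\\t]", lines,
                                 perl = TRUE, useBytes = TRUE))
  list(path = path, found = found, readable = readable,
       n_headers = n_headers, non_ascii_lines = non_ascii_lines)
}

# check_fasta_file("~/Homology/seq_temp.fa")
```

Calling it on exactly the value passed to read.fasta() is the useful test: `found = FALSE` means the problem is the path, not the file contents.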

> foo <-system.file("~/Homology/seq_temp.fa", package="bio3d")
1: attributes(aln)
2:
read.fasta: no '>' id lines found, check file format


The bundled example file hivp_xray.fa reads correctly:

> foo <-system.file("examples/hivp_xray.fa", package="bio3d")
> attributes(aln)
$names
[1] "id"  "ali"

$class
[1] "fasta"


I'm not sure what the problem could be. Writing a fasta file (unaligned or aligned) is pretty foolproof, so I have no idea what is causing this error.

Please paste the first few lines of your fasta file here, e.g. using "head -5 ~/Homology/seq_temp.fa". Somebody here might spot something obviously wrong with it.
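The same peek can be done from inside R, which also confirms that R resolves the path the way you expect. Note that `~Homology/...` (no slash after the tilde) is read by the shell as the home directory of a user called Homology, not as a folder in your own home, so the path needs to be `~/Homology/seq_temp.fa`. A minimal sketch; `peek_file` is a hypothetical helper name.

```r
# Hypothetical helper: an R equivalent of `head -5 file`.
peek_file <- function(path, n = 5) {
  path <- path.expand(path)   # "~/..." becomes your home directory
  if (!file.exists(path)) stop("no such file: ", path)
  readLines(path, n = n, warn = FALSE)   # first n lines only
}

# peek_file("~/Homology/seq_temp.fa")
```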

Also, can you please rename the question to something like a question - the title looks awful and is giving me a headache :)

I am not receiving any output, although I know I should be:

head -5 ~Homology/seq_temp.fa

Regardless, here is the first sequence from my fasta file:

>gi|86159715|ref|YP_466500.1| FAD-dependent pyridine nucleotide-disulphide oxidoreductase [Anaeromyxobacter dehalogenans 2CP-C]
MRVAIIGSGPAGFYAAEALLKRTDTAVDVDMFDRLPTPFGLVRGGVAPDHQRIKAVTRVFASTAARPTFR
FLGNVRLGRDVTVDDLRRHYHQIVYATGSESDRRLGIPGEGIERCTPATVFVGWYNGHPDYRHARFDLSV
RRAAVVGNGNVAVDVARILLRTRAELERTDIAAHALEALRESQVREVYLLGRRGPAQAAFSPAELRELGT
[abridged]

This is not the point; check my answer again!

Thanks, it works! I can start plotting now!

10.9 years ago

Use the Biostrings package functions read.DNAStringSet, read.AAStringSet or readFASTA instead.
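For reference, current Biostrings releases spell these readers readDNAStringSet() and readAAStringSet(), and readFASTA() has since been dropped. A minimal sketch for protein sequences like the ones in the question, assuming Biostrings is installed from Bioconductor; `read_protein_fasta` is a hypothetical wrapper name.

```r
# Hypothetical wrapper; assumes the Bioconductor package Biostrings is installed.
read_protein_fasta <- function(path) {
  if (!requireNamespace("Biostrings", quietly = TRUE))
    stop("install Biostrings first, e.g. BiocManager::install('Biostrings')")
  # Pass the real path on disk, not system.file(...)
  Biostrings::readAAStringSet(path.expand(path), format = "fasta")
}

# seqs <- read_protein_fasta("~/Homology/seq_temp.fa")
# length(seqs)   # number of sequences
# names(seqs)    # the '>' description lines
```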

Edit: Nope, you simply copy-pasted from the example. Try:

foo <- file("~/Homology/seq_temp.fa")


system.file() only finds files inside installed R packages!
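A quick way to see this: system.file() looks for the path inside an installed package, and when nothing matches it returns an empty string instead of failing. Assuming read.fasta() reads its input with scan(), an empty file name makes it read from the console instead, which would explain the `1:` and `2:` prompts in the question's output. The sketch uses the base stats package so it runs without bio3d; package = "bio3d" behaves the same way.

```r
# system.file() searches only inside an installed package's directory.
bad <- system.file("~/Homology/seq_temp.fa", package = "stats")
print(identical(bad, ""))    # a missing file gives "" rather than an error

# mustWork = TRUE turns the silent "" into an explicit error
err <- tryCatch(
  system.file("~/Homology/seq_temp.fa", package = "stats", mustWork = TRUE),
  error = function(e) e
)
print(conditionMessage(err))

# Your own files need no system.file() at all: use the path itself
own <- path.expand("~/Homology/seq_temp.fa")
print(file.exists(own))
```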

Thank you, I was not familiar with Biostrings. I've installed it and I am receiving the following output:

> moo = system.file("/Homology/seq_temp.fa", package="Biostrings")
Error in readFASTA(moo) : no FASTA sequences found
use 'strip.descs=FALSE' for compatibility with old version
of readFASTA(), or 'strip.descs=TRUE' to remove the ">"
at the beginning of the description lines and to get
rid of this warning (see '?readFASTA' for more details)
2: In file(file, "r") :
  file("") only supports open = "w+" and open = "w+b": using the former

Read ?system.file.

As Michael mentioned, stop using the system.file() command and use file() instead.

That's great. Laemtao, if you are happy with the answer, please accept it by clicking the check-mark on the side. Thanks.

10.8 years ago

The R system.file() command is the problem here. The bio3d documentation uses it to locate the example file ("examples/hivp_xray.fa") inside the installed bio3d package directory, wherever that may be on your system. For a file that is not inside the package, system.file() returns an empty string ("").

When reading your own files, all you need is the actual path:

> foo <- "~/Homology/seq_temp.fa"
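Put together, reading your own file is just a plain path handed to read.fasta(). A minimal sketch, assuming bio3d is installed; `read_own_fasta` is a hypothetical wrapper that adds an existence check so a mistyped path fails with a clear message instead of the '>' error.

```r
# Hypothetical wrapper: read a FASTA file from your own disk with bio3d.
read_own_fasta <- function(path) {
  if (!requireNamespace("bio3d", quietly = TRUE))
    stop("the bio3d package is not installed")
  path <- path.expand(path)      # resolve "~" explicitly
  if (!file.exists(path))        # catch path typos before parsing
    stop("file not found: ", path)
  bio3d::read.fasta(path)        # a plain path, no system.file()
}

# aln <- read_own_fasta("~/Homology/seq_temp.fa")
# aln$id    # sequence identifiers from the '>' lines
# aln$ali   # character matrix, one row per sequence
```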