Problems Importing Sequences Using Seqinr In R
13.9 years ago
Emma ▴ 20

Hi,

I've been using seqinr successfully for a while now, but ran into an odd problem today. I'm using read.fasta to read in fasta-format sequences, and it has done this for over 40,000 files without problems.

However, for 6 files, which are not unusual or different from the other files in any way that I can see, it is not reading them in correctly. Even though the sequences are over 100 bp (or at least over 50), it returns only 4-20 base pairs when it reads them in. Perhaps the most puzzling bit is that what it gives me is often not even in the original sequences - I have no idea where it is getting these bases from!

Is this a common seqinr problem? Are there any steps I can take? It can still read in a random selection of the other files perfectly normally.

I'm running Windows 7 and R 2.11.0

Edit: basically it didn't work even when I simply did

read.fasta("CG13569_CG13569_FBtr0072283.fas")

However... now I'm trying to replicate it and it's reading correctly. Perhaps it was a memory error somehow?

fasta r sequence • 4.4k views

Can you edit your post to add an example of the code you're using? Without that info, it's hard to figure out exactly what's going on.


I agree with Chris, and you could also paste a couple of the sequences (and headers) that are giving you problems.


Hi Emma. Maybe a random shot, but have the problematic files been prepared differently? Maybe on a Mac or on a different system from the one you use? Different systems (Linux, Mac or Windows) use different 'end of line' characters, and many of the problems I solve for the people around me are caused by files that were saved on a Mac in a way that is only more or less compatible. In case this helps ;)
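
One way to test the end-of-line idea from within R is to count the line-ending styles in the raw bytes of a file. This is only a minimal sketch in base R: `count_line_endings` is a hypothetical helper name, not a seqinr function, and the temporary file stands in for one of the problem .fas files.

```r
# Hypothetical helper (not part of seqinr): count line-ending styles by
# inspecting the raw bytes of a file.
count_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  n <- length(bytes)
  is_cr <- bytes == as.raw(0x0d)  # carriage return
  is_lf <- bytes == as.raw(0x0a)  # line feed
  next_is_lf <- c(is_lf[-1], FALSE)  # is the following byte an LF?
  prev_is_cr <- c(FALSE, is_cr[-n])  # is the preceding byte a CR?
  c(crlf = sum(is_cr & next_is_lf),   # Windows style
    lf   = sum(is_lf & !prev_is_cr),  # Unix style
    cr   = sum(is_cr & !next_is_lf))  # old Mac style
}

# Demo on a small file with mixed endings (stand-in for a real .fas file).
tmp <- tempfile(fileext = ".fas")
writeBin(charToRaw(">seq1\r\nACGT\r\nGGCC\n"), tmp)
endings <- count_line_endings(tmp)
print(endings)
```

If the six problem files show a different mix of counts from a file that reads correctly, that would support the line-ending explanation; identical counts would point elsewhere.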


Agreed, need to see a sample sequence. We could download e.g. FBtr0072283 from FlyBase, but that may not reflect what you have on your hard drive.

13.9 years ago
Neilfws 49k

I have not used seqinr extensively, but I've never noticed an issue when reading fasta files, or files of any other kind. I downloaded your example transcript from FlyBase and had no problem with read.fasta() on my system: 64-bit machine, Ubuntu/Linux 10.04, R 2.11.0, seqinr version 2.0-9.

I'm rather tempted to blame Windows - it has a bad habit of doing things to ASCII text files behind your back - particularly since you mention that the characters in the sequence do not match the original. It sounds like there may be hidden escape characters in the sequence. Make sure that your fasta files really are plain ASCII text and conform exactly to the fasta standard.
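
To check whether a file really is plain ASCII, something along these lines might help. It is a rough base-R sketch, not a seqinr feature: `find_suspicious_bytes` is a hypothetical name, and treating tab, LF and CR as acceptable is an assumption. The demo file deliberately contains a UTF-8 byte-order mark and an escape character.

```r
# Hypothetical helper: report every byte that is not printable ASCII,
# tab, line feed or carriage return, with its 1-based offset in the file.
find_suspicious_bytes <- function(path) {
  bytes <- as.integer(readBin(path, what = "raw", n = file.info(path)$size))
  ok <- (bytes >= 32L & bytes <= 126L) | bytes %in% c(9L, 10L, 13L)
  data.frame(offset = which(!ok),
             byte = sprintf("0x%02X", bytes[!ok]),
             stringsAsFactors = FALSE)
}

# Demo: a fasta-like file with a UTF-8 BOM up front and an ESC byte
# hidden inside the sequence (stand-in for a real problem file).
tmp <- tempfile(fileext = ".fas")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw(">seq1\nAC"),
           as.raw(0x1B),
           charToRaw("GT\n")), tmp)
hits <- find_suspicious_bytes(tmp)
print(hits)
```

An empty result means every byte is in the assumed safe set, so hidden characters would be ruled out for that file.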

13.9 years ago
Satish Gupta ▴ 40

In my experience, there are sometimes hidden characters inside your text (or, in your case, your sequences) that a tool does not recognize. I have run into this at times when reading protein molecules from PDB. As you said, it reads correctly now that you are trying to replicate the problem. You can check your sequences in the vi editor; I'm sure you will find some hidden characters or improper indentation compared with your normal fasta sequences.

Enjoy Satish
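
As an alternative to inspecting the files by eye in an editor, the sequence lines could be screened in R for characters outside the expected alphabet. This is a sketch only: `flag_odd_sequence_lines` is a hypothetical helper, and the default alphabet (IUPAC nucleotide codes plus `*` and `-`) is an assumption that would need changing for protein sequences.

```r
# Hypothetical helper: list non-header lines that contain characters
# outside an assumed nucleotide alphabet.
flag_odd_sequence_lines <- function(path,
    alphabet = "ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-") {
  lines <- readLines(path, warn = FALSE)
  is_header <- startsWith(lines, ">")
  # Strip every allowed character; whatever remains is unexpected.
  leftovers <- gsub(paste0("[", alphabet, "]"), "", lines, useBytes = TRUE)
  bad <- !is_header & nzchar(leftovers)
  data.frame(line = which(bad),
             unexpected = leftovers[bad],
             stringsAsFactors = FALSE)
}

# Demo: line 3 hides a space and a tab, line 4 contains a stray letter.
tmp <- tempfile(fileext = ".fas")
writeLines(c(">seq1", "ACGTACGT", "AC GT\tAC", "ACGXACGT"), tmp)
odd <- flag_odd_sequence_lines(tmp)
print(odd)
```

Because readLines() accepts LF, CRLF and CR as line terminators, this check will not reveal line-ending differences; it only catches stray characters within the lines themselves.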
