I'm following the pointers in this post, regarding the use of Rsamtools and the indexFa() function to index a large fasta file of around 7GB, about 2000 nt sequences. However only the first ~87 are indexed, I see no reason why it would stop at this point unless it reaches a limitation.
What's more, the file the function is operating on, indexFa("seqs.fa"), becomes truncated to these first 87 entries also (I don't know how the function works, but that was quite unexpected). Any solution to this or other methods of indexing a large fasta file to call in sequences by name to avoid trying to read in the whole fasta file with read.fasta() in one go (which gets Killed prematurely too).
library(Rsamtools) indexFa("seqs.fa") # I think this is stumbling point fa = FaFile("seqs.fa")
gr = as(seqinfo(fa), "GRanges")
idx = pmatch("296783888", names(gr)) seq = getSeq(fa, gr[idx])