Question

How can I retrieve FASTA sequences using IDs listed in another text file?

0

10.3 years ago

tcf.hcdg ▴ 70

Dear Biostar community,

I have around 50,000 sequence identifiers in a text file and around 1 million sequences in a large FASTA file.

I want to get the sequences for these 50,000 identifiers from the FASTA file.

I checked the existing questions and answers on the community pages, but most of them use Perl, Linux, or Python.

How can it be done in R (faSomeRecords/grep/something else)?

Any suggestions?

Thanks

fasta • 2.8k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.3 years ago by tcf.hcdg ▴ 70

1

You want this done with grep, but not in Linux?

ADD REPLY • link 10.3 years ago by PoGibas 5.1k

0

I just want to know if this is possible in R.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.3 years ago by tcf.hcdg ▴ 70

2

Yes, it is (please see this post: How To Extract Multiple Fasta Sequences At A Time From A File Containing Sequences Ids Using R)

ADD REPLY • link 10.3 years ago by PoGibas 5.1k

1

samtools faidx is pretty handy.

ADD REPLY • link 10.3 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2015-07-08

6

10.3 years ago

Devon Ryan 105k

If you really want to do this in R, you could use the seqinr package. You load the whole FASTA file into memory with read.fasta(), subset the result according to the names() accessor, and then write that subset to a file. Presumably you could alternatively pass only the names you want to write.fasta().
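A minimal sketch of that approach, assuming the ID file has one identifier per line and that each identifier matches the first word of a FASTA header (which is what seqinr uses for names() by default). The function name extract_fasta_by_id and the file names in the usage line are hypothetical, chosen only for illustration, and the sketch assumes DNA sequences.

```r
# Sketch only: subset a FASTA file by ID with seqinr.
# extract_fasta_by_id is a hypothetical helper name, not part of seqinr.
library(seqinr)

extract_fasta_by_id <- function(fasta_file, id_file, out_file) {
  # Read the wanted IDs, dropping surrounding whitespace and empty lines.
  ids <- trimws(readLines(id_file))
  ids <- unique(ids[nzchar(ids)])

  # Load the whole FASTA file into memory as a named list.
  # as.string = TRUE stores each sequence as one string instead of a
  # vector of single characters; forceDNAtolower = FALSE keeps the case.
  seqs <- read.fasta(fasta_file, seqtype = "DNA", as.string = TRUE,
                     forceDNAtolower = FALSE)

  # names(seqs) is the header text after ">" up to the first whitespace,
  # so the IDs in id_file must be written in the same form.
  wanted <- seqs[names(seqs) %in% ids]

  # Report IDs that had no match rather than silently skipping them.
  missing_ids <- setdiff(ids, names(seqs))
  if (length(missing_ids) > 0) {
    message(length(missing_ids), " ID(s) not found in ", fasta_file)
  }

  # Write the subset; as.string = TRUE because the sequences are strings.
  write.fasta(sequences = wanted, names = names(wanted),
              file.out = out_file, as.string = TRUE)
  invisible(wanted)
}
```

It could then be called as extract_fasta_by_id("sequences.fasta", "ids.txt", "subset.fasta"). The output keeps the order of the FASTA file, not the order of the ID list, and with around 1 million sequences the memory cost of loading everything at once is worth checking first.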

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.3 years ago by Devon Ryan 105k

score 2 · Answer 2 · 2015-07-08

2

10.3 years ago

royefendymarbun ▴ 30

From Google, maybe this: a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter1.html

ADD COMMENT • link 10.3 years ago by royefendymarbun ▴ 30