Dear Biostars community,
I have around 50,000 sequence identifiers in a text file and around 1 million sequences in a large FASTA file.
I want to extract the sequences for these 50,000 identifiers from the FASTA file.
I checked the questions and answers already on the community pages, but most of them use Perl, Linux tools, or Python.
How can this be done in R (fasomerecord/grep/something else)?
Any suggestions?
Thanks
You want this done with grep, but not in Linux?
I just want to know if this is possible in R.
Yes, it is (please see this post: How To Extract Multiple Fasta Sequences At A Time From A File Containing Sequences Ids Using R)
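For anyone who wants to see the idea without leaving R, here is a minimal base-R sketch (not the code from the linked post). It assumes the identifier file has one ID per line and that an ID is the first whitespace-delimited word of a FASTA header. The function name extract_fasta and the file names in the usage line are hypothetical placeholders. It reads the whole FASTA file into memory, which may or may not be acceptable for a file with 1 million sequences.

```r
# Minimal sketch: keep FASTA records whose ID is listed in a text file.
# Assumption: the ID is the header text after ">" up to the first whitespace.
extract_fasta <- function(fasta_path, ids_path, out_path) {
  # One wanted ID per line; drop blank lines and tolerate a leading ">".
  wanted <- trimws(readLines(ids_path))
  wanted <- sub("^>", "", wanted[nzchar(wanted)])

  lines <- readLines(fasta_path)
  is_header <- startsWith(lines, ">")

  # Record number of every line: the counter increments at each header.
  record <- cumsum(is_header)

  # ID of each record, taken from its header line.
  header_ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])

  # Keep a line if its record's ID is wanted; lines before the
  # first header (record == 0) are dropped.
  keep_record <- header_ids %in% wanted
  keep_line <- record > 0 & keep_record[pmax(record, 1)]

  writeLines(lines[keep_line], out_path)
  invisible(header_ids[keep_record])  # IDs that were actually found
}

# Usage (hypothetical file names):
# found <- extract_fasta("sequences.fasta", "ids.txt", "subset.fasta")
```

The returned vector makes it easy to check which of the requested IDs were actually found, e.g. by comparing it with the ID list using setdiff(). If installing Bioconductor packages is an option, Biostrings (readDNAStringSet, writeXStringSet) is another way to do the same kind of subsetting.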
samtools faidx is pretty handy.