Question

How to extract specific samples (by ID) from a FASTA file to a new FASTA file in R

2.4 years ago

Katya • 0

I have a question about extracting sequences from a multi-FASTA file by their sequence headers (IDs) and writing them to a new FASTA file. I have tried a few things and searched all over the internet for a solution, but surprisingly, nothing really matches what I want to do.

code R • 757 views

updated 2.4 years ago by ATpoint 81k • written 2.4 years ago by Katya • 0

Also for non-R solutions: How To Extract A Sequence From A Big (6Gb) Multifasta File ?

2.4 years ago by ATpoint 81k

score 2 · Accepted Answer · 2021-11-30

#/ In R using Biostrings:
library(Biostrings)

fa <- readDNAStringSet("~/foo.fa")
#/ Print the object; the names are the full FASTA header lines:
fa
#> DNAStringSet object of length 3:
#>     width seq                names
#> [1]     4 ATCG               chr1
#> [2]    12 GGATGTGTGTCA       chr2
#> [3]     6 GTAGCT             chr3

#/ Say we want chr2 and chr3 (the IDs must match the names exactly):
fa_new <- fa[c("chr2", "chr3")]
fa_new
#> DNAStringSet object of length 2:
#>     width seq                names
#> [1]    12 GGATGTGTGTCA       chr2
#> [2]     6 GTAGCT             chr3

#/ write back to a file:
writeXStringSet(fa_new, "~/out.fa")
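
Subsetting by name as above only works when the IDs equal the full header lines. Real FASTA headers often carry a description after the ID, so here is a minimal base-R sketch for that case. `select_by_id` is a hypothetical helper, not part of Biostrings. It assumes the ID is the first whitespace-delimited token of each header, and the example headers are made up.

```r
# Hypothetical helper (not part of Biostrings): map wanted IDs onto FASTA headers.
# Assumption: the ID is the first whitespace-delimited token of each header line.
select_by_id <- function(headers, ids) {
  # Keep only the text before the first whitespace, e.g. "chr2 sample B" -> "chr2"
  header_ids <- sub("[[:space:]].*$", "", headers)
  # Position of each wanted ID among the headers (NA when absent);
  # match() reports only the first hit if an ID occurs more than once
  idx <- match(ids, header_ids)
  missing_ids <- ids[is.na(idx)]
  if (length(missing_ids) > 0) {
    warning("IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  # Drop IDs that were not found; the result follows the order of `ids`
  idx[!is.na(idx)]
}

# Made-up headers for illustration, standing in for names(fa)
headers <- c("chr1 sample A", "chr2 sample B", "chr3")
select_by_id(headers, c("chr3", "chr2"))
```

With the objects from the answer, `fa[select_by_id(names(fa), c("chr2", "chr3"))]` should give the same kind of subset, which can then go to `writeXStringSet()` as before. Unknown IDs produce a warning here, where name-based subsetting would stop with an error.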