Question: how to create a fasta file with subset of sequences using seqinr
0
gravatar for Camilla
4 months ago by
Camilla10
European Union
Camilla10 wrote:

Hi everyone, I am very new in environment R. I am trying to obtain a final fasta file selecting a subset of sequences from a downloaded fasta file, using seqinr. For this purpose I type the following script:

 myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
  subsetlist<-read.table("list.txt", header=TRUE)
  myfasta[names(myfasta) %in% subsetlist$id]
  write.fasta(sequences = myfasta, names = names(myfasta), nbchar = 80, file.out = "parsedMRP2.fasta")
  parsedMRP2<- read.fasta("parsedMRP2.fasta", set.attributes = FALSE)"

The MRP2.fas contains 5039 sequences, while the list.txt is a list of 3000 sequences. When I run my script, I obtain final file (parsedMRP2.fasta) always with 5039 sequences instead of the 3000 sequences that I indicated in the list.txt.

Could you please help me? thank you very much in advance

fasta sequences R seqin • 400 views
ADD COMMENTlink modified 13 days ago by Biostar ♦♦ 20 • written 4 months ago by Camilla10
1

If you need an alternative then using faSomeRecords from Jim Kent's utilities is a great/fast option. Linux version is linked but macOS available as well. After downloading do not forget to add execute permissions (chmod a+x faSomeRecords).

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
ADD REPLYlink modified 4 months ago • written 4 months ago by genomax67k
2
gravatar for Chirag Parsania
4 months ago by
Chirag Parsania1.4k
University of Macau
Chirag Parsania1.4k wrote:

Because you are not saving data in new object. Before you write in the file, you need to save in another object. See the change below.

myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
subsetlist<-read.table("list.txt", header=TRUE)

my_fasta_sub <- myfasta[names(myfasta) %in% subsetlist$id]
write.fasta(sequences = my_fasta_sub, names = names(my_fasta_sub), nbchar = 80, file.out = "parsedMRP2.fasta")
ADD COMMENTlink written 4 months ago by Chirag Parsania1.4k

Thank you very much Chirag for the solution ! Now it works!

ADD REPLYlink written 4 months ago by Camilla10

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
Upvote|Bookmark|Accept

ADD REPLYlink written 4 months ago by genomax67k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1048 users visited in the last hour