Replace list of sequence names with sequences from a fasta file using R
4.4 years ago

I have a list of sequence names (A1, A2, A3, A1, A1, A2 etc) and a fasta file with the names and sequences, and I am trying to find a way to replace each item on the list with the corresponding sequence from the fasta file.

I've used:

test <- sequences[names(sequences) %in% list]

which just extracts A1, A2, A3 and doesn't give me the remaining ones. What am I missing?

list of sequence names:

A1
A2
A3
A1
A1
A2

fasta file:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT
>A4
TCTATC
>A5
ATCTAC

output:

>A1
ATCATC
>A2
CCCGGG
>A3
GTGTGT

Desired output:

ATCATC
CCCGGG
GTGTGT
ATCATC
ATCATC
CCCGGG

Please give representative in/output.


  1. Please format your post better. I've done it for you this time.
  2. This shows what you have and what you need, but not what you've tried. Your single line of R code does not show how you read or write files, so we don't know the packages or functions you're using.

Is there a guide on how to properly format a post? I was proud enough of myself for thinking to put in "< br >" when pressing the enter button didn't work. I'm a biologist, not a programmer, so I don't know these things.


Apologies, we do not have a manual for the formatting bar yet. You did a great job with the <br> tags, but the formatting bar is your toolbelt for most tasks.


@ shelley.w.peterson, try dedicated FASTA/FASTQ manipulation tools such as seqtk or seqkit. If you want to stay in R, index the sequences by name instead of filtering them with %in%:

> library(Biostrings)
> test <- readDNAStringSet("test.fa", format = "fasta")
> names <- read.csv("file.txt", header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
> names
  V1
1 A1
2 A2
3 A3
4 A1
5 A1
6 A2
> data.frame("sequences"=test[names$V1])
  sequences
1    ATCATC
2    CCCGGG
3    GTGTGT
4    ATCATC
5    ATCATC
6    CCCGGG
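
To see why the original %in% line returned only three records while indexing by name returns six, here is a minimal base R sketch. It assumes a plain named character vector as a stand-in for the DNAStringSet, so it runs without Biostrings. The variable names (wanted, filtered, looked_up, missing_names) are made up for illustration and are not from the post above.

```r
# Toy stand-ins for the FASTA records and the name list from the question.
sequences <- c(A1 = "ATCATC", A2 = "CCCGGG", A3 = "GTGTGT",
               A4 = "TCTATC", A5 = "ATCTAC")
wanted <- c("A1", "A2", "A3", "A1", "A1", "A2")

# %in% asks, for each record, "is this name in the list?".
# That gives one TRUE/FALSE per record, so each record appears at most once.
filtered <- sequences[names(sequences) %in% wanted]

# Indexing by name asks, for each list entry, "which record has this name?".
# The result follows the order of the list and keeps its repeats.
looked_up <- sequences[wanted]

# match() performs the same lookup but yields NA for names that are absent,
# which makes it easy to report missing names before subsetting.
idx <- match(wanted, names(sequences))
missing_names <- unique(wanted[is.na(idx)])

# Drop the names to get the plain sequences, one per line.
writeLines(unname(looked_up))
```

With a real DNAStringSet the same two subsetting forms should behave the same way, and as far as I know as.character() on the subset gives a character vector that writeLines() can print. Checking missing_names first is worthwhile because indexing a DNAStringSet by a name that is not in the file may stop with an error rather than return NA.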

Thanks so much!!!! As someone who is new to coding, sometimes it's hard to figure out if I'm using the wrong tool/command or if I'm using the correct one the wrong way -_-'


I started out the same way and learnt as I went. Keep visiting Biostars, @ shelley.w.peterson.
