Question: change seq name in a fasta file with a dataframe
Chvatil50 wrote (2.4 years ago):

I have a problem; let me explain.

I have a FASTA file like this:

>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG

and a dataframe:

seq name      New name seq
seqB            BOBO
seqC            JOHN

I simply want to rename each sequence ID in the FASTA file that also appears in the "seq name" column of my dataframe, replacing it with the corresponding "New name seq" value. The result would be:

New FASTA file:

>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG

Thank you very much
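Since Python 3 is what the asker needs, here is a minimal standard-library sketch of the requested renaming. It assumes the dataframe has already been turned into a plain dict; with pandas and the column names shown above, that would be something like `dict(zip(df["seq name"], df["New name seq"]))`. The function name `rename_fasta_lines` is hypothetical, and the sketch treats the ID as the text after `>` up to the first space, keeping any description that follows.

```python
def rename_fasta_lines(lines, mapping):
    """Yield FASTA lines, replacing header IDs that appear in `mapping`.

    Assumption: the ID is the text after '>' up to the first space;
    any description after it is kept unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            seq_id, sep, desc = line[1:].rstrip("\n").partition(" ")
            new_id = mapping.get(seq_id, seq_id)  # keep the old ID if it is not in the table
            yield ">" + new_id + sep + desc + "\n"
        else:
            yield line  # sequence lines pass through untouched


fasta = """>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG
"""
mapping = {"seqB": "BOBO", "seqC": "JOHN"}  # the dataframe as a dict
renamed = "".join(rename_fasta_lines(fasta.splitlines(keepends=True), mapping))
print(renamed, end="")
```

Using `dict.get(seq_id, seq_id)` is what leaves seqA and seqD unchanged: IDs missing from the table fall back to themselves.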

Tags: pandas, python, fasta

Outside R:

Export your data frame without its header row to a tab-separated file (e.g. test.txt). For the example above, test.txt would contain:

seqB    BOBO
seqC    JOHN
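If the table lives in pandas, `df.to_csv("test.txt", sep="\t", header=False, index=False)` should produce this header-less layout. As a dependency-free alternative, here is a sketch using the standard `csv` module; the helper name `write_kv_table` is hypothetical, and the pairs are assumed to be (old name, new name) in that column order.

```python
import csv
import io


def write_kv_table(pairs, handle):
    """Write (old_name, new_name) pairs as tab-separated rows with no header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for old_name, new_name in pairs:
        writer.writerow([old_name, new_name])


# An in-memory buffer stands in for open("test.txt", "w", newline="")
buf = io.StringIO()
write_kv_table([("seqB", "BOBO"), ("seqC", "JOHN")], buf)
print(buf.getvalue(), end="")
```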

Then run the following command on the example FASTA file above:

$ seqkit replace -p '(.+)' -r '{kv}' -K -k test.txt test.fa > test2.fa

output:

$ cat test2.fa 
>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG
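For readers who want to see what the seqkit command is doing, here is a rough Python analogue: the regex `(.+)` captures the header, the capture is looked up in the key-value table, and `-K` keeps the key itself when no value is found. This is only a sketch of the behaviour in this example (it assumes the whole header is the key); it does not reproduce seqkit's other options. The names `load_kv` and `replace_headers` are hypothetical.

```python
import re


def load_kv(text):
    """Parse a header-less, two-column, tab-separated table into a dict."""
    kv = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split("\t", 1)
            kv[key] = value
    return kv


def replace_headers(fasta_text, kv, pattern=r"(.+)"):
    """Rough analogue of: seqkit replace -p '(.+)' -r '{kv}' -K -k test.txt

    Capture group 1 of `pattern` in each header is the key, and the matched
    text is replaced by kv[key]; keys missing from kv are replaced by the key
    itself, mimicking -K. Sequence lines pass through unchanged.
    """
    regex = re.compile(pattern)

    def substitute(match):
        key = match.group(1)
        return kv.get(key, key)

    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ">" + regex.sub(substitute, line[1:], count=1)
        out.append(line)
    return "\n".join(out) + "\n"


kv = load_kv("seqB\tBOBO\nseqC\tJOHN\n")
fasta = ">seqA\nAAAAATTTGG\n>seqB\nATTGGGCCG\n>seqC\nATTGGCC\n>seqD\nATTGGACAG\n"
print(replace_headers(fasta, kv), end="")
```

Passing a narrower pattern such as `r"^(\S+)"` would rename only the first word of each header and keep any description.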

Download seqkit from here: http://bioinf.shenwei.me/seqkit/download/

(reply by cpad0112, 2.4 years ago)

Thanks for your help, but is there a solution in Python?

(reply by Chvatil50, 2.4 years ago)
Chirag Parsania (University of Macau) wrote (2.4 years ago):

This can be done with the R Biostrings library:

library(Biostrings)

## load fasta file into R (use the reader that matches your sequences)
# inFasta <- readAAStringSet("aminoAcid.fasta")  ## for amino acid fasta
inFasta <- readDNAStringSet("dnaSeq.fasta")       ## for DNA fasta

## get seq names from fasta
fa_given_names <- names(inFasta)

## prepare data frame (demo: every name gets a "_new" suffix;
## in practice, load your own seq_name/new_name table here)
df <- data.frame(seq_name = fa_given_names,
                 new_name = paste0(fa_given_names, "_new"),
                 stringsAsFactors = FALSE)

## map fasta seq names to data frame names;
## names absent from df keep their original name instead of becoming NA
idx <- match(fa_given_names, df$seq_name)
names(inFasta) <- ifelse(is.na(idx), fa_given_names, df$new_name[idx])

## write data to fasta file with updated names
writeXStringSet(inFasta, "fa_with_new_headers.fa")
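The core of this answer is R's `match()`, which returns the position of each FASTA name in the data frame column (or NA when absent). A minimal Python 3 translation of that step might look like the sketch below; the helper `match` is hypothetical, returns 0-based positions (R's are 1-based), and uses `None` in place of NA. The two lists stand in for the data frame columns from the question.

```python
def match(x, table):
    """Python analogue of R's match(): position of each x in table, or None if absent."""
    first_pos = {}
    for i, value in enumerate(table):
        first_pos.setdefault(value, i)  # like R, report the first occurrence
    return [first_pos.get(value) for value in x]


fa_given_names = ["seqA", "seqB", "seqC", "seqD"]
seq_name = ["seqB", "seqC"]  # df$seq_name
new_name = ["BOBO", "JOHN"]  # df$new_name

idx = match(fa_given_names, seq_name)
# keep the original name where match() found nothing (R would give NA there)
new_names = [old if i is None else new_name[i] for old, i in zip(fa_given_names, idx)]
print(new_names)  # ['seqA', 'BOBO', 'JOHN', 'seqD']
```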

Thank you for your help, but do you think this is possible in Python 3? I'm using it in my pipeline.

(reply by Chvatil50, 2.4 years ago)

Toto26,

I see that you've added the python tag to your post, and python3 is generally recommended as the default Python. Beyond that tag, though, nothing in your question tells readers which language you need a solution in. It is advisable to put such details in the body of your post when you create it (especially when, as here, you already know what you want to use); this ensures others invest their precious time in the right direction.

Alternatively, you can adapt their solution/algorithm to Python 3, which should not be a huge deal. It can also serve as a nice exercise, IMO.
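Since the asker wants this inside a pipeline, here is one way such an adaptation could look as a small file-to-file Python 3 script. It is a sketch under stated assumptions: the mapping file is the header-less, tab-separated test.txt layout from the seqkit reply, the ID is the text after `>` up to the first space, and the function and script names (`load_mapping`, `rename_fasta_file`, `rename_fasta.py`) are hypothetical. It streams the input line by line, so large FASTA files are not loaded into memory.

```python
import csv
import sys


def load_mapping(tsv_path):
    """Read a header-less, tab-separated file of old_name<TAB>new_name pairs."""
    with open(tsv_path, newline="") as fh:
        return {row[0]: row[1] for row in csv.reader(fh, delimiter="\t") if len(row) >= 2}


def rename_fasta_file(in_path, out_path, mapping):
    """Copy in_path to out_path line by line, renaming header IDs found in mapping."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                seq_id, sep, desc = line[1:].rstrip("\n").partition(" ")
                # IDs missing from the mapping are written back unchanged
                line = ">" + mapping.get(seq_id, seq_id) + sep + desc + "\n"
            dst.write(line)


if __name__ == "__main__" and len(sys.argv) == 4:
    # hypothetical usage: python rename_fasta.py test.txt test.fa test2.fa
    tsv_path, fasta_in, fasta_out = sys.argv[1:4]
    rename_fasta_file(fasta_in, fasta_out, load_mapping(tsv_path))
```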

(reply by RamRS, 2.4 years ago)

If that is important, then you should have mentioned it from the beginning.

(reply by WouterDeCoster, 2.4 years ago)