Question: change seq name in a fasta file with a dataframe
18 months ago, Darill wrote:

I have a problem; let me explain.

I have a FASTA file such as:

>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG

and a dataframe:

seq name    New name seq
seqB        BOBO
seqC        JOHN

I simply want to rename a sequence ID in the FASTA file whenever the same seq name appears in my dataframe, replacing it with the corresponding new name. The result would be:

New FASTA file:

>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG

Thank you very much

Tags: pandas, python, fasta

Outside R:

Export your data frame to a file without the header row (call it, e.g., test.txt). For the example above, test.txt would contain the following (tab-separated):

seqB    BOBO
seqC    JOHN
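
Since the question is tagged pandas, the export step can be sketched in Python. This is only a minimal illustration using the standard library: `name_map` and `write_mapping` are hypothetical names, and the mapping is typed in by hand from the example table. If the table is already a pandas DataFrame, `df.to_csv("test.txt", sep="\t", header=False, index=False)` should write an equivalent file.

```python
import csv

# Hypothetical mapping copied from the example data frame in the question.
name_map = {"seqB": "BOBO", "seqC": "JOHN"}


def write_mapping(mapping, path):
    """Write old/new name pairs as tab-separated lines with no header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Each (old_name, new_name) pair becomes one line of the file.
        writer.writerows(mapping.items())


write_mapping(name_map, "test.txt")
```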

Run the following command on the example FASTA file above:

$ seqkit replace -p '(.+)' -r '{kv}' -K -k test.txt test.fa > test2.fa

output:

$ cat test2.fa 
>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG

Download seqkit from here: http://bioinf.shenwei.me/seqkit/download/

written 18 months ago by cpad0112

Thanks for your help, but is there a solution in Python?

written 18 months ago by Darill
18 months ago, Chirag Parsania (University of Macau) wrote:

This can be done with the R Biostrings library:

library(Biostrings)

## load the FASTA file into R (use whichever reader matches your data)
## inFasta <- readAAStringSet("aminoAcid.fasta")  ## for amino acid FASTA
inFasta <- readDNAStringSet("dnaSeq.fasta")        ## for DNA FASTA

## get the sequence names from the FASTA
fa_given_names <- names(inFasta)

## prepare an example data frame mapping old names to new names
## (here every name simply gets a "_new" suffix; use your own table instead)
df <- data.frame(seq_name = fa_given_names, new_name = paste0(fa_given_names, "_new"))

## look up the new name for each FASTA name;
## names that are absent from the data frame keep their original name
idx <- match(fa_given_names, df$seq_name)
names(inFasta) <- ifelse(is.na(idx), fa_given_names, as.character(df$new_name[idx]))

## write the sequences with updated names to a FASTA file
writeXStringSet(inFasta, "fa_with_new_headers.fa")

Thanks for your help, but do you think this is possible in Python 3? That is what I'm using for my pipeline.

written 18 months ago by Darill

Toto26,

I see that you've mentioned the python tag in your post, and it is generally recommended that python3 be used as the default python. Beyond this, there is no way for anyone to connect your question to your requested solution framework. It is advisable to add these details to the body of your post when you create the post (especially in your case where you seem to know what you want to use) - this ensures others invest the precious time they have in the right direction.

Either that, or you can adapt their solution/algorithm to python3, which should not be a huge deal. It can also serve as a nice exercise, IMO.

written 18 months ago by RamRS

If that is important then you should have mentioned that from the beginning.

written 18 months ago by WouterDeCoster
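
For anyone wanting to try the Python 3 adaptation suggested above, here is a minimal sketch using only the standard library. It is not taken from any of the answers: `rename_fasta` and `name_map` are hypothetical names, and it assumes the sequence ID is everything between `>` and the first space of the header line. IDs missing from the mapping are left unchanged, which matches the expected output in the question.

```python
def rename_fasta(lines, name_map):
    """Yield FASTA lines, replacing header IDs that appear in name_map.

    Assumption: the ID is the text between '>' and the first space;
    anything after that space (a description) is kept as-is.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            seq_id, sep, description = line[1:].partition(" ")
            # Fall back to the original ID when there is no new name for it.
            line = ">" + name_map.get(seq_id, seq_id) + sep + description
        yield line


# Example data from the question.
fasta = """>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG
"""

# With pandas, the mapping could be built from the two columns, e.g.
# dict(zip(df["seq name"], df["New name seq"])) (column names assumed
# from the table in the question).
name_map = {"seqB": "BOBO", "seqC": "JOHN"}

renamed = list(rename_fasta(fasta.splitlines(), name_map))
print("\n".join(renamed))
```

For files on disk, the same generator can be fed an open file handle, with each yielded line written to the output file followed by a newline. Because it processes one line at a time, it does not need to hold the whole FASTA file in memory.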