Question: Exporting sequences from Excel
0
8 months ago by
avocado_toast10 wrote:

I'm starting a bioinformatics project and I've been given an Excel sheet with two columns: one containing a number assigned to the sequence and one containing the sequence itself. I have to build a bootstrap tree using all the sequences, and there are over 7000 of them. Manually exporting each sequence to a file named after its number would be incredibly tedious and time-consuming. Is there a faster way of doing it?

I'm planning on using MEGA to build the tree, as it's the fastest option that I'm aware of, and I'll be working in either Windows or BioLinux, depending on what options I can come up with for separating out the sequences.

Thanks in advance

export excel tree building • 285 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by avocado_toast10
1

Thank you both, that's awesome!

ADD REPLYlink written 8 months ago by avocado_toast10
1

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLYlink written 8 months ago by WouterDeCoster36k

My apologies, I'll do so in future!

ADD REPLYlink written 7 months ago by avocado_toast10
6
8 months ago by
jrj.healey10k
United Kingdom
jrj.healey10k wrote:

Export it to a csv, or tsv, transliterate the delimiter to a newline, and you'll have something approximately resembling a fasta file.

Then tell your collaborator/boss off for ever giving you sequence data in any form of MS Office format.

ADD COMMENTlink written 8 months ago by jrj.healey10k
0
8 months ago by
5heikki8.1k
Finland
5heikki8.1k wrote:

Assuming the first column contains the number and the second column contains the sequence:

  1. Export the file as tab-separated values
  2. awk 'BEGIN{FS="\t";OFS="\n"}{print ">"$1,$2}' exportedFile.tsv > seqs.fna
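If awk isn't to hand (for example on Windows), the same conversion can be sketched in Python. This is only a minimal sketch and not part of the original answer: the function name `tsv_to_fasta` is made up for illustration, and it assumes the export has the two columns described above, with no header row. Unlike the awk one-liner, it also strips the carriage returns that a Windows export may leave at the end of each line.

```python
def tsv_to_fasta(tsv_text):
    """Turn two-column TSV text (ID <tab> sequence) into FASTA text.

    Hypothetical helper for illustration; assumes there is no header row.
    """
    fasta_lines = []
    # splitlines() copes with both Unix (\n) and Windows (\r\n) line endings.
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and rows that lack an ID or a second column.
        if len(fields) < 2 or not fields[0].strip():
            continue
        fasta_lines.append(">" + fields[0].strip())  # header line
        fasta_lines.append(fields[1].strip())        # sequence line
    return ("\n".join(fasta_lines) + "\n") if fasta_lines else ""


if __name__ == "__main__":
    # Tiny made-up example table with Windows line endings.
    example = "1\tACGTACGT\r\n2\tGGCCTTAA\r\n"
    print(tsv_to_fasta(example), end="")
```

To run it on a real export, read `exportedFile.tsv` into a string, pass it to the function, and write the result to `seqs.fna`.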
ADD COMMENTlink modified 8 months ago • written 8 months ago by 5heikki8.1k
0
8 months ago by
cpad011211k
India
cpad011211k wrote:
  1. Make sure that the first column holds the sequence ID and the second column holds the entire sequence, then export the file as TSV.
  2. Download seqkit and execute seqkit tab2fx exported_seq.tsv -o exported_seq.fa
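With over 7000 rows, the "make sure" in step 1 is easier to do by script than by eye. The following is a rough sketch of such a check, under the assumption that the exported table has exactly two tab-separated columns and no header row. The function name `find_table_problems` is hypothetical and has nothing to do with seqkit.

```python
def find_table_problems(tsv_text):
    """Return a list of human-readable problems found in a two-column TSV.

    Hypothetical helper for illustration; checks the column count,
    empty fields, and duplicate IDs.
    """
    problems = []
    seen_ids = set()
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore completely blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 2:
            problems.append(
                f"line {line_number}: expected 2 columns, found {len(fields)}"
            )
            continue
        seq_id, sequence = fields
        if not seq_id or not sequence:
            problems.append(f"line {line_number}: empty ID or sequence")
            continue
        if seq_id in seen_ids:
            problems.append(f"line {line_number}: duplicate ID {seq_id}")
        seen_ids.add(seq_id)
    return problems
```

An empty list means the table matches the expected layout. Anything it reports is worth fixing in the spreadsheet before converting.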
ADD COMMENTlink written 8 months ago by cpad011211k