Question

How to read and write several FASTA sequences stored in one file in Perl?

0

13.0 years ago

Vouchsafing • 0

Hey. I'm totally new to programming in Perl, but I have to do a project and I have no idea how to approach it. Maybe someone can help me. The script has to load several sequences in FASTA format that are stored in a single file. An example input file (named, for example, 'gens.txt') looks like this:

>A
AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA
CCTACAGTAAAGAAGCAGCGTCTGGATATCTGGAAGACAA
CGGATTGAAGCTTGTAGAAAAAGAAGCATACTCAGATGAT
GTTCCAGAAGGACAGGTTGTCAAACAAAAACCAGCAGCAG
GTACGGCAGTAAAGCCGGGAAACGAAGTTGAAGTGACATT
CTCTCTCGGACCAGAGAAAAAACCTGCGAAAACAGTGAAA
GAAAAGGTCAAGATCCCCTACGAACCAGAAAATGAAGGGG
ACGAGCTTCAAGTGCAAATCGCGGTTGACGATGCGGATCA
>B
CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT
GATTTCGTTCAAGTGACAGAAGACCATTCGCTTGTAAATG
AACTGGTTCGCACTGGAGAGATTTCCAGAGAAGACGCTGA
ACATCATCCGCGAAAAAATGTGTTGACGAAGGCGCTTGGA
ACAGACCAGTTAGTCAGTATTGACACCCGTTCCTTTGATA
TAGAACCCGGAGACAAACTGCTTCTATGTTCTGACGGACT
GACAAATAAAGTGGAAGGCACTGAGTTAAAAGACATCCTG
TGGACAAAGCCAATCAGAATGGCGGAGAAGGCGGAGAAGC
>C
ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA
ATCCTCGCTTCTCAACGCGATCAGTCCGGAGCTCGGATTA
AGAACAAACGAGATTTCCGAGCATTTGGGCCGCGGGAAAC
ACACAACCCGCCACGTGGAGCTGATTCACACGTCCGGAGG
TTTGGTTGCAGATACACCGGGATTCAGCTCGCTTGAATTT
ACAGACATTGAGGAAGAAGAGCTGGGCTATACCTTCCCTG
ATATCAGAGAAAAAAGCTCTTCATGCAAATTTAGAGGCTG
TTTACATCTGAAAGAGCCGAAATGTGCGGTGAAACAAGCT

The script should then check how similar the sequences are and print the percent identity, and it should also generate a consensus sequence.
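To make the two requested outputs concrete, here is a minimal sketch in plain Perl. It rests on a strong assumption: the sequences are already aligned and of equal length, so they can be compared column by column. Unaligned sequences would first need a real alignment tool. The sub names `percent_identity` and `consensus` and the toy sequences are made up for illustration, and the majority-vote rule with alphabetical tie-breaking is just one simple way to define a consensus.

```perl
use strict;
use warnings;

# Percent identity between two equal-length, pre-aligned sequences:
# the share of positions that hold the same character.
sub percent_identity {
    my ($s1, $s2) = @_;
    die "Sequences differ in length\n" unless length($s1) == length($s2);
    return 0 unless length $s1;
    my $matches = 0;
    for my $i (0 .. length($s1) - 1) {
        $matches++ if substr($s1, $i, 1) eq substr($s2, $i, 1);
    }
    return 100 * $matches / length($s1);
}

# Majority-vote consensus: the most frequent character in each column.
# Ties are broken alphabetically (an arbitrary choice for this sketch).
sub consensus {
    my @seqs = @_;
    return '' unless @seqs;
    my $len = length $seqs[0];
    for my $s (@seqs) {
        die "Sequences differ in length\n" unless length($s) == $len;
    }
    my $cons = '';
    for my $i (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $i, 1) }++ for @seqs;
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $cons .= $best;
    }
    return $cons;
}

# Toy data, not taken from the question's file.
my @toy = ('ACGTAC', 'ACGTTC', 'ACCTAC');
printf "identity 1 vs 2: %.1f%%\n", percent_identity($toy[0], $toy[1]);
print  "consensus: ", consensus(@toy), "\n";
```

With the toy data the script prints an identity of 83.3% (5 of 6 positions match) and the consensus ACGTAC.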

perl fasta alignment consensus • 3.0k views

Updated 13.0 years ago by Keith Callenberg ▴ 960 • written 13.0 years ago by Vouchsafing • 0

2

If you really have no idea where to begin, then providing a ready-made answer will be no help to you whatsoever. First, learn some Perl basics. Second, learn some BioPerl. Third, learn to identify the correct tool for the job. There is plenty of alignment software available for this task: it's not really a job for Perl.

Reply • 13.0 years ago by Neilfws 49k

1

Wouldn't it be great if the Perl script also provided the A+ for the assignment? OK, joking aside: if you are really asking for help, I would suggest reading about hashes (to store your sequences in) and about "open" (to read the file), "for" (to loop through the data lines) and "join" (to merge each sequence's lines into a single string). Once you get there you will surely find some help here.
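The pieces named above (a hash, "open", "for" and "join") could fit together roughly as follows. This is only a sketch under a few assumptions: the sub names `read_fasta` and `write_fasta` are invented here, each header's first word is used as the hash key, and the input file name is passed on the command line.

```perl
use strict;
use warnings;

# Parse FASTA text from an open filehandle.
# Returns a hash ref (id => sequence) and an array ref with the ids in file order.
sub read_fasta {
    my ($fh) = @_;
    my (%lines_for, @order, $id);
    for my $line (<$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(\S+)/) {         # header line: start a new record
            $id = $1;
            push @order, $id;
            $lines_for{$id} = [];
        }
        elsif (defined $id) {             # sequence line of the current record
            push @{ $lines_for{$id} }, $line;
        }
    }
    # "join" turns each record's lines into a single string.
    my %seq = map { $_ => join('', @{ $lines_for{$_} }) } @order;
    return (\%seq, \@order);
}

# Write the records back out as FASTA, wrapped at $width characters per line.
sub write_fasta {
    my ($fh, $seq, $order, $width) = @_;
    $width ||= 40;
    for my $id (@$order) {
        print {$fh} ">$id\n";
        print {$fh} "$_\n" for unpack("(A$width)*", $seq->{$id});
    }
}

# Runs only when a file name is given, e.g.: perl read_fasta.pl gens.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my ($seq, $order) = read_fasta($in);
    close $in;
    printf "%s\t%d bp\n", $_, length $seq->{$_} for @$order;
    write_fasta(\*STDOUT, $seq, $order);
}
```

Keeping the ids in a separate array preserves the order of the records in the file, which a plain hash would not guarantee.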

Reply • 13.0 years ago by Jorge Amigo 14k

Ram · Answer 1 · 2011-04-25

Unless your assignment is really just about basic I/O and not about bioinformatics, and you are not allowed to use outside libraries for the mundane stuff, BioPerl is a great way to do things like this. There is no need to reinvent the wheel.

Once you've installed BioPerl, this section of the beginners' HOWTO shows how to read FASTA sequences from a file and write them to one: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_file
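For orientation, a minimal sketch of reading and writing FASTA with BioPerl's Bio::SeqIO module might look like the following. It assumes BioPerl is installed; the sub name `copy_fasta` and the output file name in the usage line are placeholders, not something from the HOWTO.

```perl
use strict;
use warnings;
use Bio::SeqIO;    # part of the BioPerl distribution

# Copy every record from one FASTA file to another.
# Returns the number of sequences copied.
sub copy_fasta {
    my ($in_file, $out_file) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my $n = 0;
    while (my $seq = $in->next_seq) {
        # Each $seq is a sequence object; report its id and length.
        printf "%s\t%d bp\n", $seq->display_id, $seq->length;
        $out->write_seq($seq);
        $n++;
    }
    return $n;
}

# Usage: perl copy_fasta.pl gens.txt copy.fasta
copy_fasta(@ARGV[0, 1]) if @ARGV == 2;
```

Inside the loop, `$seq->seq` returns the sequence as a plain string, which is the natural starting point for any further comparison.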