5.0 years ago
crs68219
I am new to the field, so I apologize if there is an easy solution. I have a bash/Python pipeline that aligns FASTA files against a consensus sequence using Clustal. The four files do not each cover the entire sequence: they have gaps, and they overlap. My output file looks something like this, but on a larger scale:
>Consensus
AAGGTCAAATCTCGTAGAAGCCCCCCGAGGCGAGGAGAAAAAAAACGAAGGTCCGTCGAG
TAAGACTCTCCTCCCTGAGGCTGGGATCCCGGCGGCCGGCGCCGCGACGCTGTTCGGCAG
CCATGGACTCCGGGACAGGAAGCTCCGCTGATCATATTCGTGACGCGTCTCTACCTGGTT
>Breed1A
- - - GTCAAATCTCGTAGAAGCCCCCCGAGGCGAGGAGAAAAAAAACGAAGGTCCGTCGAG
- - - - - - - - - - - - - - -
>Breed2A
- - - - - - - - - - - - - - - - - - - -CCCCGAGGCGAGGAGAAAAAAAACGAAGGTCCGTCGAG
TAAGACTCTCCTCCCTGAGGC - - - - - - - - -
>Breed3A
- - - - - - - - - - - - - - - - - - - - - - - -GAGGCGAGGAGAAAAAAAACGAAGGTCCGTCGAG
TAAGACTCTCCTCCCTGAGGCTGGGATCCCGGC - - - - - -
>Breed4A
- - - - - - - - - - - - - - - - - - - - - - - -GAGGCGAGGAGAAAAAAAACGAAGGTCCGTCGAG
TAAGACTCTCCTCCCTGAGGCTGGGATCC - - - -
I have several different breeds, and copying and pasting to get a single sequence for each breed is taking too long. Are there any tools or suggestions for how to tackle this problem?
Thanks
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Perhaps try the consensus tool from the EMBOSS package.
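If a scripted route fits the existing pipeline better, the copy-and-paste step can be sketched in plain Python. This is only a minimal sketch, not the EMBOSS method. It assumes the Clustal output is an aligned multi-FASTA in which every fragment shares the same column coordinates and gaps are written as `-`. The function names `read_fasta` and `merge_aligned` and the short example records are made up for illustration. For each alignment column it keeps the most common non-gap base among the fragments and leaves a gap where no fragment covers that position.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines.

    Spaces inside sequence lines are dropped, so gap runs written as
    "- - -" are read as "---".
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.replace(" ", ""))
    return {n: "".join(parts) for n, parts in records.items()}


def merge_aligned(seqs, gap="-"):
    """Merge aligned fragments column by column.

    The most common non-gap character in a column wins; if no fragment
    covers the column, the gap character is kept.
    """
    length = max(len(s) for s in seqs)
    merged = []
    for i in range(length):
        column = [s[i] for s in seqs if i < len(s) and s[i] != gap]
        merged.append(Counter(column).most_common(1)[0][0] if column else gap)
    return "".join(merged)


# Tiny made-up alignment, just to show the idea.
example = """>Consensus
AAGGTCAAAT
>Frag1
---GTCA---
>Frag2
-----CAAAT
"""

records = read_fasta(example)
fragments = [seq for name, seq in records.items() if name != "Consensus"]
merged = merge_aligned(fragments)
print(merged)  # ---GTCAAAT
```

Where fragments disagree at a position and the counts tie, the base from the fragment listed first is kept, so it may be worth reporting such columns rather than resolving them silently. To get one sequence per breed, the records would be grouped by breed name before calling `merge_aligned` on each group.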