Question: Trim fasta to a specific gene location
Asked 3 months ago by James (APHA Weybridge, UK):

Hi there,

I am using Illumina sequencing to whole-genome sequence avian avulaviruses. My output is a FASTA file of the whole genome, in one long sequence. As part of my analysis I want to define a genotype, but this is done from only one gene (the fusion gene), or even just a portion of it.

Does anyone have any suggestions for how to extract just the section I want from my full genome, so I can use it in an alignment and tree to determine the genotype?

I want to do this from the command line in a script. The rest of my script is a bash/shell script, so it would be nice if this was also in that format.

The section of the gene I want is not always in exactly the same place: genomes can differ in length, and so can the genes within them.

My first thought is that I could align my full genome against a selection of the gene I am interested in, then count the number of - characters that the alignment adds before the gene in one of the short sequences, and the number of - characters added after it. I could then delete that many characters from the beginning and end of my full genome, if that makes any sense. Does anyone know how to do that, or a better way of doing this?
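For illustration, the gap-counting idea could be sketched in plain bash roughly as follows. This is only a sketch under assumptions: it expects the aligned (gap-padded) row of one short fragment and the ungapped genome as single-line strings, and it assumes the aligner put no gaps into the genome row itself. The function name `trim_by_gaps` is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch only: trim a genome using the gap padding of an aligned fragment.
# Usage: trim_by_gaps ALIGNED_FRAGMENT GENOME
#   ALIGNED_FRAGMENT  fragment row from the alignment, padded with '-'
#   GENOME            ungapped genome sequence (one line, no header)
trim_by_gaps() {
    local aligned=$1 genome=$2
    local lead=${aligned%%[!-]*}    # leading run of '-' characters
    local trail=${aligned##*[!-]}   # trailing run of '-' characters
    local start=${#lead}
    local len=$(( ${#genome} - ${#lead} - ${#trail} ))
    # Nothing sensible to print if the fragment row was all gaps.
    if [ "$len" -le 0 ]; then
        echo "fragment row contains no bases" >&2
        return 1
    fi
    printf '%s\n' "${genome:start:len}"
}
```

Called as `trim_by_gaps '---ACG--' 'TTTACGTT'`, this drops three characters from the start and two from the end of the genome. If the alignment inserts gaps into the genome row as well, the counts no longer map directly onto genome positions and this shortcut would not be safe.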

So for instance I have

>Avian-Avualavirus-full-genome
AGATCCCTGGGGGCGGAACCCTAGAGTATAAAGTAAATTTTGTCTCTTTGACTGTGGTGCCGAGAAGGGATGTCTACAGGATCCCAACTGCAGTATTAAAAGTATCTGGTTCAAGCCTATACAATCTTGCGCTCAATGTCACTATAGACGTGGATGTGGATCCGAAAAGCCCATTGGTCAAGTCCCTTTCTAAGTCTGATAGCGGATACTATGCGAATCTTTTTTTGCATATCGGGCTTATGTCCACTGTAGACAAGAAAGGAAAGAAAGTGACGTTTGACAAGATAGAGGAAAAGATAAGAAGACTCAATCTATCTGTCGGGCTCAGTGACGTGCTCGGACCCTCTGTGCTTGTAAAGGCGAGAGGTGCACGGACCAAGCTACTCGCCCCCTTCTTCTCTAGCAGTGGGACAGCCTGCTATCCTATAGCAAATGCCTCTCCTCAGGTTGCCAAGATACTCTGGAGCCAAACCGCGCACCTGCGGAGTGTGAAAGTCATCATTCAGGCCGGCACCCAGCGTGCTGTTGCAGTGACCGCTGATCACGAGGTAACCTCCACTAAAATAGAGAAGAAGCACGCCATTGCAAAATACAATCCTTTCAAGAAGTAAGTTGCTTCTTTAAAGCTGCGATTCACCTGTTTTCTTGAATCACCATGATACCAGATAACGATTCACCTCAACTGCTTATAGTTAGCTCACCTGTCTAGCAAATTAGAAAAAACACGGGTAGAAGAGTCTGGATCCCAACTAGTACATTCTAGGCGTAACATGGGCTCCAAACCTTCTATCAGGATCCCGGTACCTCTGATGCTGATCACTCAGATCATGCTGATATTAAGTTATATCTGTCTGACAAGCTCTCTTGACGGCAGGCCTCTTGCAGCTGCAGGGATTGTAGTAACAGGAGATAAGGCAGTCAATGTATACACCTCATCTCAGACAGGGTCAATCATAGTCAAGTTGCTCCCGAATATGCCTAAGGATAAAGAGGCGTGTGCAAAGGCCCCATTGGAGGCATACAACAGAACACTGACTACCTTGCTCACTCCCCTTGGAGATTCCATCCGTAAGATCCAAGGGTCAGTGGCCACGTCCGGAGGAAGGAGACAGAAACGCTTTATAGGTGCCGTTATTGGCAGTATAGCTCTCGGGGTTGCGACAGCGGCACAGATAACAGCAGCTGCGGCCTTAATACAAGCCAACCAAAATGCCGCCAACATCCTCCGGCTTAAAGAGAGCATTGCTGCAACCAATGAAGCTGTGCACGAAGTTACCGATGGATTATCACAACTATCAGTGGCAGTTGGAAAGATGCAACAATTCGTCAATGACCAGTTTAATAATACGGCGCGAGAATTGGACTGTATAAAGATTACACAACAAGTTGGTGTAGAACTCAACCTATACCTAACTGAGTTGACTACAGTATTCGGGCCACAGATCACTTCCCCTGCCTTAACTCAGCTGACTATACAGGCACTCTACAATCTAGCTGGCGGCAATATGGATTACTTGTTAACTAAGTTAGGTGTAGGGAACAATCAACTCAGCTCATTAATTGGTAGCGGCTTGATCACCGGCTACCCTATATTGTATGACTCACAGACTCAACTCTTAGGCATACAAGTAAATTTGCCCTCAGTCGGGAACCTAAATAATATGCGTGCCACCTACTTGGAGACTTTATCTGTAAGTACAACCAAGGGGTTTGCTTCAGCACTTGTCCCGAAGGTGGTGACACAAGTTGGTTCCGTGATAGAAGAACTTGACACCTCTTACTGTATAGAATCTGATCTGGATCTATATTGTACAAGAATAGTGACATTCCCCATGTCCCCAGGTATTTACTCTTGCTTGAGCGGTAACACATCAGCTTGCATGTATTCAAAGACTGAAGGCGCACTCACTACGCCATATATGGCCCTTAAGGGCTCAGTTATTGCCAATTGTAAGATAACAACATGTAGATGTGCAGACCCCCCTGGTATCATATCGCAA
AATTATGGAGAAGCTGTATCTCTGATAGACAGACATTCGTGCAATGTCTTATCATTAGACGGGATAACTCTGAGGCTCAGTGGGGAATTTGATGCAACTTATCTCAAGAATATCTCAATACTAGATTCTCAAGTCATCGTGACAGGCAATCTCGATATATCAACTGAGCTTGGTAATGTCAACAATTCAATCAGCAATGCCTTGGACAAGTTGACAGAAAGCAACAGCAAGCTAGACAAAGTCAATGTCAGACTAACCAGTACATCTGCTCTCATCACCTATATCGCTCTAACTGTCATTTCTCTTTTCTTCGGTGTACTTAGTCTGGGTTTAGCATGTTACCTGATGTACAAGCAGAAGGCACAACAAAAGACCTTGCTATGGCTTGGGAATAATACTCTCGATCAGATGAGAGCCACCACAAGAGCGTGAATGTAGATAAGAGGTAGATGTCCACCCAGCTGCCGCCCGTGCGCTAACTCTTACGGCCTGTCAAGTAGAAGACTTAAGAAAAAACTGCTGGGTACAAGCGACCAAAGAACGATACACGGGTAGAACGGTCAGAGGATCCACCCTTCAGTCGGAAGCCAGGCTTCACAAAATCCGTTCTACCGCATCGCCAGCCACAGAGGCCAGCCATGAGCCGCGCGGTCAACAGAGTCATGCTAGAAAATGAGGAAAGGGAAGCAAAGAACACATGGCGCTTGATTTTCCGGATCGCGGTCCTACTTTTAATGATAATGATTCTAGCTATCTCCGCAGCTGCCTTGGCATACAGCATGGGGACCAGTACGCCGCGAGACCTCACAAGCATATCGATAGCGATCTCCAAGACAGAGGATAAGGTCACATCTTTACTCAGTTCAAGTCAAGATGTGATAGATAGGATATATAAGCAGGTGGCTCTCGAATCTCCGCTGGCGCTACTAAACACTGAGTCTATAATTATGAATGCTATAACCTCTCTCTCCTATCAAATCAACGGGGCCGCGAATAATAGCGGGTGTGGGGCGCCTGTTCATGACCCAGATTATATCGGGGGGATAGGCAAAGAACTCATAGTAGACGACACGAGTGATGTCACATCATTTTATCCTTCTGCCTATCAAGAACACTTGAATTTCATCCCAGCACCTACTACAGGATCCGGTTGCACTCGGATACCCTCATTCGACATGAGCACCACTCACTACTGTTACACTCACAATGTGATATTATCTGGTTGCAGAGATCACTCACATTCACATCAATACTTAGCACTTGGTGTGCTTCGGACATCTGCAACGGGGAAGGTATTCTTCTCTACTCTGCGTTCTATCAATTTAGATGACACCCAAAACCGGAAGTCCTGCAGTGTGAGTGCAACCCCTTTAGGCTGTGATATACTGTGCTCTAAGGTCACAGAGACTGAGGAGGAGGATTACAAGTCAGTTACCCCCACATCAATGGTGCACGGAAGGTTAGGGTTTGACGGTCAATACCATGAGAAGGACTTAGACACCACAGCCTTATTCAAGGATTGGGTGGCAAATTACCCAGGAGTGGGAGGTGGATCTTTTGTTGACGAGCGTGTATGGTTCCCAGTTTATGGAGGGCTCAAACCCAATTCACCCAGTGACACTGCGCAAGAAGGGAAATATGCAATATATAAGCGCTATAATGATACATGCCCCGATGAACAAGATTACCAAATTCGGATGGCTAAGTCTTCATATAAACCTGGGCGATTTGGTGGAAAGCGCGTACAGCAAGCCATCTTATCTATCAAAGTGTCAACGTCCTTAGGTGAGGACCCAATGCTGACTATTCCACCTAATACAATTACACTCATGGGGGCCGAAGGCAGAATTCTCACGGTAGGGACATCTCACTTCTTGTACCAACGAGGGTCTTCATATTTCTCCCCCGCTTTATTATACCCCATGACAATATTTAACAAAACAGCTACTCTTCATAGCCCTTATACATTTAATGCCTTCACTCGGCCAGGGAGTGTCCCT
TGCCAGGCATCAGCAAGATGCCCCAACTCATGCATCACTGGAGTCTATACTGATCCATATCCCTTGATCTTTCATAGGAATCATACCCTACGAGGGGTTTTCGGGACGATGCTTGATGATGGGCAAGCAAGACTTAACCCTGTATCTGCAGTATTTGACAACATATCCCGCAGTCGTGTAACCCGGGTGAGTTCAAGCAGCACCAAGGCAGCATACACAACATCGACATGTTTTAAAGTTGTCAAGACCAATAAAACTTATTGTCTTAGTATCGCAGAAATATCCAATACCCTATTCGGGGAGTTTAGGATTGTTCCTTTACTAGTTGAGATCCTCAAGGATAATAGGGCTTAAGAAGCTAGGCTTGGCCGACCGAGTCAGCCACAAGACAGTCGGAAGGATGACACCGCACCAATCCTCTCCCACGATGCACAGAGACAGGCCGAGTATTAACATGAGCCAGGATCCCATGCTGCCAGGCAGCCACAATTCGACAACGCTGACATGATTAATTTGAGTCCCGTCTACAGTCACTTTATTAAGAAAAAATAACAAAAGCAGTGAGATACAAGAGAAAACAACCCTCAGAAGAAAGCACGGGTAGGACATGGCGGGCTCCGGTCCCGAAAGGGCAGAGCACCGGATTATCCTACCAGAGTCACATCTATCTTCCCCATTGGTCAAGCACAAATTGCTCTATTATTGGAAATTAACTGGGCTGCCGCTTCCTGACGAATGCGACTTTGATCATCTCATTATAAGCAGGCAATGGAAAAAAATACTGGAATCGGCCACTCCTGACACGGAAAGAATGATCAAACTCGGACGGGCAGTGCACCAGACCCTCAACCACAATTCCAAGATAACCGGAGTGCTCCATCCCAGGTGTTTAGAAGAACTGGCTAGTATTGAAATCCCTGACTCAACCAACAAATTTCGGAAGATTGAGAAGAAGATCCAAATTCATAATACAAGGTATGGAGAATTGTTCACAAAACTGTGCACGCATGTTGAAAAGAAATTGCTAGGATCATCCTGGTCTAACAATGTCCCACGATCAGAGGAATTCAGCAGCATCCGTACGGATCCGGCATTCTGGTTTCACTCAAAGTGGTCCAAAGCCAAGTTCGCATGGCTCCATATAAAACAGGTCCAAAGGCATCTGATTGTAGCAGCAAGAACAAGGTCTGCAGTCAACAAGTTAGTAACATTAACTCATAAGGTAGGCCACGTCTTTGTCACCCCTGAGCTTGTCATTGTGACACATACAGATGATAACAAGTTCACATGCCTCACCCAGGAACTTGTATTGATGTATGCAGATATGATGGAAGGCAGGGACATGGTCAACATAATATCTTCTACAGCGGCACATCTTAGGAACCTATCCGAGAAAATTGATGACATCCTGCGGTTAGTAGATGCCCTGGCAAGGGATTTGGGTAATCAAGTCTATGATGTTGTAGCATTAATGGAGGGATTCGCATACGGTGCCGTTCAGCTGCTTGAGCCTTCAGGTACATTTGCAGGAGATTTTTTTGCATTCAACCTACAGGAGCTCAAGGACACTCTAATCGAACTTCTCCCAAACAATATAGCGGAATCAGTAACTCACGCAATCGCCACTGTGTTCTCTGGCTTAGAACAGAATCAAGCAGCAGAGATGCTATGCTTGCTGCGTTTGTGGGGTCATCCACTGCTTGAGTCCCGTAGTGCAGCAAGAGCGGTCAGGAGCCAGATGTGCGCACCAAAGATGGTAGACTTCGATATGATCCTCCAGGTATTATCCTTCTTTAAAGGAACAATAATCAATGGATATAGAAAGAAGAACTCAGGTGTGTGGCCACGTGTCAAAGTAGATACAATATACGGGAATGTCATTGGGCAGCTGCATGCTGATTCAGCAGAGATCTCACATGAGGTCATGTTAAGGGAGTACAGGAGTTTATCTGCCCTTGAATTTGAGCCATGTATAGAGTATGACCCTGTTACCAATCTAA
GCATGTTTCTAAAAGATAAGGCAATCGCACATCCGAATGATAACTGGCTTGCCTCGTTTAGGCGGAACCTTCTCTCTGAGGACCAGAAGAAACAGATAAAGGAGGCGACCTCAACTAACCGCCTCCTGATAGAGTTTTTAGAGTCAAATGATTTTGATCCATACAAAGAGATGGAATACCTGACAACCCTTGAGTATCTAAGAGATGATAATGTGGCAGTATCGTACTCACTCAAAGAGAAGGAGGTGAAAGTGAATGGGCGAATTTTTGCTAAGCTAACAAAGAAACTAAGGAACTGCCAGGTGATGGCAGAAGGAATTCTAGCTGACCAGATTGCACCTTTCTTCCAGGGGAATGGTGTCATCCAAGATAGCATATCCTTGACTAAGAGTATGTTAGCAATGAGTCAACTGTCCTTTAACAGCAATAAGAAACGTATCACCGACTGCAAGGAAAGGGTTTCCTCAAACCGCAATCATGATCCAAAAAACAAGAATCGTCGAAGGGTTGCCACTTTTATCACGACTGACTTGCAAAAGTATTGTCTTAACTGGAGATATCAGACAGTAAAATTATTCGCCCATGCCATCAATCAGCTGATGGGCCTGCCCCACTTTTTTGAGTGGATTCATCTTAGATTAATGGACACTACGATGTTTGTAGGGGATCCTTTCAATCCTCCGAGTGACCCGACTGATTGTGACTTATCAAGAGTCCCAAATGATGACATATATATTGTCAGTGCTAGAGGGGGCATTGAGGGACTCTGCCAGAAGCTATGGACAATGATCTCAATTGCTGCAATCCAACTTGCTGCGGCAAGAGCTCATTGTCGAGTTGCCTGCATGGTACAAGGTGACAATCAAGTAATAGCTGTAACGAGAGAGGTAAGATCTGACGACTCCCCGGAAATGGTGTTGACACAGTTACATCAAGCTAGTGATAATTTCTTCAAGGAATTGATCCACGTCAATCATCTGATCGGCCATAACCTGAAGGATCGTGAAACCATCAGATCAGACACATTCTTTATATACAGCAAGCGAATATTCAAAGATGGAGCAATACTCAGTCAGGTTCTCAAGAACTCATCTAAGTTGGTGCTAATATCAGGCGACCTTAGCGAAAACACTGTAATGTCCTGTGCCAATATTGCATCCACTGTAGCAAGACTTTGTGAGAACGGGCTTCCTAAGGATTTCTGCTACTATTTGAACTACCTAATGAGTTGCGTGCAGACATACTTTGATTCAGAATTTTCTATTACCCACAGCACTCAACCAGATTCCAACCAATCCTGGATCGAGGATATCTCTTTCGTACACTCATACGTGTTAACTCCTGCCCAGCTGGGGGGATTGAGCAACCTTCAATACTCAAGGCTCTACACAAGGAATATTGGTGATCCAGGGACTACTGCTTTCGCAGAGGTCAAGCGATTAGAAGCAGTAGGGTTGCTGAGTCCTAGCATTATGACTAACATCTTAACCAGACCACCTGGCAACGGAGACTGGGCCAGCCTGTGCAATGATCCGTACTCCTTCAATTTTGAGACTGTTGCAAGCCCCAACATTGTCCTCAAGAAACATACACAGAAAGTCTTATTCGAGACTTGCTCAAACCCCTTATTATCTGGGGTACACACAGAGGACAATGAGGCTGAAGAGAAAGCATTGGCTGAATTCTTACTCAACCAAGAAGTGATTCACCCACGTGTCGCACATGCTATCATGGAAGCAAGCTCTGTAGGTAGAAGAAAGCAAATTCAAGGGCTCGTTGACACAACGAACACTGTGATTAAGATTGCACTGACTAGGAGGCCCCTCGGTATTAAAAGGCTGATGCGGATAATCAATTACTCAAGCATGCATGCAATGTTATTCAGAGATGATATTTTCTTATCCAATAGATCCAACCACCCATTGGTTTCTTCCACTATGTGCTCGCTGACGCTTGCAGACTATGCCCGGAACAGAAGCTGGTCACCCCTGACAGGGGGC
AGGAAAATACTGGGTGTATCCAACCCCGATACCATAGAACTTGTGGAGGGAGAGATTCTCAGTGTCAGTGGAGGGTGCACAAAGTGTGACAGTGGAGATGAGCAGTTTACTTGGTTCCATCTTCCAAGCAATATAGAGCTGACTGACGACACCAGCAAAAATCCCCCAATGAGAGTGCCGTATCTCGGGTCGAAGACTCAAGAGAGGAGAGCTGCCTCACTTGCGAAAATAGCTCATATGTCACCACATGTGAAAGCAGCGCTAAGGGCATCATCCGTGTTAATCTGGGCTTATGGGGACAACGAAGTAAACTGGACTGCTGCTCTTAATATCGCAAGATCTCGATGCAACATAAGCTCAGAGTATCTTCGGCTATTGTCACCCCTGCCCACAGCTGGGAATCTCCAACATAGATTGGATGATGGCATAACCCAGATGACATTTACCCCTGCATCTCTCTACAGAGTATCACCTTATGTTCACATATCCAATGATTCTCAAAGACTATTCACCGAAGAAGGGGTCAAAGAGGGGAATGTGGTTTATCAACAAATCATGCTCTTGGGTTTATCTCTAATTGAGTCGCTCTTCCCAATGACAACGACCAGAACGTATGATGAGATCACATTACACCTCCACAGCAAATTTAGCTGCTGTATCCGGGAAGCGCCTGTTGCGGTCCCCTTTGAACTCCTTGGGCTGGCACCAGAATTAAGGATGGTAACCTCAAATAAGTTCATGTATGATCCCAGTCCTATATCAGAGAAAGATTTTGCGAGACTTGACTTAGCTATCTTCAAGAGTTACGAGCTTAATCTGGAATCATATCCTACGCTGGAGCTAATGAACATCCTTTCAATATCTAGCGGGAAGTTGATTGGTCAGTCCGTGGTTTCTTACGATGAGGACACTTCTATAAAGAATGATGCTATAATAGTATATGACAACACACGAAATTGGATTAGTGAGGCTCAGAATTCAGACGTGGTCCGCCTATTCGAGTATGCGGCACTTGAAGTGCTCCTTGACTGTTCCTACCAACTCTACTATCTGAGGGTGAGGGGCCTAAACAACATCGTCTTGTACATGAATGACTTATATAAGAACATGCCAGGAATCCTACTCTCCAATATTGCGACTACGATATCCCACCCCATCATTCACTCAAGGTTGAATGCAGTCGGCTTAATTAACCATGACGGGTCACACCAGCTTGCAGATATAGATTTCATTGAGATGTCGGCAAAATTGTTAGTCTCTTGCACTCGACGTGTGGTCTCAGGCTTATATGCAGGGAATAAGTACGATCTGCTGTTTCCGTCTGTCTTAGATGATAACCTAAATGAGAAGATGCTTCAGCTGATTTCCCGATTGTGCTGTCTATACACAGTGCTCTTTGCTACAACAAGGGAAATCCCAAAAATAAGAGGCCTATCAGCAGAAGAAAAATGCTCAGTACTCACTGAGTACCTACTTTCAGATGCTGTGAAACCATTGCTTAGGTCCGAACAATTGAGCTCTGTCATGTCTCCTAACATAATTACGTTCCCAGCGAATCTATATTACATGTCTAGAAAGAGCCTTAATTTGATCAGGGAACGCGAGGACAGAGATACTATCTTGTCGTTGTTGTTCCCTCAGGAACCACTGCTTGAACTTCGTCCAGTACAAGACATTGGTGCTCGAGTGAAAGACCCGTTTACCAGGCAACCAGCATCATTCATACAAGAGCTAGATTTGAGCGCCCCAGCAAGGTATGACGCATTCACATTTAATAGGGCTTGCTTCGAGCACACATTACCGAACCCAAGGGAAGATCACCTAGTACGGTACTTGTTCAGAGGAATAGGAACTGCCTCATCTTCTTGGTACAAGGCGTCTCATCTTCTTTCCGTACCCGAGGTCAGATGTGCAAGACATGGGAACTCCTTATACTTGGCGGAAGGAAGCGGAGCTATCATGAGTCTTCTTGAATTGCATATACCGCATGAGACTATCTA
TTATAATACGCTTTTCTCGAATGAGATGAACCCTCCACAGCGACATTTCGGACCTACGCCAACACAATTTCTAAATTCAGTCGTTTATAGGAATCTACAAGCGGAAGTGCCATGTAAAGACGGATATGTCCAGGAGTTTTGCCCACTATGGAGAGAGAATGCAGAAGAAAGCGACCTGACATCAGATAAAGCAGTTGGTTATATCACATCTGTGGTACCCTACAGGTCTGTATCATTACTACATTGTGACATTGAAATTCCTCCGGGGTCCAATCAAAGCTTATTAGATCAACTAGCTACTAATATATCTCTGATTGCCATGCATTCTGTGAAGGAGGGCGGGGTAGTGATCATCAAAGTACTGTATGCAATGGGGTACTACTTTCATCTACTCATAAACTTATTCACTCCATGTTCCACAAAAGGATATATACTCTCCAACGGCTACGCCTGTAGAGGGGATATGGAGTGTTACCTGATCTTTGTGATGGGCCACTTAGGCGGGCCTACATTCGTGCATGAAGTGGTAAGGATGGCAAAAACTCTAATACAGCGACACGGTACACTCCTATCCAAATCAGATGAAATCACACTGACTAAGCTATTTACCTCACAGCAGCGTCGTGTAACAGATATCCTATCCAGCCCCTTACCGAGGCTAATGAAGTTCTTGAGGGAAAATATTGATGCTGCATTAATTGAAGCCGGGGGACAGCCCGTCCGTCCATTCTGTGCAGAAAGTTTAGTGAGCACGCTAACAGATATGACCCAGACGACTCAGATCATTGCCAGCCACATTGACACAGTCATTCGCTCCGTAATTTACATGGAAGCTGAGGGTGACCTTGCCGACACAGTATTTTTATTTACACCCTACAATCTCTCTACAGACGGTAAAAAGAGAACATCACTTAAACAGTGCACAAGACAGATCTTGGAAGTCACAATACTGGGCCTCAGAGCCAAAGATGTCAATAAGGTCGGCAATCTAATTAGCTTGGTACTCAAAGGTGCGGTTTCTCTAGAGGACCTTATCCCATTAAGGACATATCTGAAGCGCAGTACCTGCCCTAAGTACCTGAAGGCAGTCCTAGGTATCACAAAACTCAAAGAAATGTTCACAGATACCTCCTTACTCTACTTGACTCGTGCTCAACAAAAATTCTACATGAAAACCATAGGTAATGCTACCAAGGGGTATTACAGTAATAATGACTCTTAAAGGCAATCGCATGCCAATAAACTATCTCCTTAACTGATTATTCCCTCATTGACCTAATTATACCAGATTAGAAAAAAGTTGGACTCCGACTCCTTGGAACTCGTACTCGGATTCAGTTAGTTAACTTTAAACAGGAGTGCGCGTAGTTGTCCCTAGTTATAGTCCTGTCGTTCACCAAATCTCTGTTTGGT

but I want

>AvianavulaVirus-partial-fusion-gene
ATGGGCTCCAAACCTTCTATCAGGATCCCGGTACCTCTGATGCTGATCACTCAGATCATGCTGATATTAAGTTATATCTGTCTGACAAGCTCTCTTGACGGCAGGCCTCTTGCAGCTGCAGGGATTGTAGTAACAGGAGATAAGGCAGTCAATGTATACACCTCATCTCAGACAGGGTCAATCATAGTCAAGTTGCTCCCGAATATGCCTAAGGATAAAGAGGCGTGTGCAAAGGCCCCATTGGAGGCATACAACAGAACACTGACTACCTTGCTCACTCCCCTTGGAGATTCCATCCGTAAGATCCAAGGGTCAGTGGCCACGTCCGGAGGAAGGAGACAGAAACGCTTTATAGGTGCCGTTATTGGCAGTAT

Many thanks, James


Not clear: how do you know where the gene is in the FASTA? Do you have the coordinates, or do you just have the sequence?

written 3 months ago by Pierre Lindenbaum

I just have sequences from other isolates, which are the short section of the gene I want. I would normally do an alignment with my new full genome and several short target sequences from previous isolates, then trim manually using something like MEGA. I'm hoping to find a way to automate this.

written 3 months ago by James

Since it's not necessarily always going to be in the exact same position, you will have to align the sequences, e.g. with BLAST, to get the coordinates of the best match.
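As a rough sketch of how that might look in a bash script: the functions below assume NCBI BLAST+ (`blastn`) is installed, that the genome FASTA holds a single record, and that the query FASTA holds one or more known fragments from previous isolates. The function names and the output header `partial-fusion-gene` are placeholders for this example, not part of any tool.

```bash
#!/usr/bin/env bash
# Sketch only: locate the best blastn hit of known fragments in a genome
# and print that region as FASTA.

# Print the 1-based inclusive region START..END of a single-record FASTA.
extract_region() {
    local fasta=$1 start=$2 end=$3 name=$4
    # Join the sequence lines (skipping the header), then cut the region.
    awk -v s="$start" -v e="$end" -v n="$name" '
        !/^>/ { seq = seq $0 }
        END   { print ">" n; print substr(seq, s, e - s + 1) }
    ' "$fasta"
}

# Print "sstart send" for the hit with the highest bit score.
best_hit_coords() {
    local query=$1 genome=$2
    blastn -query "$query" -subject "$genome" \
           -outfmt "6 sstart send bitscore" |
        sort -k3,3nr | head -n 1 | awk '{ print $1, $2 }'
}

# Combine the two steps: find the coordinates, then cut out the region.
trim_to_gene() {
    local query=$1 genome=$2 s e
    read -r s e < <(best_hit_coords "$query" "$genome")
    [ -n "$s" ] || { echo "no hit found" >&2; return 1; }
    # A minus-strand hit is reported with sstart > send; swap so the
    # region is cut in genome orientation (no reverse complement here).
    if [ "$s" -gt "$e" ]; then local t=$s; s=$e; e=$t; fi
    extract_region "$genome" "$s" "$e" "partial-fusion-gene"
}
```

Something like `trim_to_gene fragments.fasta genome.fasta > trimmed.fasta` would then slot into an existing script (file names hypothetical). Note that the region returned covers only the part of the genome that BLAST actually aligned, so a fragment that matches only partially will give a shorter sequence than expected.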

written 3 months ago by jrj.healey