A sequence is broken up and forms a new sequence
8.0 years ago
waqasnayab ▴ 250

Hi Community,

I have two fasta files: 1. right.fasta: head

>COSN17913384
CAAGAGGGGTGAATGTGTTTTGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGGGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAAGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTACTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATACACCTGCCTCGGCCTCCCAA

tail:

>COSN18618110
CTATGCCAGGACAGTGTAGCAGCCCCGTGGTGCTGACAAAT
>COSN18201496|COSN18701930|COSN18583086
CAGCAGCTATAGGTCTGGGGCGGGGCCGCTTGGCAAGAACA
>COSN18653667
AAGTCATTGCTGTCCTGTCCCGCCTGGGGCTTTTGTGGACC
>COSN18329724
GTGAGGTGCCTGGTCTGAGAGGGCCTTGACCATTCCCCTTG

2. wrong.fasta: head

>COSN17913384
CAAGAGGGGTGAATGTGTTTCGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGAGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAGGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTATTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCAA

tail:

>COSN8613750|COSN18681804|COSN18352203|COSN5269362
GTGAGCATGGAGGGCCATGCACACCTGGACAGGGATGAGGG
>COSN18636019
CTATGCCAGGACAGTGTAGCTGCCCCGTGGTGCTGACAAAT
>COSN18618110
CAGCAGCTATAGGTCTGGGGTGGAGCCGCTTGGCAAGAACA
>COSN18201496|COSN18701930|COSN18583086
AAGTCATTGCTGTCCTGTCCTGCCTGGGGCTTTTGTGGACC
>COSN18653667
GTGAGGTGCCTGGTCTGAGATGGCCTTGACCATTCCCCTTG

As you can see, in right.fasta >COSN18329724 is the last header, with the sequence GTGAGG .... CCTTG. In wrong.fasta the same sequence is present but under the wrong header. In fact, in wrong.fasta the headers are shifted by one sequence. So I investigated what happens in wrong.fasta:

grep -A 7 -B 4 COSN229024 right.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT

I observed that in wrong.fasta the sequence of >COSN229024 breaks roughly in half, and the broken-off part forms a new sequence under >COSN18183003. As a result, the whole order is messed up after the break. So I want to put the broken-off part of >COSN229024 back into >COSN229024 in wrong.fasta.

I checked both FASTA files and I am sure this is the only problem. My desired output would be the following when I run:

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

Any help appreciated,

Thanks,

Waqas.

sequence fasta fasta sequence breaks • 1.3k views

Instead of using the "Block quotes" button, use the "code" button (101010) to format these sequences. The way they are formatted now, they are very hard to read. I will fix one block for you as an example.


Hi Community,

I have two fasta files: 1. head right.fasta:

>COSN17913384
CAAGAGGGGTGAATGTGTTTTGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGGGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAAGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTACTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATACACCTGCCTCGGCCTCCCAA

tail:

>COSN18618110
CTATGCCAGGACAGTGTAGCAGCCCCGTGGTGCTGACAAAT
>COSN18201496|COSN18701930|COSN18583086
CAGCAGCTATAGGTCTGGGGCGGGGCCGCTTGGCAAGAACA
>COSN18653667
AAGTCATTGCTGTCCTGTCCCGCCTGGGGCTTTTGTGGACC
>COSN18329724
GTGAGGTGCCTGGTCTGAGAGGGCCTTGACCATTCCCCTTG

2. head wrong.fasta:

>COSN17913384
CAAGAGGGGTGAATGTGTTTCGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGAGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAGGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTATTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCAA

tail:

>COSN8613750|COSN18681804|COSN18352203|COSN5269362
GTGAGCATGGAGGGCCATGCACACCTGGACAGGGATGAGGG
>COSN18636019
CTATGCCAGGACAGTGTAGCTGCCCCGTGGTGCTGACAAAT
>COSN18618110
CAGCAGCTATAGGTCTGGGGTGGAGCCGCTTGGCAAGAACA
>COSN18201496|COSN18701930|COSN18583086
AAGTCATTGCTGTCCTGTCCTGCCTGGGGCTTTTGTGGACC
>COSN18653667
GTGAGGTGCCTGGTCTGAGATGGCCTTGACCATTCCCCTTG

As you can see, in right.fasta >COSN18329724 is the last header, with the sequence GTGAGG .... CCTTG. In wrong.fasta the same sequence is present but under the wrong header. In fact, in wrong.fasta the headers are shifted by one sequence. So I investigated what happens in wrong.fasta:

grep -A 7 -B 4 COSN229024 right.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

grep -A 7 -B 4 COSN229024 wrong.fasta:

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT

I observed that in wrong.fasta the sequence of >COSN229024 breaks roughly in half, and the broken-off part forms a new sequence under >COSN18183003. As a result, the whole order is messed up after the break. So I want to put the broken-off part of >COSN229024 back into >COSN229024 in wrong.fasta.

I checked both FASTA files and I am sure this is the only problem. My desired output would be the following when I run:

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGATGTGTTTGTGATTGATGT
>COSN18183003
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN18487588
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT

Any help appreciated,

Thanks,

Waqas.
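
For what it's worth, the desired output above (join the fragment under >COSN18183003 back onto >COSN229024, then move every later sequence up by one header) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names `read_fasta` and `rejoin_fragment` are made up for illustration, it assumes exactly one break at a known header, and it drops the last header, which is left without a sequence after the shift. It does not touch the single-base differences between the two files.

```python
def read_fasta(lines):
    """Return (headers, sequences) as parallel lists; multi-line sequences are joined."""
    headers, seqs = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            headers.append(line[1:])
            seqs.append("")
        elif headers:
            seqs[-1] += line
    return headers, seqs


def rejoin_fragment(headers, seqs, broken_id):
    """Append the next record's sequence to broken_id's, then shift later sequences up by one.

    Hypothetical helper: assumes broken_id exists and is not the last record.
    """
    i = headers.index(broken_id)
    merged = seqs[:i] + [seqs[i] + seqs[i + 1]] + seqs[i + 2:]
    # zip stops at the shorter list, so the final header (now without a sequence) is dropped.
    return list(zip(headers, merged))


# Records copied from the wrong.fasta excerpt in the post.
wrong = """>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT
""".splitlines()

headers, seqs = read_fasta(wrong)
fixed = rejoin_fragment(headers, seqs, "COSN229024")
for header, seq in fixed:
    print(f">{header}\n{seq}")
```

Working on the header list and the sequence list separately keeps the "shift everything after the break" step to a single list operation, but it also means the result is only as trustworthy as the assumption that nothing else is wrong with the file.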


I would be more worried by how your error originated.
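
One way to start looking for where the error originated is to check whether any other records are affected. The sketch below flags every record whose sequence length differs from the most common length in the file. It assumes, based only on the excerpts shown in the question, that the sequences are meant to share one fixed length; `parse_fasta` and `odd_length_records` are illustrative names, not part of any library.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (header, sequence) tuples; multi-line sequences are joined."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def odd_length_records(records):
    """Return (most common length, [(header, length), ...] for records that deviate from it)."""
    lengths = Counter(len(seq) for _, seq in records)
    expected = lengths.most_common(1)[0][0]
    return expected, [(h, len(s)) for h, s in records if len(s) != expected]


# Records copied from the wrong.fasta excerpt in the question.
sample = """>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT
"""

expected, odd = odd_length_records(parse_fasta(sample))
print(expected, odd)
```

If this reports more than the two fragments around COSN229024, then a single rejoin would not be enough to repair the file.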
