Question: A sequence is broken in two, forming a new sequence
Asked 3.1 years ago by waqasnayab (Pakistan):

Hi Community,

I have two FASTA files.

1. right.fasta, head:

>COSN17913384
CAAGAGGGGTGAATGTGTTTTGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGGGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAAGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTACTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATACACCTGCCTCGGCCTCCCAA

tail:

>COSN18618110
CTATGCCAGGACAGTGTAGCAGCCCCGTGGTGCTGACAAAT
>COSN18201496|COSN18701930|COSN18583086
CAGCAGCTATAGGTCTGGGGCGGGGCCGCTTGGCAAGAACA
>COSN18653667
AAGTCATTGCTGTCCTGTCCCGCCTGGGGCTTTTGTGGACC
>COSN18329724
GTGAGGTGCCTGGTCTGAGAGGGCCTTGACCATTCCCCTTG

2. wrong.fasta, head:

>COSN17913384
CAAGAGGGGTGAATGTGTTTCGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGAGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAGGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTATTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCAA

tail:

>COSN8613750|COSN18681804|COSN18352203|COSN5269362
GTGAGCATGGAGGGCCATGCACACCTGGACAGGGATGAGGG
>COSN18636019
CTATGCCAGGACAGTGTAGCTGCCCCGTGGTGCTGACAAAT
>COSN18618110
CAGCAGCTATAGGTCTGGGGTGGAGCCGCTTGGCAAGAACA
>COSN18201496|COSN18701930|COSN18583086
AAGTCATTGCTGTCCTGTCCTGCCTGGGGCTTTTGTGGACC
>COSN18653667
GTGAGGTGCCTGGTCTGAGATGGCCTTGACCATTCCCCTTG

As you can see, in right.fasta the last header is >COSN18329724, with the sequence GTGAGG...CCTTG. In wrong.fasta the same sequence (apart from a single base) appears under the wrong header, >COSN18653667. In fact, in wrong.fasta the headers are shifted by one record relative to their sequences. So I investigated what happens in wrong.fasta:

grep -A 7 -B 4 COSN229024 right.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT

I observed that in wrong.fasta the sequence of >COSN229024 breaks off about halfway, and the broken-off part shows up as a new sequence under >COSN18183003. From that point on, the pairing of headers and sequences is out of order. So I want to put the broken-off part back into >COSN229024 in wrong.fasta.

I have checked both FASTA files and I am sure this is the only problem. My desired output from the same command would be:

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

Any help appreciated,

Thanks,

Waqas.

modified 3.1 years ago by genomax • written 3.1 years ago by waqasnayab

Instead of the "Block quotes" button, use the "code" button (101010) to format these sequences. The way they are formatted now, they are very hard to read. I will fix one block for you as an example.

modified 3.1 years ago • written 3.1 years ago by genomax

Hi Community,

I have two fasta files: 1. head right.fasta:

>COSN17913384
CAAGAGGGGTGAATGTGTTTTGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGGGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAAGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTACTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATACACCTGCCTCGGCCTCCCAA

tail:

>COSN18618110
CTATGCCAGGACAGTGTAGCAGCCCCGTGGTGCTGACAAAT
>COSN18201496|COSN18701930|COSN18583086
CAGCAGCTATAGGTCTGGGGCGGGGCCGCTTGGCAAGAACA
>COSN18653667
AAGTCATTGCTGTCCTGTCCCGCCTGGGGCTTTTGTGGACC
>COSN18329724
GTGAGGTGCCTGGTCTGAGAGGGCCTTGACCATTCCCCTTG

2. head wrong.fasta:

>COSN17913384
CAAGAGGGGTGAATGTGTTTCGCATGCACAAGGGACAGGAG
>COSN6473262
TCAGAGCTGGTGGGGTGGAGAGACAGAAACAAGTGGGAGAA
>COSN17979053
TATACCTACCTTATAGATAAGGAAATTGAAGCTTATAGAGT
>COSN15187127
TTTTTCCTTATGATACTCTATTGCCTCTCCATGGATAAAGA
>COSN17118087
AACTCCTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCAA

tail:

>COSN8613750|COSN18681804|COSN18352203|COSN5269362
GTGAGCATGGAGGGCCATGCACACCTGGACAGGGATGAGGG
>COSN18636019
CTATGCCAGGACAGTGTAGCTGCCCCGTGGTGCTGACAAAT
>COSN18618110
CAGCAGCTATAGGTCTGGGGTGGAGCCGCTTGGCAAGAACA
>COSN18201496|COSN18701930|COSN18583086
AAGTCATTGCTGTCCTGTCCTGCCTGGGGCTTTTGTGGACC
>COSN18653667
GTGAGGTGCCTGGTCTGAGATGGCCTTGACCATTCCCCTTG

As you can see, in right.fasta the last header is >COSN18329724, with the sequence GTGAGG...CCTTG. In wrong.fasta the same sequence (apart from a single base) appears under the wrong header, >COSN18653667. In fact, in wrong.fasta the headers are shifted by one record relative to their sequences. So I investigated what happens in wrong.fasta:

grep -A 7 -B 4 COSN229024 right.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCAGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACTTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGAATGTCCTAATGAAGTGTGCA
>COSN18183003
CTGTACCTTGGAAATGTCTGCTGTTCGTAACTTCTTCAGTT
>COSN18487588
ATGCCTAGTTCTAATCATCTCATCCTGTGTTTGTGATTGAT
>COSN1681903|COSN1178783
TGCTTACCCCTTAAATGCAATTTATTTACTTTTACCACTGT

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN1681903|COSN1178783
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT

I observed that in wrong.fasta the sequence of >COSN229024 breaks off about halfway, and the broken-off part shows up as a new sequence under >COSN18183003. From that point on, the pairing of headers and sequences is out of order. So I want to put the broken-off part back into >COSN229024 in wrong.fasta.
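
To show how such a break could be located without scrolling through the file, here is a minimal Python sketch. It assumes every intact sequence is exactly 41 bases long (which holds for all complete sequences shown above, but is an assumption about the rest of the files). The function names `read_fasta` and `odd_length_records` are made up for this illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def odd_length_records(lines: Iterable[str], expected_len: int = 41) -> List[Tuple[str, int]]:
    """Return (header, length) for every record whose length is not expected_len.

    expected_len=41 is an assumption taken from the sequences in the post.
    """
    return [(h, len(s)) for h, s in read_fasta(lines) if len(s) != expected_len]


# Excerpt of wrong.fasta around the break, copied from the grep output above.
wrong_excerpt = """\
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGA
>COSN18183003
TGTGTTTGTGATTGATGT
>COSN18487588
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
""".splitlines()

print(odd_length_records(wrong_excerpt))
# [('COSN229024', 21), ('COSN18183003', 18)]
```

On the real file, something like `odd_length_records(open("wrong.fasta"))` would list every record that deviates from the expected length, which also helps to confirm that there is only one break.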

I have checked both FASTA files and I am sure this is the only problem. My desired output from the same command would be:

grep -A 7 -B 4 COSN229024 wrong.fasta

>COSN9627597
CGCTGGGCTCGCCTCCAGCCTGGCCTGCATTCCCAAATCTA
>COSN8175610
CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT
>COSN229024
CACTATAAAAATATTAAGAGATGTGTTTGTGATTGATGT
>COSN18183003
TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT
>COSN18487588
CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT
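
The rearrangement shown in this desired output could be sketched in Python roughly as follows. This is only an illustration of the layout asked for here, not a verified repair: the hypothetical function `rejoin_and_shift` joins the fragment under the broken header with the fragment under the next header, then moves every later sequence up by one header, which leaves the last header without a sequence.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def rejoin_and_shift(records: List[Record], broken_header: str) -> Tuple[List[Record], str]:
    """Join the sequence under broken_header with the next one and shift the rest up.

    Returns the re-paired records and the final header, which ends up unpaired.
    """
    headers = [h for h, _ in records]
    seqs = [s for _, s in records]
    i = headers.index(broken_header)  # raises ValueError if the header is absent
    if i + 1 >= len(seqs):
        raise ValueError("no following record to take the fragment from")
    # Join the two fragments into one sequence; the list gets one item shorter.
    seqs[i:i + 2] = [seqs[i] + seqs[i + 1]]
    # zip() stops at the shorter list, so the last header is left unpaired.
    return list(zip(headers, seqs)), headers[-1]


# Records copied from the grep output of wrong.fasta above.
records = [
    ("COSN8175610", "CAAGAGAGAAATTCTGACACCTCCTAAGTCTACCAAGCTTT"),
    ("COSN229024", "CACTATAAAAATATTAAGAGA"),
    ("COSN18183003", "TGTGTTTGTGATTGATGT"),
    ("COSN18487588", "TGCTTACCCCTTAAATGCAACTTATTTACTTTTACCACTGT"),
    ("COSN1681903|COSN1178783", "CTTCCCAACTCATGAGTTCTGAATTCCAATACGTCTCCATT"),
]

fixed, leftover = rejoin_and_shift(records, "COSN229024")
for header, seq in fixed:
    print(f">{header}\n{seq}")
print("header left without a sequence:", leftover)
```

Note that the joined sequence is 21 + 18 = 39 bases long rather than 41, and it does not match the COSN229024 sequence in right.fasta, so the result should be checked against right.fasta before it is used.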

Any help appreciated,

Thanks,

Waqas.

modified 3.1 years ago • written 3.1 years ago by waqasnayab

I would be more worried by how your error originated.

written 3.1 years ago by WouterDeCoster