Split the sequence into each SNP entry
2.9 years ago
K ▴ 10

Hi,

I have a file of SNP data and would like to do two things:

  1. Split each sequence into two, one for each allele of its SNP entry. This is the tricky part.
  2. Add the FASTA symbol (>) before the start of each sequence. I know how to do this one in the shell: sed 's/^\([^acgt]\)/>\1/'.

     AAGGGTTTAGAAAAAAACCAAACAAACAATCGAAA[C/T]GAAATAGAAAAAGAAAAAGGGAAGGGGTTAAGTTC
     TTCATATAAAAATTGATATAGAATCTTTGAAAAAG[A/C]CCTTTCTTCCTAAGAAAGAAAAGGCTTACTGTCTT
     CCCAAATAAACAGGTATGGAAGCTATAATTGGAAA[C/T]CACGATCGAATTTATGGAAGCATTGGTTTATACAT
     GGATCCAAAAGAAACTTGGGCATTTATTACTTGGA[C/T]GATATTCGGGATTTATTTACATACTCGAACAAATA
     TATCAGTTAGTCTACCATATTTTTTTCTTGACAGA[A/C]AACTAAGGAAATGGCTCCATGTGCTCTAATTCATT
     ACTAACTCTAAAGTAAAGGATCTTTCCACCTTTTC[G/T]GATCCCATACCAATAGCTTTTTTTGATTCGTCCAT
     AGTTTACACACTTTTGTATTACCTCTTCTTACTGC[C/T]GTATTTATGTTAATGCATTTCCTAATGATACGTAA
     AATAGATCTGACAAGTCGCACTATATGTCAACCCA[A/C]GATGGATGCTTGTCCCCGGGACTTCGATAAGGTAC
    

Thank you

sequence snp • 584 views
2.9 years ago

Like this?

$ awk -F "[][/]" -v OFS="\n" '{print ">seq_"NR"_"$2,$1$2$4,">seq_"NR"_"$3,$1$3$4}' test.txt 

>seq_1_C
AAGGGTTTAGAAAAAAACCAAACAAACAATCGAAACGAAATAGAAAAAGAAAAAGGGAAGGGGTTAAGTTC
>seq_1_T
AAGGGTTTAGAAAAAAACCAAACAAACAATCGAAATGAAATAGAAAAAGAAAAAGGGAAGGGGTTAAGTTC
>seq_2_A
TTCATATAAAAATTGATATAGAATCTTTGAAAAAGACCTTTCTTCCTAAGAAAGAAAAGGCTTACTGTCTT
>seq_2_C
TTCATATAAAAATTGATATAGAATCTTTGAAAAAGCCCTTTCTTCCTAAGAAAGAAAAGGCTTACTGTCTT
>seq_3_C
CCCAAATAAACAGGTATGGAAGCTATAATTGGAAACCACGATCGAATTTATGGAAGCATTGGTTTATACAT
>seq_3_T
CCCAAATAAACAGGTATGGAAGCTATAATTGGAAATCACGATCGAATTTATGGAAGCATTGGTTTATACAT
>seq_4_C
GGATCCAAAAGAAACTTGGGCATTTATTACTTGGACGATATTCGGGATTTATTTACATACTCGAACAAATA
>seq_4_T
GGATCCAAAAGAAACTTGGGCATTTATTACTTGGATGATATTCGGGATTTATTTACATACTCGAACAAATA
>seq_5_A
TATCAGTTAGTCTACCATATTTTTTTCTTGACAGAAAACTAAGGAAATGGCTCCATGTGCTCTAATTCATT
>seq_5_C
TATCAGTTAGTCTACCATATTTTTTTCTTGACAGACAACTAAGGAAATGGCTCCATGTGCTCTAATTCATT
>seq_6_G
ACTAACTCTAAAGTAAAGGATCTTTCCACCTTTTCGGATCCCATACCAATAGCTTTTTTTGATTCGTCCAT
>seq_6_T
ACTAACTCTAAAGTAAAGGATCTTTCCACCTTTTCTGATCCCATACCAATAGCTTTTTTTGATTCGTCCAT
>seq_7_C
AGTTTACACACTTTTGTATTACCTCTTCTTACTGCCGTATTTATGTTAATGCATTTCCTAATGATACGTAA
>seq_7_T
AGTTTACACACTTTTGTATTACCTCTTCTTACTGCTGTATTTATGTTAATGCATTTCCTAATGATACGTAA
>seq_8_A
AATAGATCTGACAAGTCGCACTATATGTCAACCCAAGATGGATGCTTGTCCCCGGGACTTCGATAAGGTAC
>seq_8_C
AATAGATCTGACAAGTCGCACTATATGTCAACCCACGATGGATGCTTGTCCCCGGGACTTCGATAAGGTAC
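
For anyone reusing this: the one-liner above treats `[`, `]` and `/` as field separators, so each line becomes left flank, allele 1, allele 2, right flank, and it assumes exactly two alleles per line. Below is a rough sketch of a slightly more defensive variant, in case some lines carry more than two alleles (for example [A/C/G]) or no SNP marker at all. The function name `snp_to_fasta` and the skip-with-warning behaviour are my own assumptions, not part of the answer above.

```shell
# snp_to_fasta: hypothetical helper (name is illustrative only).
# Reads lines of the form LEFT[A/C]RIGHT and prints one FASTA record per allele.
snp_to_fasta() {
  awk -F '[][]' '
    NF == 0 { next }                 # silently ignore empty lines
    NF != 3 {                        # expect exactly one [..] block per line
      print "skipping line " NR ": no single [..] marker" > "/dev/stderr"
      next
    }
    {
      n = split($2, allele, "/")     # handles [A/C] as well as [A/C/G]
      for (i = 1; i <= n; i++) {
        print ">seq_" NR "_" allele[i]   # header: input line number plus allele
        print $1 allele[i] $3            # left flank + allele + right flank
      }
    }
  ' "$@"
}
```

It can be called as `snp_to_fasta test.txt > snps.fasta` (the output file name is just an example) or fed from a pipe. For strictly biallelic input the records should match the output shown above.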

Thank you so much. I really appreciate your response. It is working great!
