Question: extract a DNA sequence based on header in multifasta file
Optimist wrote:

Hello all,

My multifasta file with rRNA sequences looks like this.

>16S_rRNA::1:4-522(-)
TGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGCGTGATGGCGGGAACTCAAAGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAAAGGGAAGCGACCCCGCGAGGGCAAGCGGAACTCATAAAGTACGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTAGATCAGAATGCTACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTCCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTT
>16S_rRNA::0-508(-)
TTGACGTTACCGACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTGATTGAGTCAGATGTGAAATCCCCGGGCTTAACCCGGGAATTGCATCTGATACTGGTCAGCTAGAGTCTTGTAGAGGGGGGTAGAATTCCATGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCT
>5S_rRNA::1-80(-)
TGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTG
>5S_rRNA::1-77(-)
TGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGT
>5S_rRNA::1:4731-4814(-)
TGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATGTGAGAGTAGGGAACCGCC

I want to extract only the 16S_rRNA sequences into a new file. Is there a way to do this? I need to apply it to a large dataset of 500 files.

Thank you.

sequence extraction fasta • 110 views
modified 4 weeks ago by shenwei356 • written 4 weeks ago by Optimist
cpad0112 wrote:

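Try one of these. Both assume each sequence sits on a single line directly after its header, as in your example:

$ awk '/^>16S_rRNA/ {print; if ((getline seq) > 0) print seq}' test.fa
$ sed -n '/^>16S_rRNA/{N;p;}' test.fa

Both print to standard output, so redirect with > to write a new file. Don't use sed -i here: combined with -n it would overwrite the input file with only the extracted records.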
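
For the 500-file case in the question, the same header match can be wrapped in a loop. The following is only a sketch under a few assumptions: the function names extract_16s and extract_16s_all, the output directory argument, and the .16S.fa output suffix are all made up for illustration. Unlike the one-liners above, the awk program keeps printing lines until the next header, so it should also cope with sequences wrapped over several lines.

```shell
# Hypothetical helper: print every record whose header starts with
# ">16S_rRNA" from one FASTA file. The flag "keep" is re-evaluated at each
# header line and stays set for the sequence lines that follow it.
extract_16s() {
    awk '/^>/ { keep = ($0 ~ /^>16S_rRNA/) } keep' "$1"
}

# Hypothetical batch wrapper: first argument is an output directory,
# remaining arguments are input FASTA files. Inputs are never modified.
extract_16s_all() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip unmatched globs and non-files
        base=$(basename "$f")
        extract_16s "$f" > "$outdir/${base%.*}.16S.fa"
    done
}
```

A call might look like extract_16s_all 16S_out *.fa, assuming the inputs end in .fa and sit in the current directory. Writing into a separate directory keeps the results from being picked up as inputs if the command is run again.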

modified 4 weeks ago • written 4 weeks ago by cpad0112

Thanks for the trick. awk has worked for me.

written 4 weeks ago by Optimist