Header of first sequence is missing after grep
3.4 years ago
Kai_Qi ▴ 130

Hi All:

I have a FASTA file, and I used grep to extract every sequence that contains a certain motif, together with its coordinates:

$ head input.fasta

>16:23107820-23108019(+)
GTACGGCGCTCCCGGGGCGGCCGGTGGCCTGTAGTCAAGGTCACTAGGACCCGCGTTGAGGTGGGTTGCTTGGCGGCCACACTGCAGGTATGCGGGCTTTTTCTTAGGGCACACACTTCTCCTTGTGCCCTTCGAGAAGCTTCCATGATGGTAAGACTCCAGATGTTGGGGAGACAGGACGGATACAAGAACGGAGTAT
>14:54909471-54909670(-)
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
>7:127020805-127021004(-)
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
>X:20848619-20848818(+)
GTGAGGGCAGGCCCGGTAGGGTTCGGGTTTTGGAGCGGCTGCGGGACCCGGGTATGAAGTCCAGACCGAAAGCTCAGCTCCAAGATGCTTCCGTCTGAATCTCAGCGTTCTCCCGCCCGGAACCAAAGGAGTGGTTTGACCAGGGCGAGACCGTCGTCATCGACCGTGGGAGTGGATGGAGGAGTCGGCCTGCAGGCTG
>1:75547398-75547597(+)
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG

$ grep -i -A1 CCC..C input.fasta > 1.fasta   # search for motif CCC[A/U/G][C/G]C with grep; each sequence is on a single line, so no pre-processing is needed

The problem is that when I looked at the output file, the title (coordinates) of the first sequence is missing:

$ head 1.fasta

GTACGGCGCTCCCGGGGCGGCCGGTGGCCTGTAGTCAAGGTCACTAGGACCCGCGTTGAGGTGGGTTGCTTGGCGGCCACACTGCAGGTATGCGGGCTTTTTCTTAGGGCACACACTTCTCCTTGTGCCCTTCGAGAAGCTTCCATGATGGTAAGACTCCAGATGTTGGGGAGACAGGACGGATACAAGAACGGAGTAT
>14:54909471-54909670(-)
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
>7:127020805-127021004(-)
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
>X:20848619-20848818(+)
GTGAGGGCAGGCCCGGTAGGGTTCGGGTTTTGGAGCGGCTGCGGGACCCGGGTATGAAGTCCAGACCGAAAGCTCAGCTCCAAGATGCTTCCGTCTGAATCTCAGCGTTCTCCCGCCCGGAACCAAAGGAGTGGTTTGACCAGGGCGAGACCGTCGTCATCGACCGTGGGAGTGGATGGAGGAGTCGGCCTGCAGGCTG
>1:75547398-75547597(+)
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG
>11:102777648-102777847(+)

What should I do to make the first line also have its title?

Thank you,

3.4 years ago

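You want -B (before), not -A (after). grep matches the sequence line, so -A1 prints the line after each match, which is the header of the next record, while -B1 prints the line before each match, which is that record's own header:

grep -i -B1 CCC..C input.fasta

instead of:

grep -i -A1 CCC..C input.fasta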
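
As a small illustration of the -B1 approach, here is a sketch on a made-up three-record file (demo.fasta and its sequences are hypothetical, not taken from the question). It also shows two things worth checking on real data: grep inserts "--" lines between non-adjacent groups of context, which are filtered out here, and the pattern CCC..C accepts any two characters in the middle. The stricter bracket pattern is only an assumption about what the motif in the question means for DNA input (T in place of U).

```shell
# Build a tiny hypothetical FASTA with one sequence per line.
printf '>seq1\nGTACCCGGCAT\n>seq2\nGTAAAAAAAAT\n>seq3\nTTCCCTTCGG\n' > demo.fasta

# -B1 prints the line before each matching sequence line, i.e. its header.
# grep -v '^--$' drops the "--" separators that grep writes between
# non-adjacent groups of context lines.
grep -i -B1 'CCC..C' demo.fasta | grep -v '^--$' > demo_hits.fasta

# Assumed stricter reading of the motif: CCC, then A/T/G, then C/G, then C.
# The "." in the original pattern matches any character instead.
grep -i -B1 'CCC[ATG][CG]C' demo.fasta | grep -v '^--$' > demo_strict.fasta

cat demo_hits.fasta
cat demo_strict.fasta
```

With this demo input, demo_hits.fasta keeps seq1 and seq3 with their headers, while demo_strict.fasta keeps only seq1, because CCCTTC in seq3 fails the [CG] position.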

Thank you very much. It worked.
