4.1 years ago
k.kathirvel93
Hi everyone,
I have a multi-FASTA file that was converted from a BWA BAM file. I want to extract only the sequences that start with a specific forward primer and end with a specific reverse primer. How can I do this with awk, sed, or grep? Thanks in advance.
The input file looks like this:
>M01015:63:000000000-D2M18:1:1101:17027:1479
TTCTCTCTTCTCTCTTCTTCCTCTTTTCTTTTCTCTCTCTTTTTTTTCTTCTTTTTCTTCTTTTTTTCTT
TTCCTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTCTCCTTTTTCCTTCTTTCTTTTTCTTTTTT
CTCCTCTTCTTTTTTCTTTTTTTTCTTTCTTTTTTTCTCCTTTTTTTTTTTTCTTTTTTTCTTCCCTTTT
TTTTTTTCCCTTTTTTCTTTTTTTTTTTCTTCCTTTTTTT
>M01015:63:000000000-D2M18:1:1101:17027:1479
TCCTCTCTCTCTCTTCTCCCTCCTCCCTTTCTCTCTTCTCTCTTTTCTCTTTCTTTTCTTTTTTCTCTTT
TCCCTTTTTCCCTTCTTTCTTTTCTTTTTTTTTTTCTTTTTTCTTTTCTTTTTTTTTTCTCTTTTTTTCT
TCTTTTTTTTTCTCCTCTTTTTTTTTTTCTTTTTTTTTTTTTTTCTCTTTTTCTTCTTTTTTTTTTTTTT
CTTTTTTTCTTTTTTTTTTTTTTTTTTCTCTCTCTTTTTT
>M01015:63:000000000-D2M18:1:1101:15901:1612
GGCACTCGTATCGATGCGGCCGCGTTCGTTTGTTTATACACCTGCTCGTGCTTGTTTATGCATCTGCCAT
CTCCCTTCTGCTTATTTCTGTCTCCGATGCCTCTGTACTCCTTAGCCTTTCAGCTCCTGCCGCCTGTTTC
CCTGTGATGCAACAAGCTTACTCTGCACCAATGATGCAGCAGCCAGCTCAATCTAACGCAGCCAGTGATT
AGTTAGACGCGTGCCTGTGATTAGTTAGACGCGTGCCAGT
>M01015:63:000000000-D2M18:1:1101:15901:1612
GCCTCTGTCCCTCTTCTACCTATTCCTTGCCCCCCTCTTCCTTATTCCTTCCCCGCCTCTTCCTTATCTC
TGCCTTCTTTCTTTTGACCTCTCTCCTTCCTCATTGGTGCAGCGTTAGCTTGTTGCTTCACTGGGAAACT
TGCGGCAGGAGCTGCCCTGCTTCTGCGTCCTGACTCTTCGCCTTCCGTAATTTCCCGTTCGGTGTTGCCT
GTTTCTTCTACCAGCTCGCTCAGTTTTTTATTCTTTCGA
>M01015:63:000000000-D2M18:1:1101:16395:1620
GGCACTCGTATCGATGCGGCCGCGGTTATCTCTTCCCGCTGCACTGCCTTTTAGGCGTTCTTTTGTTCCG
GCCCCCTCTCCCCCCGGGTTCCCTGCTTTCCCCTGTGCGCTATTCCTGTTCTAGATGCTTTACTGTCCCC
CTCCGCTCCCGGCTTCTCGGTCAGTTTCCCCGTGCTTAGTTAGACGCGTGCTTCTGGC
>M01015:63:000000000-D2M18:1:1101:16395:1620
GCCTCTAGCACGCGTCTAACTAATCACTTTCCCCCTCCCCGTTAATCCGGGTTCTGTCTTGTTCAGTCAT
TCCTCTCGCCCCGCCCTCGCTCACTGGCTCTTGCTGCCTACCCGGGTTCAGTACTCGCCGTCCCTTATGA
ACCCCTCTTTGGCCTTGCTCCGGGTGGTGTTTCCCGCGGCCGCATCGATACGAGTGCCCTGTTTCTTATA
CACTTCTGACGCTGCCGCCGAATATAGCGGTGTCGTTCTT
>M01015:63:000000000-D2M18:1:1101:15366:1643
GGCACTCGTATCGATGCGGCCGCGGTAAACTCCACCCGGACCAACGCCAAATAGTGTTTCATAAGGTACT
TCCCTTACTCCCCCCGTGTAGGCTGCTTTTGCCCCTCTTCTCTTGCTGGCCTAGATGAATTACTGTCCTC
TACCTAACCCTTCTTATCTGTCAGTTTCACCGTTTTTTGTTAGTCGCGTGCTCTTTGCCTTTTTCTTCTA
CCTCTCTCCTCTCTCACTATACTTCTGTCCATCTTTTTTT
>M01015:63:000000000-D2M18:1:1101:15366:1643
GCCTCTAGCACGCGTCTCACTAATCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTT
TCCTCTTTTCTTTCCTTTTCTCTCTTTTTTTTTCTCCTTTCCCTTTTTCTGTTCCTTCCGTCCCTTTTGT
TCCCCTTTTTTTCCTTTCTCCTTTTTTTTTTTTCCCTTGCTCTTTCTTTTCTCTTCTCCTTTTTCTTTTA
CTTATCTTCCGTTTCCTTCGTCTTTCTCTTTTTATATTTT
>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC
>M01015:63:000000000-D2M18:1:1101:17421:1643
GCCTCTAGCACGCGTCTAACTAATCACTGGCCCGCGTCTCTCTAATCTCGGCTCGCGTCTCACTTCCCCG
CGGCCGCATCGATACGAGTGCC
>M01015:63:000000000-D2M18:1:1101:16505:1648
GGCACTCGTATCGATGCGGCCGCGTGTGATTTCTTCGACTTGTCCTAGCGTCCTCTCTCTTATCTACTTC
TTCGACCCCTCTCGACTCCTTTTCATCTCCTATTCCCTTTTCTGCTTCCCTATATTCTCTTCTTTTTTCT
TTTTTTTTTTTTGCTTATTCTTCCTTATCACTTTTTTTTTTCTACTCTATGCTTCCTGTCTGTCTCGTTT
CTGCCTCGTTGGTTTATTTTTCCTGCCTCTTTCTTTTTTT
>M01015:63:000000000-D2M18:1:1101:16505:1648
GCCTCTAGCACGCGTCTAACTAATCACTCTCTTCCTTTTCTTTTCTTTTGCCTTGTCTCTTCTTCCCCTC
TCTTGCTTCCCTCTACTTCTTTTTTTTTTTTTTCTTCCGTCTCCTTCTTTTTTTCTTCTCTACTTTTTTT
TCTTCTTTTTTTTTCTTTCTCTTTTTTTCTTTCTTTTTTCTTTCTTTTTCTTCTTCTTTTTTTCTATTTT
CTTCTCTTCTACTCTCTTTTCTTTTTTCTTCTTTTTCTTT
>M01015:63:000000000-D2M18:1:1101:17397:1654
GGCACTCGTATCGATGCGGCCGCGGGTGATGTGATTAGTTATACGCGTGCTAGTGGC
>M01015:63:000000000-D2M18:1:1101:17397:1654
TCCTCTAGCACGCGTCTAACTAATCACATCACCCGCGGCCGCATCGATACGAGTGCC
I want to extract only the sequences (with their headers) that have "ggcactcgtatcgatgcggccgcg" at the beginning and "gtgattagttagacgcgtgctagaggc" at the end, like this:
>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC
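To make the requirement concrete, here is a minimal sketch of the filter in Python rather than awk, sed, or grep. It assumes the sequences are wrapped over several lines (as in the sample), so each record is joined into one string before testing, and it compares case-insensitively because the primers are given in lowercase while the reads are uppercase. The function names `read_fasta` and `filter_by_primers` are made up for illustration, and the three demo records are copied from the sample input.

```python
FWD = "ggcactcgtatcgatgcggccgcg"       # forward primer, expected at the start
REV = "gtgattagttagacgcgtgctagaggc"    # reverse primer, expected at the end


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_primers(lines, fwd=FWD, rev=REV):
    """Yield records whose sequence starts with fwd and ends with rev."""
    fwd, rev = fwd.upper(), rev.upper()
    for header, seq in read_fasta(lines):
        upper = seq.upper()
        if upper.startswith(fwd) and upper.endswith(rev):
            yield header, seq


# Three records copied from the sample input above.
demo = """\
>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC
>M01015:63:000000000-D2M18:1:1101:17421:1643
GCCTCTAGCACGCGTCTAACTAATCACTGGCCCGCGTCTCTCTAATCTCGGCTCGCGTCTCACTTCCCCG
CGGCCGCATCGATACGAGTGCC
>M01015:63:000000000-D2M18:1:1101:17397:1654
GGCACTCGTATCGATGCGGCCGCGGGTGATGTGATTAGTTATACGCGTGCTAGTGGC
""".splitlines()

kept = list(filter_by_primers(demo))
for header, seq in kept:
    print(f">{header}")
    print(seq)
```

For a real file, pass an open file handle instead of `demo`, for example `filter_by_primers(open("reads.fasta"))`, where `reads.fasta` is a placeholder name. The sketch only does exact matching on the given strand, so reads that carry the primers as their reverse complement (as the second mate in the sample appears to) or that contain mismatches are not kept.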