Closed: How to extract sequences that contain specific primers from a multi-FASTA file?
4.1 years ago
k.kathirvel93 ▴ 300

Hi everyone,

I have a multi-FASTA file that was converted from a BWA BAM file. I want to extract only the sequences that have a specific forward primer at the start and a specific reverse primer at the end. How can I do this with awk, sed, or grep? Thanks in advance.

The input file looks like this:

>M01015:63:000000000-D2M18:1:1101:17027:1479
TTCTCTCTTCTCTCTTCTTCCTCTTTTCTTTTCTCTCTCTTTTTTTTCTTCTTTTTCTTCTTTTTTTCTT
TTCCTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTCTCCTTTTTCCTTCTTTCTTTTTCTTTTTT
CTCCTCTTCTTTTTTCTTTTTTTTCTTTCTTTTTTTCTCCTTTTTTTTTTTTCTTTTTTTCTTCCCTTTT
TTTTTTTCCCTTTTTTCTTTTTTTTTTTCTTCCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:17027:1479
TCCTCTCTCTCTCTTCTCCCTCCTCCCTTTCTCTCTTCTCTCTTTTCTCTTTCTTTTCTTTTTTCTCTTT
TCCCTTTTTCCCTTCTTTCTTTTCTTTTTTTTTTTCTTTTTTCTTTTCTTTTTTTTTTCTCTTTTTTTCT
TCTTTTTTTTTCTCCTCTTTTTTTTTTTCTTTTTTTTTTTTTTTCTCTTTTTCTTCTTTTTTTTTTTTTT
CTTTTTTTCTTTTTTTTTTTTTTTTTTCTCTCTCTTTTTT

>M01015:63:000000000-D2M18:1:1101:15901:1612
GGCACTCGTATCGATGCGGCCGCGTTCGTTTGTTTATACACCTGCTCGTGCTTGTTTATGCATCTGCCAT
CTCCCTTCTGCTTATTTCTGTCTCCGATGCCTCTGTACTCCTTAGCCTTTCAGCTCCTGCCGCCTGTTTC
CCTGTGATGCAACAAGCTTACTCTGCACCAATGATGCAGCAGCCAGCTCAATCTAACGCAGCCAGTGATT
AGTTAGACGCGTGCCTGTGATTAGTTAGACGCGTGCCAGT

>M01015:63:000000000-D2M18:1:1101:15901:1612
GCCTCTGTCCCTCTTCTACCTATTCCTTGCCCCCCTCTTCCTTATTCCTTCCCCGCCTCTTCCTTATCTC
TGCCTTCTTTCTTTTGACCTCTCTCCTTCCTCATTGGTGCAGCGTTAGCTTGTTGCTTCACTGGGAAACT
TGCGGCAGGAGCTGCCCTGCTTCTGCGTCCTGACTCTTCGCCTTCCGTAATTTCCCGTTCGGTGTTGCCT
GTTTCTTCTACCAGCTCGCTCAGTTTTTTATTCTTTCGA

>M01015:63:000000000-D2M18:1:1101:16395:1620
GGCACTCGTATCGATGCGGCCGCGGTTATCTCTTCCCGCTGCACTGCCTTTTAGGCGTTCTTTTGTTCCG
GCCCCCTCTCCCCCCGGGTTCCCTGCTTTCCCCTGTGCGCTATTCCTGTTCTAGATGCTTTACTGTCCCC
CTCCGCTCCCGGCTTCTCGGTCAGTTTCCCCGTGCTTAGTTAGACGCGTGCTTCTGGC

>M01015:63:000000000-D2M18:1:1101:16395:1620
GCCTCTAGCACGCGTCTAACTAATCACTTTCCCCCTCCCCGTTAATCCGGGTTCTGTCTTGTTCAGTCAT
TCCTCTCGCCCCGCCCTCGCTCACTGGCTCTTGCTGCCTACCCGGGTTCAGTACTCGCCGTCCCTTATGA
ACCCCTCTTTGGCCTTGCTCCGGGTGGTGTTTCCCGCGGCCGCATCGATACGAGTGCCCTGTTTCTTATA
CACTTCTGACGCTGCCGCCGAATATAGCGGTGTCGTTCTT

>M01015:63:000000000-D2M18:1:1101:15366:1643
GGCACTCGTATCGATGCGGCCGCGGTAAACTCCACCCGGACCAACGCCAAATAGTGTTTCATAAGGTACT
TCCCTTACTCCCCCCGTGTAGGCTGCTTTTGCCCCTCTTCTCTTGCTGGCCTAGATGAATTACTGTCCTC
TACCTAACCCTTCTTATCTGTCAGTTTCACCGTTTTTTGTTAGTCGCGTGCTCTTTGCCTTTTTCTTCTA
CCTCTCTCCTCTCTCACTATACTTCTGTCCATCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:15366:1643
GCCTCTAGCACGCGTCTCACTAATCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTT
TCCTCTTTTCTTTCCTTTTCTCTCTTTTTTTTTCTCCTTTCCCTTTTTCTGTTCCTTCCGTCCCTTTTGT
TCCCCTTTTTTTCCTTTCTCCTTTTTTTTTTTTCCCTTGCTCTTTCTTTTCTCTTCTCCTTTTTCTTTTA
CTTATCTTCCGTTTCCTTCGTCTTTCTCTTTTTATATTTT

>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC

>M01015:63:000000000-D2M18:1:1101:17421:1643
GCCTCTAGCACGCGTCTAACTAATCACTGGCCCGCGTCTCTCTAATCTCGGCTCGCGTCTCACTTCCCCG
CGGCCGCATCGATACGAGTGCC

>M01015:63:000000000-D2M18:1:1101:16505:1648
GGCACTCGTATCGATGCGGCCGCGTGTGATTTCTTCGACTTGTCCTAGCGTCCTCTCTCTTATCTACTTC
TTCGACCCCTCTCGACTCCTTTTCATCTCCTATTCCCTTTTCTGCTTCCCTATATTCTCTTCTTTTTTCT
TTTTTTTTTTTTGCTTATTCTTCCTTATCACTTTTTTTTTTCTACTCTATGCTTCCTGTCTGTCTCGTTT
CTGCCTCGTTGGTTTATTTTTCCTGCCTCTTTCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:16505:1648
GCCTCTAGCACGCGTCTAACTAATCACTCTCTTCCTTTTCTTTTCTTTTGCCTTGTCTCTTCTTCCCCTC
TCTTGCTTCCCTCTACTTCTTTTTTTTTTTTTTCTTCCGTCTCCTTCTTTTTTTCTTCTCTACTTTTTTT
TCTTCTTTTTTTTTCTTTCTCTTTTTTTCTTTCTTTTTTCTTTCTTTTTCTTCTTCTTTTTTTCTATTTT
CTTCTCTTCTACTCTCTTTTCTTTTTTCTTCTTTTTCTTT

>M01015:63:000000000-D2M18:1:1101:17397:1654
GGCACTCGTATCGATGCGGCCGCGGGTGATGTGATTAGTTATACGCGTGCTAGTGGC

>M01015:63:000000000-D2M18:1:1101:17397:1654
TCCTCTAGCACGCGTCTAACTAATCACATCACCCGCGGCCGCATCGATACGAGTGCC

I want to extract only the sequences (with their headers) that have "ggcactcgtatcgatgcggccgcg" at the beginning and "gtgattagttagacgcgtgctagaggc" at the end, like this:

>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC
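
One possible way to express this filter is sketched below in Python rather than awk, sed, or grep; that choice is a substitution for illustration only. Because the sequences in the input wrap across several lines, the reverse primer can straddle a line break (as it does in the desired record above), so the sketch first joins each record's lines into one string and only then tests the start and the end. The function names `read_fasta` and `filter_by_primers` are hypothetical, and the sketch assumes exact, case-insensitive matches on the given strand only (no mismatches, no reverse-complement handling).

```python
import io

# Primers copied from the question; compared case-insensitively.
FWD = "GGCACTCGTATCGATGCGGCCGCG"
REV = "GTGATTAGTTAGACGCGTGCTAGAGGC"


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_primers(records, fwd=FWD, rev=REV):
    """Keep records whose sequence starts with fwd and ends with rev."""
    fwd, rev = fwd.upper(), rev.upper()
    for header, seq in records:
        upper = seq.upper()
        if upper.startswith(fwd) and upper.endswith(rev):
            yield header, seq


# Two records taken from the sample input: the first should be kept,
# the second has the forward primer but a different 3' end.
demo = """>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC

>M01015:63:000000000-D2M18:1:1101:17397:1654
GGCACTCGTATCGATGCGGCCGCGGGTGATGTGATTAGTTATACGCGTGCTAGTGGC
"""

kept = list(filter_by_primers(read_fasta(io.StringIO(demo))))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

To run this on a real file, the `io.StringIO(demo)` handle could be replaced with `open("reads.fasta")`, where the file name is a placeholder. Note that both reads of a pair share the same header in this input, so the header alone does not tell which mate was kept.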
