I have two files with several hundred entries in each. File 1 has several 5-base sequences, and file 2 has a higher number of entries but with longer sequences. The first 5 bases of sequences in file 2 match those in file 1. I tried some grep and awk methods, but they did not work for a partial match like the one above. So for example:
File 1: ATGCC TTGCA GGAAC ........ ........
File 2: ATTTCGGGAAAATT ATGCCTTAAGACCT GGAACTAAGGGGA ............ ............
Any help is much appreciated! Thanks!
Shenwei, thanks for the reply. But I already tried that grep option before posting the topic. It didn't work.
It definitely will work, but you have to put ^ in front of each of the 5-letter sequences in file 1, so that grep only matches them at the start of a line.
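A minimal sketch of that anchored grep, in case it helps. The file names (file1.txt, file2.txt, patterns.txt, matched.txt) are made up for illustration, and it assumes one sequence per line in both files, which may not be how your real files are laid out:

```shell
# Hypothetical inputs built from the example in the question,
# one sequence per line (file names are assumptions).
printf '%s\n' ATGCC TTGCA GGAAC > file1.txt
printf '%s\n' ATTTCGGGAAAATT ATGCCTTAAGACCT GGAACTAAGGGGA > file2.txt

# Prefix every 5-base sequence with ^ so grep matches it only at line start.
sed 's/^/^/' file1.txt > patterns.txt

# -f reads one pattern per line; keep the lines of file2.txt
# that start with any of the 5-base sequences.
grep -f patterns.txt file2.txt > matched.txt
cat matched.txt
```

With the example data this should keep ATGCCTTAAGACCT and GGAACTAAGGGGA and drop ATTTCGGGAAAATT. Without the ^ the same patterns would also match 5-base stretches in the middle of a longer sequence, which is probably not what you want here.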
If you don't want to use grep then any program that will separate based on user-defined barcodes - flexbar / etc - will do this for you.