I am trying to determine the genome sequence of a virus whose genome is composed of multiple RNA segments. All of the segments should possess the same short sequence (5-10 nt) at the 5' end and a different short sequence at the 3' end. I know the full sequence of several of the segments, including the conserved motifs at the ends, but for at least three segments I have no information beyond the motifs that should be at the ends. I am attempting to build putative contigs for the unknown segments from a paired-end Illumina library, based on the motif sequence. I filtered my library to retain only reads that begin with the 5' motif using grep:
grep '^motif' -B 1 -A 2 --no-group-separator in.fastq > out.fastq
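One caveat with the grep filter above: `^motif` is tested against every line, so a quality line that happens to start with the same characters would also match and pull in the wrong neighbouring lines. A minimal sketch of a record-aware alternative in Python is below. It assumes a standard unwrapped FASTQ file with exactly four lines per record; the function name `filter_fastq_by_prefix` and the file names in the comments are hypothetical, chosen only for illustration.

```python
from itertools import islice


def filter_fastq_by_prefix(lines, motif):
    """Yield 4-line FASTQ records whose sequence line starts with `motif`.

    Assumes unwrapped FASTQ: header, sequence, '+', quality, repeated.
    Only the sequence line (second line of each record) is checked.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            # End of input (or a truncated trailing record): stop.
            break
        if record[1].startswith(motif):
            yield record


# Example use (hypothetical file names and motif):
# with open("in.fastq") as fin, open("out.fastq", "w") as fout:
#     for rec in filter_fastq_by_prefix(fin, "ACGTACG"):
#         fout.writelines(rec)
```

Because the check is applied per record rather than per line, headers and quality strings can never be mistaken for a motif-bearing read.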
I would like to use these reads as seeds for contig extension from my original unfiltered library using something like PRICE, but I want extension to only occur from one end of the seed (i.e. build the contig out from the 3' end of the seed, but retain the 5' end of the seed as the 5' end of the contig). Is there a way to accomplish this?