I want to use the grep command to search for a 6-character string in every line of a fasta file. In particular, I need to search for these 6 characters within the first 30 characters of each line.
Here is an example. String to search for: GTGTCA
HWI-ST740:1:C2GCJACXX:1:1101:1279:1825
NGACGCTCTGACCTTGGGGCTGGTCGGGGATGCTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCACANNNCCAAGCTTCCNNNNNNNNNNNNNNNNNNN
HWI-ST740:1:C2GCJACXX:1:1101:1349:1847
NTTAGATGAGGGAAACATCTGCATCAAGTTGTTATCTGTGACAACAAGTGTTGTTCCACTGCCAAAGAGTTTCTTATAATAAAACAATCGGGGTGGCACNNNNNN
I want the search to be done only in the bold characters (the first 30 of each sequence). What do I have to add to the grep command to set the 30-character limit? Thank you.
Did you try anything first?
"Just a suggestion; there are much better and faster solutions."
OK, thank you very much! But if I want to recreate a fasta file with all the reads that contain the string, I have to add the -B option, right? For example:
grep -B1 -i "GTGTCA" < my file | cut -c-30 > out.fa
Is that right?
The -B1 option will print 1 line before each match, but you're not limiting grep to the first 30 characters as you stated. That command will search the whole file, matching against entire lines, return the lines that matched plus the line before each one, and then cut those results down to their first 30 characters. That doesn't make sense to me.
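For what it's worth, one way to keep the whole record while still restricting the match to the first 30 characters is to anchor the pattern at the start of the line. This is only a sketch: the function name first30_matches is made up for illustration, and it assumes every sequence sits on a single line directly below its header and that header lines never contain the motif themselves.

```shell
# Hypothetical helper: print every sequence line whose first 30 characters
# contain GTGTCA, together with the header line just above it.
# A 6-character motif fits inside columns 1-30 only if it starts at column 25
# or earlier, so at most 24 characters may precede it.
first30_matches() {
    # -B1 keeps the header line, -i ignores case, -E enables the {0,24} interval.
    # grep prints "--" between non-adjacent groups of context lines; drop those.
    grep -B1 -i -E '^.{0,24}GTGTCA' "$1" | grep -v '^--$'
}
```

It could then be called as first30_matches reads.txt > out.fa, where reads.txt is a placeholder file name. The output still lacks the ">" marker if the input headers do not have it.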
I'm reluctant to provide an answer, but
cut -c1-30 will output characters 1-30 of each line in the input file.
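As a rough illustration of how cut could be combined with grep, the sketch below truncates every line first and searches afterwards, so a match can only come from the first 30 characters. The function name count_first30_hits is hypothetical. Note that this only counts the hits; the rest of each read is thrown away by cut, so it cannot rebuild the full records.

```shell
# Hypothetical helper: count the lines whose first 30 characters contain GTGTCA.
count_first30_hits() {
    # cut keeps columns 1-30 of every line; grep -c counts the matching lines,
    # and -i makes the match case-insensitive.
    cut -c1-30 "$1" | grep -c -i 'GTGTCA'
}
```

For example, count_first30_hits reads.txt (placeholder file name) prints a single number.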
yes you are right
will do what you suggested
That's not exactly a fasta file. The fasta format requires a ">" character at the beginning of each tag line.
The sequence header looks like it was formatted from fastq, but is missing the '>'.
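If the missing ">" is the only problem, something like the following could add it. This is a sketch under the assumption that the file strictly alternates header line, sequence line; the function name add_fasta_marker is invented for the example.

```shell
# Hypothetical helper: prefix every odd-numbered (header) line with ">"
# unless it already starts with one; sequence lines pass through unchanged.
add_fasta_marker() {
    awk 'NR % 2 == 1 && substr($0, 1, 1) != ">" { print ">" $0; next }
         { print }' "$1"
}
```

Usage would look like add_fasta_marker reads.txt > reads.fa, with both file names being placeholders.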