3.5 years ago
jones.theo194
I have multiple files that originally contained many sequences and their headers. I managed to remove the sequences, leaving only the headers in each file.
So each file looks like this:
Ncov - date_date_ - xxx| info I want |xxx - blah
Ncov - date_date_- xxx| info I want |xxx - blah
I would then like to save the "info I want" part of each line as a single column in a new file, so that I can easily compare all these files for % similarity with a much smaller memory footprint. However, I can't quite get my program to do this properly or efficiently.
How would I code this extraction?
Hi, can you show the steps that you have already tried?
If you want to extract a pattern itself:
grep -oP "pattern" "$file"
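For headers like the ones in the question, where the wanted text sits between two `|` characters, splitting on the delimiter may be simpler than writing a regular expression. The following is a minimal sketch, assuming the wanted text is always the second pipe-delimited field. The names `extract_info`, `headers.txt`, and `info_column.txt` are hypothetical.

```shell
# Hypothetical helper: print the text between the first and second "|"
# on each line, with surrounding spaces and tabs trimmed.
# Lines with fewer than two "|" characters are skipped.
extract_info() {
  awk -F'|' 'NF >= 3 { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }' "$@"
}
```

It could then be used as `extract_info headers.txt > info_column.txt`, which writes one extracted value per line. If the position of the wanted field differs in your real headers, the field number `$2` would need to change accordingly.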
If you want to extract sequences with a pattern in the header:
grep -A 1 ">.*pattern" "$file"
This only works if each of your sequences is on a single line. You can use this command to convert multi-line FASTA to one-line FASTA
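One common way to do such a conversion is with awk. This is a sketch only, assuming header lines start with `>` and every other line is sequence. The names `linearize_fasta`, `input.fasta`, and `oneline.fasta` are hypothetical.

```shell
# Hypothetical helper: join the sequence lines of each FASTA record
# into a single line, leaving header lines unchanged.
linearize_fasta() {
  awk '
    /^>/ {                          # header line
      if (seq != "") print seq      # flush the previous record
      print                         # print the header as-is
      seq = ""
      next
    }
    { seq = seq $0 }                # append a sequence line
    END { if (seq != "") print seq }  # flush the last record
  ' "$@"
}
```

For example, `linearize_fasta input.fasta > oneline.fasta` would produce a file in which each header is followed by exactly one sequence line, so that `grep -A 1` returns the full sequence.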