I asked same question before but did not get what I am looking for. do
you guys know how to solve this problem?
There are multiple issues in your post, which is perhaps why you did not get much attention or a response. First things first:
Reading and understanding a program's error messages is as important as running the program itself, if not more so. It saves a lot of painful headaches later. I have seen many cases where people don't read the error messages and simply say 'my program is not working'. OK, but did you try to understand what the error messages say (at the very least, by googling them)?
sed: -e expression #1, char 16: unterminated `s' command
The error message here says "unterminated s command" at the 16th character, implying that something is missing (not terminated). The sed command here is "s", and the expression you are passing to it is 16 characters long. The sed syntax (see section 3.5 of the sed manual) is ‘s/regexp/replacement/flags’, where the s command can be followed by zero or more flags (ref. section 3.5 again). The terminal "/" is missing in your command, which should have been
sed -e 's/SN:GL456210.1//'
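To make the difference concrete, here is a minimal sketch that runs both forms on a made-up header line (the `LN:1000` value and the variable names are my own assumptions, not taken from your file). Note that the terminated substitution only blanks out the matched text; the line itself stays in the header.

```shell
# Hypothetical @SQ line; SAM header fields are tab-separated.
line=$(printf '@SQ\tSN:GL456210.1\tLN:1000')

# Unterminated expression: only two "/" delimiters, so sed refuses to run.
status="accepted"
if ! printf '%s\n' "$line" | sed -e 's/SN:GL456210.1/' 2>/dev/null; then
    status="rejected"
fi
echo "unterminated expression: $status"

# Terminated expression: three "/" delimiters and an empty replacement.
fixed=$(printf '%s\n' "$line" | sed -e 's/SN:GL456210.1//')
printf '%s\n' "$fixed"
```

The second command removes only the text `SN:GL456210.1`, leaving an `@SQ` line with an empty field behind, which is probably not what you want.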
GNU sed has no option called -v (see the invocation section of the manual). I am guessing that you wanted to use grep -v, which makes more sense as you want to exclude the matches. If you are using another version of sed, please say so (it might not be obvious to many of us).
Explain your problem clearly first, rather than giving a crude example of what you want to achieve. The upside is that you will know exactly what you want, and so will the community. The above problem could be framed as: "I would like to remove all lines of the type "@SQ SN:GL456" from my BAM header." This is *only* my guess from the information you have provided. Since the problem is not clearly stated, I have no way to tell what exactly you want (see next point).
Since you have used sed -v 's/SN:GL456210.1/', it will work only on the pattern (or rather the word) 'SN:GL456210.1'. You see the confusion: do you want to remove all lines matching the pattern "SN:GL456*", or just the line containing the above word?
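The two readings really do select different lines. A small sketch on a made-up three-line header (the sequence lengths and the second GL456 name are invented for illustration) counts what each pattern would match:

```shell
# Hypothetical header with one regular contig and two GL456* scaffolds.
header=$(printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:GL456210.1\tLN:200\n@SQ\tSN:GL456211.1\tLN:300')
tab=$(printf '\t')

# Reading 1: exactly one sequence. The dot is escaped and the pattern is
# closed by the following tab, so longer names cannot match by accident.
one=$(printf '%s\n' "$header" | grep -c "SN:GL456210\.1${tab}")

# Reading 2: every sequence whose name starts with GL456.
family=$(printf '%s\n' "$header" | grep -c "SN:GL456")

echo "exact name: $one line(s), GL456 prefix: $family line(s)"
```

Stating which of the two you mean in the question saves everyone a round of guessing.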
Now coming to your actual Qs:
and when I tried -e instead of -v it gave this error:
"-e" is used for chaining multiple sed commands, if you want to run them in sequential order. It will not magically solve your problem. Please read the manual / help pages to know what each option does. If you have only one expression (which is the case here),
-e is not required at all.
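A quick sketch of what -e does and does not do, using a toy input string rather than your data:

```shell
# Two expressions chained with -e run in order on every input line.
chained=$(printf 'aaa\n' | sed -e 's/a/b/' -e 's/b/c/')

# The same script written as one expression, with ';' between the commands.
joined=$(printf 'aaa\n' | sed 's/a/b/; s/b/c/')

# A single expression needs no -e at all.
single=$(printf 'aaa\n' | sed 's/a/b/')

printf '%s %s %s\n' "$chained" "$joined" "$single"
```

In all three cases the expression itself still has to be complete; -e only tells sed where each expression starts.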
How to achieve what I think you want to do?
The canonical grep way: The -v flag of grep inverts the matches, i.e., it gives all the lines not matching the pattern.
grep -v "@SQ SN:GL456" # or more precisely grep -v "^@SQ SN:GL456"
The sed way: you need to delete the matching line rather than substitute. The syntax is
sed '/pattern_to_match/d' input_file
sed '/@SQ SN:GL456/d'
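Both approaches can be tried on a toy header before touching real data. The sketch below assumes the fields are separated by tabs, as they are in SAM headers (a pattern typed with a plain space will not match a tab), and uses invented lengths and an invented `@HD` line:

```shell
tab=$(printf '\t')

# Hypothetical header: one @HD line, one regular contig, two GL456* scaffolds.
header=$(printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:GL456210.1\tLN:200\n@SQ\tSN:GL456211.1\tLN:300')

# grep way: keep every line that does NOT match the pattern.
kept_grep=$(printf '%s\n' "$header" | grep -v "^@SQ${tab}SN:GL456")

# sed way: delete every line that matches the pattern.
kept_sed=$(printf '%s\n' "$header" | sed "/^@SQ${tab}SN:GL456/d")

printf '%s\n' "$kept_grep"
```

On this input the two commands leave the same two lines (`@HD` and the `chr1` `@SQ` line), so which one you use is mostly a matter of taste.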
Read the relevant manual / help pages.
Read carefully and try to understand the error messages (GIYF!)
Frame the problems clearly. If possible, give some clear examples.
Sorry for the long rant, which is absolutely not intended to discourage you from posting in the future. These issues are so prevalent that I wanted to make a general appeal to the community, and I am sorry again if I (ab)used your post in doing so. I sincerely hope that it will help the community (both posters and responders) to engage in a better, more efficient and more productive way. So please don't stop posting. We are always here to help you if you come prepared :)
PS: As a side note, I hope you are aware that if you remove only the @SQ line from the header of the BAM file while the file still contains alignments to that reference sequence, the BAM will be corrupted.
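One way to check for this beforehand, sketched on SAM text (the records below are invented, and on a real file you would feed in the SAM-formatted output of your BAM instead): count the alignment records whose reference name (column 3) or mate reference name (column 7) is the sequence you plan to drop.

```shell
name='GL456210.1'

# Hypothetical SAM text: two @SQ lines and two alignment records.
sam=$(printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:GL456210.1\tLN:200\nr1\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\tGL456210.1\t7\t60\t4M\t*\t0\t0\tACGT\tIIII')

# Skip header lines, then count records that still point at the sequence.
hits=$(printf '%s\n' "$sam" | awk -F '\t' -v name="$name" '
    !/^@/ && ($3 == name || $7 == name) { n++ }
    END { print n + 0 }')

echo "$hits record(s) still reference $name"
```

If the count is not zero, those records need to be filtered out as well before the header line can safely go.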