Question: error in removing some lines in bam file
0
gravatar for ashkan
2.5 years ago by
ashkan110
ashkan110 wrote:

I am trying to remove some lines in my binary file(.bam) and make a new one like small example. I tried sed command but did not work and gave nothing. what I have is like this(when I print it using this command : samtools view -H myfile.bam)

@SQ SN:5    LN:151834684
@SQ SN:6    LN:149736546
@SQ SN:7    LN:145441459
@SQ SN:8    LN:129401213
@SQ SN:9    LN:124595110
@SQ SN:GL456210.1   LN:169725
@SQ SN:GL456211.1   LN:241735
@SQ SN:GL456212.1   LN:153618
@SQ SN:GL456213.1   LN:39340
@SQ SN:GL456216.1   LN:66673
@SQ SN:MT   LN:16299
@SQ SN:X    LN:171031299

and what I want is like this:

@SQ SN:5    LN:151834684
@SQ SN:6    LN:149736546
@SQ SN:7    LN:145441459
@SQ SN:8    LN:129401213
@SQ SN:9    LN:124595110
@SQ SN:MT   LN:16299
@SQ SN:X    LN:171031299

here is the command I used:

samtools view -H accepted_hits.bam | sed -v 's/SN:GL456210.1/' |
samtools reheader - accepted_hits.bam > mybamfile.reheadered.bam

and when I tried -e instead of -v it gave this error:

sed: -e expression #1, char 16: unterminated `s' command

I asked same question before but did not get what I am looking for. do you guys know how to solve this problem?

alignment • 1.7k views
ADD COMMENTlink modified 2.5 years ago by martinluth1230 • written 2.5 years ago by ashkan110

It is indeed an unterminated s command. s command structure is s/WhatTorRplace/ByWhat/. You miss a separator and a field. My sed installation doesn't have a -v flag so I'm not sure what that should do. But grep has, as Medhat told you.

Also, why do you want to modify the header? What purpose does that have?

ADD REPLYlink written 2.5 years ago by WouterDeCoster39k
4
gravatar for Santosh Anand
2.5 years ago by
Santosh Anand4.7k
Santosh Anand4.7k wrote:

I asked same question before but did not get what I am looking for. do you guys know how to solve this problem?

There are multiple issues in your post, perhaps the reason why you did not get attention / response. First thing first:

  1. Reading carefully and understanding the error messages of any program / script is as important as running the program itself, if not more. It saves a lot of painful headaches later. I have seen multiple instances where people don't read the error messages and say that 'my program is not working'. Ok, but did you try to understand what the error messages say (at least a google)?

    sed: -e expression #1, char 16: unterminated `s' command

    The error message here says unterminated s command at 16th charcater, implying that something is missing (not terminated). The sed command here is "s" and there are 16 charaters that you are passing to this command. The sed syntax (see section 3.5 of sed manual) is ‘s/regexp/replacement/flags’, where the s command can be followed by zero or more of the following flags (ref. section 3.5 again ). There is a terminal "/" missing in your command and your command should have been sed -e 's/SN:GL456210.1//'

  2. GNU sed has no option called -v (see invocation). I am guessing that you wanted to use grep -v, which makes more sense as you want to exclude the matches. If you are using other version of sed, please say so (might not be obvios to many of us)

  3. Explain your problem clearly first, rather than giving a crude example of what you want to achieve. The positive side of this is that you will know exactly what you want, so does the community. The above problem could be framed as: "I would like remove all the lines of type "@SQ SN:GL456" from my BAM header. This I'm *only guessing from the information you have provided. Since the problem is not clearly stated, I have no way to tell what you exatly want (see next point).

  4. Since you have used 'sed -v 's/SN:GL456210.1/, it will work only on pattern (rather the word) 'SN:GL456210.1'. You see the confusion: do you want to remove all lines with pattern "SN:GL456*" or just the line with above word?

Now coming to your actual Qs:

 and when I tried -e instead of -v it gave this error:

"-e" is used for chaining multiple sed commands, if you want to run them in sequential order. It will not magically solve your problem. Please read the manual / help pages to know what each option does. If you have only one expression (which is the case here), -e is not required at all.

How to achieve what I think you want to do?

The canonical grep way: The -v flag of grep inverts the matches, i.e., it gives all the lines not matching the pattern.

grep -v "@SQ SN:GL456"  # or more precisely grep -v "^@SQ SN:GL456"

The sed way: you need to delete the matching line rather than substitute. The syntax is sed '/pattern_to_match/d' input_file

sed  '/@SQ SN:GL456/d'

To summarize:

  1. Read the relevant manual / help pages.

  2. Read carefully and try to understand the error messages (GIYF!)

  3. Frame the problems clearly. If possible, give some clear examples.

Sorry for the long rant, which is absolutely not intended to demotivate you from future postings. These issues prevail so much that I wanted to make a general appeal to the comunity, and I am sorry again if I (ab)used your post in doing so. I sincerely hope that it will help the community (both posters and responders) to get engaged in a better, efficient and productive way. So please don't stop posting. We are always here to help you if you come prepared :)

PS: As a sides note, I hope that you are aware of the fact that if you remove the SQ-line only from header of the BAM file and if it contains any match to that reference sequence, the BAM will get corrupted.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Santosh Anand4.7k
2

I hope that you are aware of the fact that if you remove the SQ-line only from header of the BAM file and if it contains any match to that reference sequence, the BAM will get corrupted.

So why is everyone suggesting an answer to the original question? I doubt this is the end result OP wants. This critical piece of information should be at the top of your post as a warning :)

ADD REPLYlink written 2.5 years ago by genomax67k

Agreed, this isn't a side note it's the main note! There is no logical reason to remove stuff from the header if it's not in the rest of the file. Saves you only a couple of bytes. Less bytes than the number of bytes typed into this thread.

ADD REPLYlink written 2.5 years ago by John12k

This critical piece of information should be at the top of your post as a warning :)

slow poison; death comes at the end. I could be mean sometimes ;-)

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Santosh Anand4.7k

very didactic indeed

ADD REPLYlink written 2.5 years ago by Jorge Amigo11k

Good long rant, thanks for your time :p

ADD REPLYlink written 2.5 years ago by WouterDeCoster39k
1
gravatar for Medhat
2.5 years ago by
Medhat8.3k
Texas
Medhat8.3k wrote:

I think instead of sed -v 's/SN:GL456210.1/' you should use grep -v '@SQ SN:GL45621.*'

it will give you this header

@SQ SN:5    LN:151834684  
@SQ SN:6    LN:149736546  
@SQ SN:7    LN:145441459  
@SQ SN:8    LN:129401213  
@SQ SN:9    LN:124595110  
@SQ SN:MT   LN:16299  
@SQ SN:X    LN:171031299
ADD COMMENTlink written 2.5 years ago by Medhat8.3k

I use that. the resulting file is exactly the same as original one. I think because grep only look for the pattern!

ADD REPLYlink written 2.5 years ago by ashkan110
2

you can make sure of the pattern by building it appropriately:

samtools view -H accepted_hits.bam \
| grep -v '^@SQ\sSN:GL' \
| samtools reheader - accepted_hits.bam \
> mybamfile.reheadered.bam
ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Jorge Amigo11k

I used this command with the header you gave and it gives the result above

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Medhat8.3k

Dear Ashkan Hi,

I have used the script of dear @Medhat and it worked correctly (according to your example).

Maybe there is some error in line break or space instead of tab or vice versa, in your original file but not in your example provided here.

~ Take care

ADD REPLYlink written 2.5 years ago by Farbod3.2k
0
gravatar for John
2.5 years ago by
John12k
Germany
John12k wrote:
  1. samtools view -H original.bam > my_header.sam
  2. Open that sam file up in a text editor and delete the lines you don't want.
  3. samtools reheader my_header.sam original.bam > new.bam

This will probably end up giving you a bam file with reads that map to contigs like GL456216.1, but the header doesn't contain any information about GL456216.1 -- this will break everything. samtools should really have a -f parameter to force modification of the header, but by default read the whole file and check to make sure you're not about to waste thousands of euros in time and effort. </rant>

ADD COMMENTlink written 2.5 years ago by John12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1493 users visited in the last hour