Question: bamUtil dedup and library information from the BAM header
3 months ago by
skhan10 wrote:

I have 9 .bam files, produced from 2x75 bp paired-end Illumina reads (RNA-Seq) and aligned using STAR to the Ensembl rat reference genome. Each file has one @RG line with only two entries: ID and SM. So for sample s01, the @RG line looks as follows: @RG ID:s01 SM:s01. I have not included any library information (LB:) in the @RG line.

When I run bamUtil's dedup to mark duplicates, I get the following error for each of the 9 .bam files: WARNING: Cannot find library information in the header line @RG ID:s01 SM:s01 . Using empty string for library name

I'm a beginner here. As best as I can tell the duplication marking seems to have worked well.

Should I be concerned that the input .bam files did not have a library defined? If I need to define a library for each .bam file, could you point me to some insights on what to define as the library? e.g. Should I just set the library to the sample name, so that between the 9 .bam files I will have 9 different libraries?



3 months ago by
h.mon16k wrote:

First: a warning is not an error. With an error, you would usually get no output; with a warning, you get output, but you should inspect it carefully and may even have to discard it.

The duplicate marking may have worked, but probably not optimally. The intention is to mark PCR and optical duplicates. PCR duplicates arise at the library preparation step; optical duplicates arise at the cluster generation step. I don't know the internals of bamUtil's duplicate marking, but it likely uses library information to identify PCR duplicates, so the library field should matter.
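To make the role of the library concrete, here is a toy sketch in Python. It is not bamUtil's implementation: the read fields (`name`, `library`, `chrom`, `pos`, `strand`, `qual`) and the function `mark_duplicates` are hypothetical, and the only assumption is that a duplicate marker groups reads by a key that includes the library name and keeps the best read in each group.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates in a toy model.

    Each read is a dict with the hypothetical keys
    name, library, chrom, pos, strand and qual.
    """
    groups = defaultdict(list)
    for read in reads:
        # Reads only compete with each other if they share library AND position.
        key = (read["library"], read["chrom"], read["pos"], read["strand"])
        groups[key].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the highest-quality read; flag the rest of the group.
        best = max(group, key=lambda r: r["qual"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates


reads = [
    {"name": "r1", "library": "libA", "chrom": "1", "pos": 100, "strand": "+", "qual": 60},
    {"name": "r2", "library": "libB", "chrom": "1", "pos": 100, "strand": "+", "qual": 50},
]

# Same position, different libraries: nothing is flagged.
with_libraries = mark_duplicates(reads)

# Same reads with the library replaced by an empty string: r2 is now flagged.
without_libraries = mark_duplicates([dict(r, library="") for r in reads])
```

Under this model, an empty library name only changes the result when reads from more than one library end up in the same file. A file that holds a single library behaves the same whether the name is empty or not.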

If each library was loaded on only a single lane, then all is well, but if you loaded the same library on several lanes or sequencing runs, then the marking of duplicates will be non-optimal.

Some background: "Read Group In Sam/Bam Files: What Do They Exactly Describe?" and "Read Groups" (GATK forums).
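If you decide to add a library, one option is to edit the header text and write it back to the file. The Python sketch below is only an illustration: the function `add_library_tag` is hypothetical, and it assumes that each sample was prepared as exactly one library, so the sample name (SM) can stand in as the library name (LB). If one sample was prepared as several libraries, or several samples share a library, pass the real library name instead.

```python
def add_library_tag(header_text, library=None):
    """Return SAM header text with an LB field added to @RG lines that lack one."""
    new_lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            fields = line.split("\t")
            # Parse TAG:VALUE pairs, splitting on the first colon only.
            tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            if "LB" not in tags:
                # Assumption: one library per sample, so fall back to SM, then ID.
                name = library or tags.get("SM") or tags.get("ID", "unknown")
                fields.append("LB:" + name)
            line = "\t".join(fields)
        new_lines.append(line)
    return "\n".join(new_lines) + "\n"


# SAM header fields are tab-separated.
header = "@HD\tVN:1.4\tSO:coordinate\n@RG\tID:s01\tSM:s01\n"
fixed = add_library_tag(header)
```

The header text can be extracted with `samtools view -H in.bam > header.sam`, and the edited header applied with `samtools reheader header.sam in.bam > out.bam`.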
