Hi, I was wondering if somebody could give me more details about read groups (RG) and how programs (Picard, samtools) use them.
What exactly does a read group describe? When should I create a new read group?
As far as I understand, a read group should be assigned to a certain library, sequenced at a certain time. That means two libraries will always have different RG IDs, but the same library could have several different RGs. Is that right? Should anything else trigger the creation of a new RG?
When programs use RG information (such as Picard when marking duplicate reads), do they actually compare the libraries? So if I merge files with, let's say, 3 different libraries and 10 different RG IDs and then mark duplicates, do they look for duplicates only within a given library? Or across all reads? Or only within a given RG ID?
If you have links to sites with a detailed explanation, that would be very useful.