Question: What is a read group?
crisagazzola10 wrote:

Could someone explain to me like I'm 5 years old what a read group is? I've read several definitions of it. For example, "A read group is the set of reads that were generated from a single run of a sequencing instrument". So in this definition, is the set of reads the same thing as the set of all the base pair sequence segments that are generated after the DNA has been run through the sequencing machine? Are the "set of reads" the ones that are contained in the fastq file?

I've read other definitions that use the terms "lane" and "flow cell". I've looked up these terms as well but still don't understand what the read group is referring to. I think I've spotted it in some .fastq files. I'm a software developer with no background in bioinformatics who has been playing around with the Picard tools, and for some of the tools you must pass a read group as an argument. I want to make sure I understand what I'm passing in and what it does. Thank you.

Modified 2.2 years ago by Devon Ryan97k • written 2.2 years ago by crisagazzola10

Which Picard tools are you trying to use?

This page has some good discussion of read groups.

Written 2.2 years ago by goodez480

I've been using FastqToSam, which takes a read group as an argument.

Written 2.2 years ago by crisagazzola10

I guess I'm just looking for some confirmation on the meaning of the basic terminology. For example, in the page you provide a link to, it's stated that "There is no formal definition of what is a read group, but in practice, this term refers to a set of reads that were generated from a single run of a sequencing instrument".

So does the "set of reads" refer to the same strings found in the FASTQ or SAM file, each of which describes a segment of DNA? For example, "ACTTTAGAAATTTACTTTTA". Is that a "read"? And is the entire set of them found in a FASTQ file the "read group"?

Written 2.2 years ago by crisagazzola10

Past thread of interest:
Read Group In Sam/Bam Files: What Do They Exactly Describe?

Written 2.2 years ago by genomax91k

Always nice to see non-wet lab people going to the effort of really understanding the process! :) +1

Written 2.2 years ago by Joe18k
Devon Ryan97k wrote:

is the set of reads the same thing as the set of all the base pair sequence segments that are generated after the DNA has been run through the sequencing machine?

Bases, not "base pairs", but yes.

Are the "set of reads" the ones that are contained in the fastq file?

Yes
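
To make "the reads in the fastq file" concrete, here is a minimal sketch of pulling reads out of FASTQ text. It assumes the plain four-line record layout (name line starting with "@", bases, a "+" separator, qualities); the read names and the quality characters are made up for illustration, and the first sequence is the one from the question.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, bases, qualities) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line)
        bases = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qualities = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(bases) != len(qualities):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], bases, qualities


# Hypothetical records: names and quality strings are invented.
records = [("read1", "ACTTTAGAAATTTACTTTTA"), ("read2", "GGGTTTACCA")]
text = "".join(
    f"@{name}\n{bases}\n+\n{'I' * len(bases)}\n" for name, bases in records
)

reads = list(read_fastq(io.StringIO(text)))
for name, bases, _ in reads:
    print(name, bases)
```

Each tuple yielded here is one "read"; a real FASTQ file simply holds many such records.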

More generally, a "read group" is a set of sequences (in one or more fastq files) that share a common set of metadata. This metadata generally includes the patient/sample ID, the library ID, and the flow cell. The library is the preparation of the patient/sample DNA that's actually sequenced, and more than one library can be made per patient/sample.
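
As a rough sketch of how that metadata is usually written down: the SAM format stores it in an @RG header line with tags such as ID, SM (sample), LB (library), PL (platform) and PU (platform unit), and to my understanding Picard's FastqToSam accepts the same information through READ_GROUP_NAME, SAMPLE_NAME, LIBRARY_NAME, PLATFORM and PLATFORM_UNIT. The `ReadGroup` class, the example values, and the `picard.jar` path below are all hypothetical, so check the argument names against the help output of your Picard version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadGroup:
    """Metadata shared by every read in one read group (illustrative only)."""
    rg_id: str          # unique identifier for this read group
    sample: str         # patient/sample ID
    library: str        # library prepared from that sample
    platform_unit: str  # e.g. flow cell plus lane, written as "FLOWCELL.LANE"
    platform: str = "ILLUMINA"

    def sam_header_line(self) -> str:
        """Format the metadata as a tab-separated SAM @RG header line."""
        fields = [
            "@RG",
            f"ID:{self.rg_id}",
            f"SM:{self.sample}",
            f"LB:{self.library}",
            f"PL:{self.platform}",
            f"PU:{self.platform_unit}",
        ]
        return "\t".join(fields)

    def fastq_to_sam_args(self, fastq: str, output: str) -> List[str]:
        """Build (but do not run) a FastqToSam command line.

        Uses Picard's KEY=VALUE argument style; "picard.jar" is a placeholder path.
        """
        return [
            "java", "-jar", "picard.jar", "FastqToSam",
            f"FASTQ={fastq}",
            f"OUTPUT={output}",
            f"READ_GROUP_NAME={self.rg_id}",
            f"SAMPLE_NAME={self.sample}",
            f"LIBRARY_NAME={self.library}",
            f"PLATFORM_UNIT={self.platform_unit}",
            f"PLATFORM={self.platform}",
        ]


# Hypothetical values, not taken from any real run.
rg = ReadGroup(rg_id="FC001.1", sample="patient42", library="lib1",
               platform_unit="FC001.1")
print(rg.sam_header_line())
print(" ".join(rg.fastq_to_sam_args("reads.fastq", "unaligned.bam")))
```

The point is only that a read group is a label plus a handful of metadata fields; the reads themselves just carry the label.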

A "flow cell" is the physical device (it's a partially hollow glass slide) on the sequencer where the sequencing actually takes place. These are typically single-use. The flow cell is always a component of the read group, since it can represent a batch effect that downstream software may need to deal with (e.g., the software may be written to model some sort of sequencing bias on a per-flow-cell basis). Flow cells themselves are composed of one or more lanes, which quite literally are lanes through the flow cell in which DNA and fluids flow. Theoretically, one could conceive of lane-specific biases that software could be written to handle. In practice this isn't really an issue (for that reason, fastq files commonly contain sequences from multiple lanes), but you'll still see references to lane effects in software that was written a number of years ago.
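
If you want to see which flow cell and lane your reads came from, here is a minimal sketch. It assumes Illumina-style read names of the form `@instrument:run:flowcell:lane:tile:x:y`, optionally followed by a space and a comment; other sequencers and older pipelines name reads differently. The headers below are invented, and joining flow cell and lane as "FLOWCELL.LANE" is just one common convention for a platform unit, not a requirement.

```python
from collections import Counter
from typing import Iterable, Tuple


def flowcell_and_lane(header: str) -> Tuple[str, int]:
    """Extract (flow cell ID, lane) from an Illumina-style FASTQ name line.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ name line: {header!r}")
    name = header[1:].split(" ", 1)[0]  # drop '@' and any trailing comment
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    return fields[2], int(fields[3])


def reads_per_unit(headers: Iterable[str]) -> Counter:
    """Count reads per 'FLOWCELL.LANE' unit."""
    counts: Counter = Counter()
    for header in headers:
        flowcell, lane = flowcell_and_lane(header)
        counts[f"{flowcell}.{lane}"] += 1
    return counts


# Hypothetical read names: one flow cell, two lanes.
headers = [
    "@INSTR1:7:FC001:1:1101:1000:2000 1:N:0:ACGT",
    "@INSTR1:7:FC001:2:1101:1500:2100 1:N:0:ACGT",
    "@INSTR1:7:FC001:1:1102:1200:2300 1:N:0:ACGT",
]
counts = reads_per_unit(headers)
for unit, n in sorted(counts.items()):
    print(unit, n)
```

A file that reports more than one unit here contains reads from several lanes (or flow cells), which is the case mentioned above.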

Written 2.2 years ago by Devon Ryan97k

Thank you for your answer, it's cleared everything up for me as far as what a read group is. I appreciate it!

Written 2.2 years ago by crisagazzola10

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Written 2.2 years ago by genomax91k