Library, Sample ID Tag in SAM Format
11.9 years ago

We recently sequenced a specific mouse strain. All of the sequencing data were generated on the 5500 XL platform from a single mate-pair library prepared from the liver of a single male mouse. Sequencing was done on three flowchips using 6, 6, and 3 lanes respectively, for a total of 15 lanes of data.

I have a few questions about the terminology used for my experiment, such as sample and read group ID.

My question concerns the read group ID (RG), sample (SM), and library (LB) tags in the SAM format. As I understand it, the major organizational units for NGS analysis are lane < library < sample < multiple samples. In other words, multiple libraries (PE, SE, or different insert sizes) can be made from the same sample, and each can be sequenced on one or more lanes. In our case, we have 1 sample (the mouse strain), 1 library (mate pair), and 15 lanes of data. I will align the 15 lanes separately and give each lane a different read group ID (RG) tag, because beads in different lanes share the same bead IDs.
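For illustration, the per-lane read group IDs described above could be generated roughly like this. It is only a sketch: the `fcN_laneM` naming scheme and the function name are hypothetical, not something the instrument or any tool requires.

```python
def lane_read_group_ids(lanes_per_flowchip):
    """Return one unique read group ID per lane, e.g. 'fc1_lane3'.

    The 'fc<chip>_lane<lane>' pattern is a made-up convention for this sketch.
    """
    return [
        f"fc{chip}_lane{lane}"
        for chip, n_lanes in enumerate(lanes_per_flowchip, start=1)
        for lane in range(1, n_lanes + 1)
    ]


# Three flowchips with 6, 6, and 3 lanes, as in the experiment described.
rg_ids = lane_read_group_ids([6, 6, 3])
print(len(rg_ids), rg_ids[0], rg_ids[-1])
```

Each of the 15 lanes gets its own ID, so reads with identical bead IDs from different lanes can still be told apart after the BAM files are combined.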

Question: If I want to use mpileup (samtools) or GATK with multiple BAM files, should I use the same library and sample ID in the header of all 15 BAM files? The machine that generated the 21 lanes assigned different names to the samples and library (maybe the person who ran the machine did this), even though we used a single mouse and a single library. That's why I am confused.

Thanks a lot for your time.

sam format • 3.9k views
11.9 years ago

I think you have understood the situation clearly. It sounds like your LB and SM tags should be the same for all lanes of data.
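As a rough sketch of what that looks like in practice, the snippet below builds one `@RG` header line per lane with a distinct `ID` but a shared `SM` and `LB`, then checks that the set of headers is consistent. The sample name `mouse_strain`, the library name `matepair_lib1`, the `fcN_laneM` IDs, and both helper functions are hypothetical; `PL:SOLID` assumes the 5500 XL data are labeled with the SOLiD platform value.

```python
def rg_header_line(rg_id, sample, library, platform="SOLID"):
    """Format one SAM @RG header line as tab-separated TAG:VALUE fields."""
    fields = [f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    return "@RG\t" + "\t".join(fields)


def summarize_read_groups(header_lines):
    """Collect the IDs, samples, and libraries found in @RG header lines."""
    ids, samples, libraries = [], set(), set()
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # ignore @HD, @SQ, @PG, and other header records
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        ids.append(tags["ID"])
        samples.add(tags.get("SM"))
        libraries.add(tags.get("LB"))
    return {
        "ids_unique": len(ids) == len(set(ids)),
        "samples": samples,
        "libraries": libraries,
    }


# Hypothetical lane IDs for three flowchips with 6, 6, and 3 lanes.
lane_ids = [
    f"fc{chip}_lane{lane}"
    for chip, n_lanes in enumerate([6, 6, 3], start=1)
    for lane in range(1, n_lanes + 1)
]
headers = [rg_header_line(i, "mouse_strain", "matepair_lib1") for i in lane_ids]
summary = summarize_read_groups(headers)
print(headers[0])
print(summary)
```

With one mouse and one library, the summary should report unique IDs, exactly one sample, and exactly one library. If the names assigned by the instrument differ between lanes, the check would show more than one entry in `samples` or `libraries`.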

11.9 years ago

Thanks Sean for the quick reply.
