Question: sample mixing when demultiplexing
2.5 years ago, lait wrote:

How can I detect whether samples from different individuals were mixed during demultiplexing? We have 4 samples per lane and 8 lanes in total. After demultiplexing, some samples turned out to be double the size of others. The average size per sample is 10 GB, but in our last run we got samples of the following sizes:

  • 10 GB (which is normal)
  • 5 GB
  • 15 GB

It looks as if, during demultiplexing, some reads from certain samples were assigned to the wrong sample.

I already have the FASTQ, BAM and VCF files.

How can I verify computationally that read mixing happened?

Edit: each sample is sequenced on two different lanes, so after demultiplexing there is an additional step that collects each sample's reads across lanes.

2.5 years ago, genomax (United States) wrote:

You can't judge whether a sample is "normal" from its data yield alone. Libraries behave differently and can generate more or less data. Ideally that is the explanation here, and you simply have unbalanced libraries in the pool. Demultiplexing errors alone can't explain differences as large as the ones you are seeing.
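
To put "unbalanced libraries" in numbers: with n samples pooled equally in a lane, each should receive roughly 1/n of that lane's reads. Below is a minimal sketch that compares each sample's observed share with that even share. The input counts and the ±25% tolerance are hypothetical placeholders, not values from this run, and `pool_balance` is an illustrative name.

```python
def pool_balance(read_counts, tolerance=0.25):
    """Compare each sample's share of a lane with the even share 1/n.

    read_counts: dict of sample name -> reads (or bytes) in one lane (hypothetical input).
    tolerance: allowed relative deviation from 1/n (0.25 means +/-25%).
    Returns dict of sample -> (observed_fraction, is_unbalanced).
    """
    total = sum(read_counts.values())
    if not read_counts or total == 0:
        raise ValueError("need at least one sample with reads")
    expected = 1.0 / len(read_counts)
    return {
        sample: (count / total, abs(count / total - expected) / expected > tolerance)
        for sample, count in read_counts.items()
    }


if __name__ == "__main__":
    # Illustrative yields only, in the 10/5/15 proportions mentioned in the question.
    lane = {"S1": 10, "S2": 5, "S3": 15, "S4": 10}
    for name, (frac, unbalanced) in pool_balance(lane).items():
        print(f"{name}\t{frac:.3f}\t{'UNBALANCED' if unbalanced else 'ok'}")
```

A sample flagged this way is simply over- or under-represented in the pool; on its own this says nothing about whether reads were misassigned.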


OK, thanks. I added an important point in the Edit section. Could the error have happened in the step after demultiplexing, i.e. when collecting each sample's reads across lanes? If so, my earlier question stands: is it possible to check here whether samples were mixed?

(reply by lait, 2.5 years ago)

I assume this is Illumina sequencing? Are your samples in different pools, or was the same pool run on multiple lanes? If the latter, you can collect all reads belonging to one sample into a single file by running bcl2fastq with the --no-lane-splitting option. That way there can be no post-processing errors from merging the wrong samples. If the former, the pools may have been quantitatively unbalanced to begin with, which would explain the yield differences.
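
If per-lane files are merged by hand instead of with --no-lane-splitting, a check like the one below can catch wrong merges before anything is concatenated. This is a minimal sketch, assuming bcl2fastq's default file naming `<SampleName>_S<n>_L<lane>_R<read>_001.fastq.gz` and a hypothetical `expected_lanes` mapping that you would build from your own sample sheet. `plan_lane_merge` is an illustrative helper, not a bcl2fastq feature.

```python
import re
from collections import defaultdict

# Assumed bcl2fastq default naming: <Sample>_S<n>_L<lane>_<R|I><1|2>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def plan_lane_merge(filenames, expected_lanes):
    """Group per-lane FASTQ names by (sample, read) and check lane coverage.

    expected_lanes: dict of sample name -> list of lane numbers (hypothetical input).
    Returns (plan, problems): plan maps (sample, read) -> file names sorted by lane;
    problems lists unparsed names and samples with missing, extra or duplicate lanes.
    """
    groups = defaultdict(list)
    problems = []
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m is None:
            problems.append(f"unrecognised file name: {name}")
            continue
        groups[(m["sample"], m["read"])].append((int(m["lane"]), name))

    plan = {}
    for (sample, read), entries in sorted(groups.items()):
        found = sorted(lane for lane, _ in entries)
        wanted = sorted(expected_lanes.get(sample, []))
        if found != wanted:
            problems.append(f"{sample} {read}: found lanes {found}, expected {wanted}")
        plan[(sample, read)] = [name for _, name in sorted(entries)]
    return plan, problems


if __name__ == "__main__":
    files = [
        "A_S1_L001_R1_001.fastq.gz",
        "A_S1_L002_R1_001.fastq.gz",
        "B_S2_L001_R1_001.fastq.gz",
    ]
    plan, problems = plan_lane_merge(files, {"A": [1, 2], "B": [1, 2]})
    for key, names in plan.items():
        print(key, names)
    for p in problems:
        print("PROBLEM:", p)
```

Once `problems` is empty, the files in each plan entry can be concatenated in order. Concatenated gzip files are themselves a valid gzip stream, so no decompression is needed.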

Note: if you suspect that two samples (with different indexes) were incorrectly mixed in a post-processing step, extract the index sequences and check whether more than one appears in each file (use the code here: C: Demultiplexing reads with index present in the labels).
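
The linked code is not reproduced in this thread. As a stand-in, here is a minimal Python sketch of the same idea, assuming Illumina CASAVA 1.8+ style headers in which the index is the last `:`-separated field of the comment (e.g. `1:N:0:ACGTACGT`, or `ACGTACGT+TGCATGCA` for dual indexes). The function and script names are hypothetical. Because bcl2fastq tolerates one index mismatch by default, expect one dominant index plus a small tail of one-off variants; a second abundant, clearly different index is what would point to a mix.

```python
import gzip
import sys
from collections import Counter


def index_counts(lines):
    """Count index sequences in the header lines of a FASTQ stream.

    Assumes CASAVA 1.8+ headers such as
    '@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:ACGTACGT'
    where the index is the last ':'-separated field after the space.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        # Every 4th line (0, 4, 8, ...) is a header in a well-formed FASTQ.
        if i % 4 == 0 and " " in line:
            comment = line.rstrip("\n").split(" ", 1)[1]
            counts[comment.rsplit(":", 1)[-1]] += 1
    return counts


def index_counts_in_file(path):
    """Open a plain or gzip-compressed FASTQ file and count its indexes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return index_counts(fh)


if __name__ == "__main__":
    # Usage: python count_indexes.py sampleA_R1.fastq.gz sampleB_R1.fastq.gz ...
    for path in sys.argv[1:]:
        counts = index_counts_in_file(path)
        total = sum(counts.values())
        print(path)
        for index, n in counts.most_common(5):
            print(f"  {index}\t{n}\t{n / total:.2%}")
```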

(reply by genomax, 2.5 years ago)

Thanks a lot. Using your script, I am now sure that the sequences were not mixed. On a related note, do you have an explanation for the following?

I processed the VCF files and calculated the B-allele frequency for the heterozygous variants. When I plotted the frequency distribution, most of the plots (especially those for the samples with unusual file sizes) showed three peaks: one large peak at 0.5 and two smaller ones at 0.4 and 0.6. Does this suggest contamination, or something else?
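
One way to reproduce the B-allele frequency calculation described above, as a minimal sketch: it assumes a plain or gzip-compressed single-sample VCF whose FORMAT column carries GT and AD, computes alt/(ref+alt) for heterozygous biallelic calls, and counts the values in 0.05-wide bins so that peaks such as 0.4, 0.5 and 0.6 show up as counts. The function names, script name and bin width are illustrative choices, not what the poster used.

```python
import gzip
import sys
from collections import Counter

HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def het_bafs(vcf_lines, sample_index=0):
    """Yield B-allele fractions alt / (ref + alt) for heterozygous biallelic calls.

    Assumes FORMAT contains GT and AD (allelic depths); multi-allelic sites
    and calls without usable AD values are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if "," in fields[4]:  # ALT column lists several alleles
            continue
        call = dict(zip(fields[8].split(":"), fields[9 + sample_index].split(":")))
        if call.get("GT") not in HET_GENOTYPES or "AD" not in call:
            continue
        try:
            ref, alt = (int(x) for x in call["AD"].split(","))
        except ValueError:  # AD is '.' or has an unexpected number of values
            continue
        if ref + alt > 0:
            yield alt / (ref + alt)


def baf_histogram(bafs, n_bins=20):
    """Count BAF values per bin; keys are bin lower edges (0.0, 0.05, ...)."""
    counts = Counter()
    for b in bafs:
        idx = min(int(b * n_bins), n_bins - 1)  # a BAF of exactly 1.0 goes in the last bin
        counts[idx / n_bins] += 1
    return counts


if __name__ == "__main__":
    # Usage: python baf_hist.py sample.vcf.gz
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        hist = baf_histogram(het_bafs(fh))
    for edge in sorted(hist):
        print(f"{edge:.2f}\t{hist[edge]}")
```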

(reply by lait, 2.5 years ago)