NGS Sample contamination simulation
7.9 years ago
MAPK ★ 2.1k

I am trying to simulate sample contamination at different dilution levels for NGS samples. Suppose I have two BAM files, one for SampleA and one for SampleB. I want to generate 5 contaminated samples from them, at dilutions of 10%, 20%, 30%, 40% and 50%. I understand that I should extract reads from one of the two BAM files at the given dilution percentage and add them to the other BAM file, but I don't know exactly how to do this. Can someone please explain the procedure to me? Thanks

7.9 years ago

I'm not sure at what level of complexity to lay out the procedures, so let me know if the following doesn't suffice.

  1. Generate a large number of reads from both samples A and B.
  2. Shuffle the order of both files, since read generators typically emit reads in sorted order.
  3. Take the first 90% of one sample and concatenate it with the first 10% of the other (assuming you generated equal numbers of reads).
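
The three steps above can be sketched in Python as follows. This is only a minimal illustration, assuming the reads of each sample are already loaded as a list of records (for example 4-line FASTQ tuples) and fit in memory; the function name `mix_reads` and its parameters are made up for this example and do not come from any tool.

```python
import random

def mix_reads(reads_a, reads_b, fraction_b, seed=0):
    """Shuffle both read lists, then take the first (1 - fraction_b) of A
    and the first fraction_b of B, so the output has a fixed total size."""
    rng = random.Random(seed)          # fixed seed makes the mix reproducible
    a, b = list(reads_a), list(reads_b)
    rng.shuffle(a)                     # step 2: remove any sorted order
    rng.shuffle(b)
    total = min(len(a), len(b))        # assumes roughly equal read counts
    n_b = round(total * fraction_b)    # number of contaminating reads from B
    return a[:total - n_b] + b[:n_b]   # step 3: concatenate the two slices

# Hypothetical toy records standing in for real FASTQ entries.
sample_a = [("@A%d" % i, "ACGT", "+", "IIII") for i in range(10)]
sample_b = [("@B%d" % i, "TGCA", "+", "IIII") for i in range(10)]

# One mixed sample per dilution level asked about in the question.
mixes = {f: mix_reads(sample_a, sample_b, f) for f in (0.1, 0.2, 0.3, 0.4, 0.5)}
```

For real data you would read and write FASTQ files (or convert the BAM files back to FASTQ first) instead of using in-memory toy lists.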

You could also do this with a random number generator, but the simpler procedure above will likely suffice. For shuffling reads, have a look here: Randomize Read Order In Multigbp Fastq File? Note that handling paired-end reads is a bit more complicated, though only because of the shuffling, since both mates of a pair must stay together (you can find commands for that there as well).
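
The random-number-generator variant mentioned above could look roughly like this. It is a sketch under assumptions: each read (or read pair, stored as one record so mates stay together) is kept independently with a given probability, so no shuffling is needed but the final proportions are only approximately the requested ones. The names `subsample` and `contaminate` are hypothetical.

```python
import random

def subsample(records, keep_fraction, seed):
    """Keep each record independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < keep_fraction]

def contaminate(reads_a, reads_b, fraction_b, seed=0):
    """Mix roughly (1 - fraction_b) of A with roughly fraction_b of B.
    For paired-end data, pass records that hold both mates, e.g. (mate1, mate2)."""
    kept_a = subsample(reads_a, 1.0 - fraction_b, seed)
    kept_b = subsample(reads_b, fraction_b, seed + 1)  # different stream for B
    return kept_a + kept_b
```

Because the counts are random, the achieved contamination level varies slightly from run to run unless the seed is fixed; the shuffle-and-slice procedure gives exact counts instead.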

Thanks Devon. Could you please explain point 2 a bit more (shuffle the order of both files, since read generators typically emit reads in sorted order)? How would I select reads based on chromosome position, or do I even need to consider chromosome positions? Say I have reads from chr2:220333-24444432 of SampleA and want to mix them into SampleB, how can I do this correctly?

You don't need to perform any selection. If you only want to look at a specific region, then restrict read generation to that region (if nothing else, make a FASTA file containing only that region and generate reads from it).
