Hi everyone, my sample's depth of coverage is around 300x. Now I want to downsample it to 250x, 200x, 150x, and 100x. Can anyone suggest some tools or packages to do this?
Thank you in advance.
Hi HG. What format are your data in? At what step of your analysis do you want to reduce your coverage? Give us more details so that we can better help you.
Hi Eric, thanks for the reply. My data set: Illumina 250 bp paired-end reads, whole-genome sequencing of E. coli, expected genome size 5.00 Mb. I have already assembled the raw data, which is at around 300x coverage. Now I want to see how the assembly quality, mainly the N50 value, changes as the coverage is reduced.
For more information: I want to set up my assembly comparison like GAGE-B: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702249/
In this paper they assemble their data at different coverages. I just want to see the same effect on my own data set.
If your reads are in SAM/BAM format, you can also use the Picard DownsampleSam tool. You provide the probability of a read being retained during sampling.
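To go from roughly 300x to a lower depth, the retention probability is simply the target coverage divided by the current coverage. Here is a minimal Python sketch of that arithmetic; the function name `retention_probability` is made up for illustration, and it assumes the 300x estimate is accurate and that coverage scales linearly with the fraction of reads kept:

```python
def retention_probability(current_coverage, target_coverage):
    """Fraction of reads to keep to go from current to target coverage.

    Hypothetical helper: assumes coverage is proportional to read count.
    """
    if current_coverage <= 0:
        raise ValueError("current coverage must be positive")
    if not 0 < target_coverage <= current_coverage:
        raise ValueError("target must be positive and not exceed current coverage")
    return target_coverage / current_coverage


if __name__ == "__main__":
    CURRENT = 300  # approximate depth reported in the question
    for target in (250, 200, 150, 100):
        p = retention_probability(CURRENT, target)
        print(f"{target}x: keep each read with probability {p:.3f}")
```

The resulting value is what you would pass as the probability to DownsampleSam. Because the sampling is random, the depth you actually get will be close to the target, not exactly equal to it.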
You can use seqtk.
Something like this selects 1,000,000 reads (you will need to calculate how many reads are needed for 250x, etc.):
seqtk sample -s100 my.fastq.gz 1000000 | gzip > my.1.fastq.gz
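The read count for each target depth follows from coverage = (number of reads x read length) / genome size. Here is a rough Python sketch of that calculation using the numbers given in the question (250 bp reads, 5 Mb genome); the function name `reads_for_coverage` is made up for illustration, and it assumes every read is a full 250 bp (no trimming):

```python
import math


def reads_for_coverage(target_coverage, genome_size_bp, read_length_bp):
    """Total number of reads needed to reach the target coverage.

    Hypothetical helper: assumes all reads have the same length.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("genome size and read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    GENOME_SIZE = 5_000_000  # expected E. coli genome size from the question
    READ_LENGTH = 250        # nominal read length from the question
    for target in (250, 200, 150, 100):
        total = reads_for_coverage(target, GENOME_SIZE, READ_LENGTH)
        pairs = math.ceil(total / 2)  # paired-end: two reads per pair
        print(f"{target}x: {total} reads in total, i.e. {pairs} pairs")
```

For paired-end data, run seqtk sample once on each of the two FASTQ files with the same -s seed so the pairs stay matched, and pass the number of pairs (not the total number of reads) as the count for each file.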
Thank you so much for your suggestion.
GATK also allows downsampling; in fact, it always does it by default. You can use the PrintReads walker if downsampling is your only goal.
As Mick has always stated: "you can't always get what you want, but if you try the '-ds' option, well you just might find, you get what you need"