In-silico downsizing to estimate the DNA input
20 months ago
APJ ▴ 40

Hi,

Given a FASTQ file generated from 50 ng of input DNA, I could find all the reference variants in the variant calling results. Is it possible to downsample the FASTQ data in silico, to estimate the minimal DNA amount at which we would still not lose our reference variants?

Any thoughts on this?

Thank you!

sequencing snp next-gen

I guess all you can do is check how differences in coverage change the variant calls. I doubt that you can meaningfully simulate different DNA amounts, as this depends on the kit and the number of PCR cycles. You would need data from different starting amounts and then build a model based on those data.
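If you go the coverage route, a first step is to turn a series of coverage fractions into read counts. Here is a rough sketch of that arithmetic; the function names `count_reads` and `reads_for_fraction` are made up for illustration, and it assumes an uncompressed FASTQ with exactly four lines per record. Keep in mind that a fraction of the reads is not the same thing as a fraction of the input DNA.

```shell
#!/usr/bin/env bash
# Sketch only: derive read counts for a series of coverage fractions.
# Assumption: uncompressed FASTQ, exactly four lines per record.

count_reads() {
    # $1 = FASTQ file; number of records = number of lines / 4.
    awk 'END { print int(NR / 4) }' "$1"
}

reads_for_fraction() {
    # $1 = total number of reads, $2 = fraction between 0 and 1.
    # Rounds to the nearest whole read.
    awk -v total="$1" -v frac="$2" 'BEGIN { printf "%d\n", int(total * frac + 0.5) }'
}

# Example usage (file.fq is a placeholder name):
# total=$(count_reads file.fq)
# for frac in 0.8 0.6 0.4 0.2 0.1; do
#     echo "$frac $(reads_for_fraction "$total" "$frac")"
# done
```

Each resulting count could then be used as the number of reads to keep when subsampling, and the variant calling repeated at each depth.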

20 months ago
5heikki 11k

Why not?

You can do e.g. this:

paste -d $'\t' - - - - <file.fq | shuf -n "$NUMBER" | awk 'BEGIN{FS="\t";OFS="\n"}{print $1,$2,$3,$4}' > out.fq


Where "$NUMBER" is the number of reads you want in your output. If you want shuf to be deterministic, or want the chance of including the same read more than once, see man shuf (the --random-source and --repeat options in GNU shuf).
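If you need the same subsample every time and would rather not set up a random source for shuf, one alternative is to tag each read with a seeded random key in awk, sort on that key, and keep the first N. This is a different technique from the shuf one-liner, not a drop-in equivalent. The function name `subsample_fastq` is made up for illustration, and the sketch assumes single-end data, four lines per record, and no tab characters in the read headers.

```shell
#!/usr/bin/env bash
# Sketch only: reproducible FASTQ subsampling via decorate-sort-undecorate.
# Assumptions: single-end reads, four lines per record, no tabs in headers.

subsample_fastq() {
    # $1 = input FASTQ, $2 = number of reads to keep, $3 = integer seed
    paste - - - - < "$1" |
        # Prefix every record with a fixed-width random key from a seeded generator.
        awk -v seed="$3" 'BEGIN { srand(seed) } { printf "%.12f\t%s\n", rand(), $0 }' |
        # Sort on the key only; LC_ALL=C keeps the ordering byte-wise and stable across locales.
        LC_ALL=C sort -t $'\t' -k1,1 |
        head -n "$2" |
        # Drop the key, then restore the four-line record layout.
        cut -f2- |
        tr '\t' '\n'
}

# Example usage (file.fq and the seed 42 are placeholders):
# subsample_fastq file.fq "$NUMBER" 42 > out.fq
```

The same seed gives the same reads as long as you use the same awk implementation; different awks may produce different random sequences. For paired-end data, the two mates would have to be kept in sync, which this sketch does not handle.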