How to obtain this information only through fastq data
Is this a tricky bonus question from a teacher? :) Happy to oblige.
Assuming it's DNA data, align the reads and run germline variant calling to generate a gVCF. Then use a tool like ContEst that checks for cross-sample contamination. Basically, these tools look for lots of heterozygous SNPs whose read support deviates from the expected 50:50 split, or for an enrichment of triallelic sites, which in a diploid genome are only expected when two or more people's DNA got mixed up.
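To make the 50:50 idea concrete, here is a rough sketch of the allele-balance check. It is not what ContEst does internally, just an illustration. It assumes you have already pulled (ref_count, alt_count) pairs for heterozygous SNP calls out of your VCF; the function names, the `min_depth` cutoff and the `alpha` threshold are all made up for this example.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties with the observed probability are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

def skewed_het_fraction(sites, alpha=0.01, min_depth=20):
    """Fraction of het SNPs whose alt-read count deviates from 50:50.

    sites: iterable of (ref_count, alt_count) tuples (hypothetical input).
    Sites below min_depth are skipped because the test has little power there.
    """
    tested = skewed = 0
    for ref, alt in sites:
        depth = ref + alt
        if depth < min_depth:
            continue
        tested += 1
        if binom_two_sided_p(alt, depth) < alpha:
            skewed += 1
    return skewed / tested if tested else 0.0

# Toy data: two balanced sites, one strongly skewed site, one low-depth site.
sites = [(15, 15), (16, 14), (27, 3), (5, 2)]
print(skewed_het_fraction(sites))
```

In a clean germline sample only a small fraction of het sites should come out skewed; a large fraction is a hint of contamination (or of copy-number changes, so interpret it together with the CNV results). The pure-Python test is fine for typical WGS depths but will overflow at very high depth, where something like `scipy.stats.binomtest` is the more practical choice.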
To figure out whether it's a tumor, you can run a CNV caller: most tumors have some degree of aneuploidy, i.e. segments of the genome that are not diploid. You can also look for highly recurrent oncogenic hotspots like KRAS G12, or for loss-of-function (LoF) mutations in TP53.
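As a rough illustration of the aneuploidy check, this sketch computes what fraction of the genome sits in non-diploid segments. It assumes your CNV caller gives you segments as (start, end, log2_ratio) tuples; that input format, the function name and the 0.2 cutoff are assumptions for the example, not any particular tool's output or recommended threshold.

```python
def fraction_genome_altered(segments, log2_cutoff=0.2):
    """Fraction of covered bases in segments whose |log2 copy ratio|
    exceeds the cutoff, i.e. segments that look non-diploid.

    segments: iterable of (start, end, log2_ratio) tuples (hypothetical format).
    """
    total = altered = 0
    for start, end, log2_ratio in segments:
        length = end - start
        total += length
        if abs(log2_ratio) > log2_cutoff:
            altered += length
    return altered / total if total else 0.0

# Toy data: one neutral segment, one gain, one loss.
segments = [(0, 100, 0.0), (100, 150, 0.6), (150, 200, -0.5)]
print(fraction_genome_altered(segments))  # 0.5
```

A normal germline sample should come out close to zero, while many tumors have a substantial altered fraction. Low tumor purity shrinks the log2 ratios, so a low value on its own does not rule a tumor out.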