Hi,
I recently got back some FASTQ files. The samples I sent off for sequencing appear to contain some human DNA, and I was wondering if there is a way to remove the human reads from my data. The FASTQ files contain 150 bp paired-end reads. I saw some posts suggesting bbsplit and others suggesting bbduk, but I am not sure which to use, or whether there is newer/better software available. I saw that both of those tools are 6 years old (not that that makes them bad).
Thanks in advance!
I guess bbsplit is still a valid option. Alternatively, you could map the samples with any aligner against a combined genome consisting of both your target genome and the human genome, and then remove the reads that map to human. I would probably require end-to-end mapping in this case to avoid soft-clipped matches.
Huh, that is an interesting idea. How would you remove the reads that map against the human genome?
I would do the following:
Append the species to the chromosome names in each FASTA file, e.g. chr1_human, chr2_human, etc. Do the same for your actual organism.
Use cat to combine both into one FASTA and build an index from it, e.g. with bowtie2.
Align the reads. End-to-end mode is probably a good choice.
Keep everything non-human. You could use samtools view to extract only the alignments to your organism, plus the unmapped reads if you want to do any kind of assembly. I'd use a high MAPQ threshold here, since you want to remove only the obvious human contamination. Then convert the result back to FASTQ and you're done (I guess; I've never tried that). I think, though, that bbsplit does pretty much that under the hood.
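A minimal Python sketch of the steps above, using only the standard library and plain SAM text instead of samtools. Everything here is an assumption for illustration: the `_human` suffix, the MAPQ cutoff of 30, and the helper names `tag_fasta`, `human_read_names` and `filter_fastq` are hypothetical, and this is not how bbsplit works internally. It reads the MAPQ advice as "only discard reads that hit human confidently", so ambiguous reads are kept.

```python
# Hypothetical sketch of the combined-reference approach; not bbsplit's algorithm.
HUMAN_SUFFIX = "_human"  # assumed tag added to human contig names when renaming


def tag_fasta(fasta_text: str, species: str) -> str:
    """Append _<species> to the sequence name in every FASTA header."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name, _, rest = line[1:].partition(" ")
            line = f">{name}_{species}" + (f" {rest}" if rest else "")
        out.append(line)
    return "\n".join(out) + "\n"


def human_read_names(sam_lines, min_mapq=30):
    """Return read names with a confident, non-soft-clipped hit to a human contig."""
    human = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, mapq, cigar = f[0], int(f[1]), f[2], int(f[4]), f[5]
        if flag & (0x4 | 0x100 | 0x800):  # unmapped, secondary, supplementary
            continue
        if not rname.endswith(HUMAN_SUFFIX) or mapq < min_mapq:
            continue
        if "S" in cigar:  # soft-clipped alignments are not counted as human hits
            continue
        human.add(qname)
    return human


def filter_fastq(fastq_text: str, drop: set) -> str:
    """Keep 4-line FASTQ records whose read name is not in `drop`."""
    lines = fastq_text.splitlines()
    kept = []
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        # Read name: first header token without '@' and without a /1 or /2 suffix
        name = record[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        if name not in drop:
            kept.extend(record)
    return "\n".join(kept) + ("\n" if kept else "")
```

For paired-end data both mates share a read name, so passing the same `drop` set to `filter_fastq` for the R1 and R2 files removes the pair together and keeps the two files in sync. This assumes the aligner writes read names to SAM without the /1 or /2 suffix.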
I think I'll try both methods and see how it goes. Thank you for your suggestions!
genomax knows BBTools very well, so I would try that suggestion first. Mine is rather naive thinking aloud.