If I understand this correctly, you have contaminant sequences mapping to Zea mays from a human plasma sample? And after various filtering steps, they still persist?
What proportion of reads map to maize? I ask because there are almost always unmappable or spurious reads in any NGS experiment; you'd never expect to see 100% of reads on-target. These might be random amplification artefacts, leftover molecules from previous runs on the sequencer, or even remnants of a lab technician's lunch.
It could also be that the reads mapping to maize are actually human in origin and you've got a short motif mapping spuriously to over-represented model organisms. Humans and plants do, believe it or not, share a number of orthologous sequences.
You're right, that's exactly what I meant.
Although I used multiple filtering methods, I still found Zea mays sequences in the human plasma samples, possibly contaminants. As the table below shows, it is not just Zea mays: sequences from other species are present as well.
References:
[1] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0069805
[2] https://www.nature.com/articles/s41564-023-01350-w
[3] https://www.nature.com/articles/cr2011158
I don't know how to judge whether this result is genuine based solely on the sequencing data.
So I would like to ask whether there are more rigorous and reliable bioinformatics analysis steps I could use, or other possible explanations for these reads.
If you've only got 30-100 reads out of 1.4 billion, then this is nothing to be concerned about. Likely explanations are that they're either spurious sequencing artefacts or leftover molecules from previous experiments on the sequencer.
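To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It simply normalises the counts quoted above (30-100 reads out of roughly 1.4 billion) to reads per million; it is an illustrative sketch, and `reads_per_million` is a hypothetical helper, not part of any tool.

```python
# Normalise the maize read counts quoted in this thread to reads per million (RPM).
# TOTAL_READS is the approximate total from the thread (~1.4 billion reads).
TOTAL_READS = 1_400_000_000


def reads_per_million(hits: int, total: int = TOTAL_READS) -> float:
    """Convert a raw read count into reads per million sequenced reads."""
    return hits / total * 1_000_000


for hits in (30, 100):
    rpm = reads_per_million(hits)
    print(f"{hits} reads -> {rpm:.4f} reads per million")
```

Both ends of the range work out to well under 0.1 reads per million (roughly 0.02 to 0.07), which is the scale at which sequencing artefacts and carry-over are hard to rule out.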
What you will notice, though, is that all of those reads come from "model organisms", which are hugely over-represented in most genomic databases. The thing about alignment algorithms is that most return the "best" match, even if the alignment is crappy and there is nothing else remotely similar. I would guess that if you were to show us an example BLAST alignment of one of the maize reads, it would be very unconvincing.
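One way to make "unconvincing" concrete is to keep only hits that are near-identical and cover almost the whole read. The sketch below assumes BLAST+ was run with a custom tabular format, `-outfmt "6 qseqid sseqid pident qstart qend qlen evalue"`; the thresholds (95% identity, 90% query coverage, e-value 1e-10) and the example rows are made-up illustrations, not established cut-offs or real data.

```python
import csv
import io
from dataclasses import dataclass

# Assumed column order, matching: -outfmt "6 qseqid sseqid pident qstart qend qlen evalue"
FIELDS = ["qseqid", "sseqid", "pident", "qstart", "qend", "qlen", "evalue"]


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    qlen: int
    evalue: float

    @property
    def query_coverage(self) -> float:
        """Fraction of the read (query) spanned by this alignment."""
        return (abs(self.qend - self.qstart) + 1) / self.qlen


def parse_hits(handle):
    """Yield Hit records from tab-separated BLAST output, skipping comment lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        yield Hit(
            qseqid=rec["qseqid"],
            sseqid=rec["sseqid"],
            pident=float(rec["pident"]),
            qstart=int(rec["qstart"]),
            qend=int(rec["qend"]),
            qlen=int(rec["qlen"]),
            evalue=float(rec["evalue"]),
        )


def convincing_hits(hits, min_pident=95.0, min_cov=0.9, max_evalue=1e-10):
    """Keep near-full-length, near-identical alignments (illustrative thresholds)."""
    return [
        h for h in hits
        if h.pident >= min_pident
        and h.query_coverage >= min_cov
        and h.evalue <= max_evalue
    ]


# Two invented rows: a full-length perfect hit and a short, mediocre "best match".
example = (
    "read1\tmaize_chr1\t100.0\t1\t150\t150\t1e-70\n"
    "read2\tmaize_chr5\t88.0\t20\t59\t150\t0.003\n"
)
kept = convincing_hits(parse_hits(io.StringIO(example)))
print([h.qseqid for h in kept])
```

Here `read2` is the kind of hit described above: it is the "best" match, but it covers only 40 of 150 bases at 88% identity, so it is dropped.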
Thanks
I see what you mean; I've been thinking the same thing. However, I have to wrap up this project.
I can't explain where these sequences come from, although they are certainly present at low abundance. If you were in my position, how would you interpret this result? Should I keep refining the bioinformatics analysis, or turn to experimental verification?
Or should I give up and conclude that there is no plant-derived DNA in the blood?
Could you run BLAST on one or two of the plant reads and paste the results here?
I would not consider the (very) low frequencies of plant reads to be significant. If you have the time / funds, I would consider a different approach using species-specific plant primers and basic Sanger sequencing. If you know exactly what's in the diet of the cohort then you can easily pick a few sets of specific primers.
Please simplify your question. It is very difficult to understand.
OK.
I have sequencing data from human plasma samples.
Although I removed the human reads in several ways, I still see sequences from other species in this data.
So, what could be the origin of these sequences?
Why did you delete this post, biwdpang?
Hi, Ram
I don't fully understand the issue myself, so I didn't think the post had much value as a reference.
It does have value and can be added to in the future, so please don't delete it.
Thank you for the acknowledgement.
I will keep it up.