Archeological DNA sample - how to analyze
4 months ago
Aruna

Does it make sense to analyze unmapped DNA reads from a newly sequenced archeological human bone sample, and if so, how? 80% of the genomic reads did not align to the human reference genome. When assembled with SPAdes, the contigs come to about 180 Mb. After blasting them against the NT database with a 99% identity cutoff, evalue=1.e6, and 98% coverage, I get 96 hits, most of which are bacterial, plus 1 hit to sugarcane, 1 hit to coconut, and 4 hits to rice. Should I consider these significant or dismiss them as sequence contamination? Or should I go for subspecies alignment next?
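
For readers who want to apply cutoffs like these outside Galaxy, here is a minimal Python sketch. It assumes the BLAST results were exported as tab-separated text with the 12 default tabular columns plus a 13th `qcovs` (query coverage) column; the post does not say which export format was used, so that layout is an assumption. The function names are made up for illustration, and the default e-value of 1e-6 is an assumed cutoff, not a value confirmed by the post.

```python
from collections import Counter


def filter_blast_hits(lines, min_identity=99.0, max_evalue=1e-6, min_qcov=98.0):
    """Keep tabular BLAST hits that pass identity, e-value and coverage cutoffs.

    Assumed layout (hypothetical export): the 12 default tabular columns
    (qseqid, sseqid, pident, ..., evalue, bitscore) followed by qcovs.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])    # percent identity
        evalue = float(fields[10])   # expectation value
        qcov = float(fields[12])     # query coverage per subject (assumed column)
        if pident >= min_identity and evalue <= max_evalue and qcov >= min_qcov:
            kept.append((fields[0], fields[1], pident, evalue, qcov))
    return kept


def count_hits_per_subject(hits):
    """Tally how many retained hits point at each subject sequence."""
    return Counter(hit[1] for hit in hits)
```

Calling `count_hits_per_subject(filter_blast_hits(open("hits.tsv")))` (with `hits.tsv` being a placeholder file name) gives a quick tally per subject, which makes it easier to see whether the handful of plant hits come from one contig or several.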

The pipeline used so far:

fastqc, trimmomatic, bwa-mem-humanref-37, unmapped bam, fastqtofasta, assembly (SPAdes), contigs, blastn(galaxy)
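
The "unmapped bam, fastqtofasta" steps can be sketched in plain Python for anyone who wants to see what they do. This is only an illustration under stated assumptions: it reads SAM text lines rather than a binary BAM, the function names are hypothetical, and it relies on the SAM FLAG bits 0x4 (read unmapped), 0x100 (secondary alignment) and 0x800 (supplementary alignment). It is not the tool chain used in the post.

```python
def split_sam_by_mapping(sam_lines):
    """Return (mapped_count, unmapped_records) from SAM text lines.

    unmapped_records is a list of (read_name, sequence) tuples.
    Secondary and supplementary alignments are skipped so that each
    read is counted once.
    """
    mapped = 0
    unmapped = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # 0x100 secondary or 0x800 supplementary
        if flag & 0x4:
            unmapped.append((fields[0], fields[9]))
        else:
            mapped += 1
    return mapped, unmapped


def mapped_fraction(mapped, unmapped_records):
    """Fraction of primary reads that aligned to the reference."""
    total = mapped + len(unmapped_records)
    return mapped / total if total else 0.0


def to_fasta(records):
    """Format (name, sequence) tuples as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The FASTA text returned by `to_fasta` is what would then go to the assembler, and `mapped_fraction` gives the mapping rate being discussed in this thread.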

unmapped paleogenomics archeogenomics Assembly • 837 views
If you have 80% of reads showing up as non-human in a human sample, then you need to seriously consider dropping the sample. SPAdes is also not an appropriate assembler for a human genome.

Edit: After writing this comment @Aruna clarified that this is a paleo-DNA sample so obviously a vital piece of information was missing from the original post.

Please indicate which assembler is better for a human genome.

What kind of human sample? What you describe could for example make sense for a stool or intestinal mucus sample, as opposed to a blood sample. You might either discard the sample as contaminated or treat it with a metagenomics approach. The methods to analyze the sample depend on its origin and the scientific question behind the experiment. This question cannot be answered without the necessary background information.

The human sample is bone DNA. If I have to treat it with a metagenomics approach, what steps should I take? Pardon me, I'm a beginner. I would like to publish any interesting information that may be present in the unmapped reads of my sample. Also, why do I get plant sequences with almost 98% identity in one of the contigs?

I don't think metagenomics is appropriate here. It seems you are interested in human DNA. This case shows the importance of metadata and annotation. Where exactly is the sample coming from? Has it been recently extracted from a patient, or is it an archeological sample? Did you download the data from a public archive, or are they new? What is your read-mapping pipeline, and did you do quality checks and trimming of the reads? Sorry for so many questions, but they are all needed. Unfortunately, the most likely outcome is still that you should discard your sample.

Yes, it is an archeological sample. It is new, not downloaded from anywhere. FastQC summary:

PASS    Basic Statistics                Madurai_1.fastq.gz
PASS    Per base sequence quality       Madurai_1.fastq.gz
FAIL    Per tile sequence quality       Madurai_1.fastq.gz
PASS    Per sequence quality scores     Madurai_1.fastq.gz
WARN    Per base sequence content       Madurai_1.fastq.gz
PASS    Per sequence GC content         Madurai_1.fastq.gz
PASS    Per base N content              Madurai_1.fastq.gz
WARN    Sequence Length Distribution    Madurai_1.fastq.gz
PASS    Sequence Duplication Levels     Madurai_1.fastq.gz
PASS    Overrepresented sequences       Madurai_1.fastq.gz
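
With many samples it gets tedious to read summaries like this by eye. A small Python sketch for pulling out only the WARN and FAIL modules follows. It assumes each line has the three fields shown above (status, module name, file name) separated by whitespace, and that the file name itself contains no spaces. The function names are hypothetical.

```python
def parse_fastqc_summary(lines):
    """Parse FastQC summary lines into (status, module, filename) tuples."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First token is the status, last token is the file name,
        # everything in between is the module name.
        status, rest = line.split(None, 1)
        module, filename = rest.rsplit(None, 1)
        results.append((status, module.strip(), filename))
    return results


def flagged_modules(results):
    """Return (status, module) pairs for every module that did not PASS."""
    return [(status, module) for status, module, _ in results
            if status in ("WARN", "FAIL")]
```

For the summary posted here, the flagged modules would be per tile sequence quality, per base sequence content, and sequence length distribution.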


fastqc, trimmomatic, bwa-mem-humanref-37, unmapped bam, fastqtofasta, assembly (SPAdes), contigs, blastn(galaxy)

It is hard to accept that the sample needs to be discarded. Are there any explanations for these kinds of irrelevant mapping results?

So, this is archeo- or paleo-genomics, I am glad we sorted that out. This also means your sample is good and does not need to be discarded.

The reasons for low mapping rate are:

• the bones were (likely) buried in soil for ages, hence the bacterial and plant DNA contaminants
• DNA degrades over time; only a few fragments remain to be sequenced, hence the low sequence quality
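
The degradation in the second point usually shows up as short fragments, so one rough check is to look at read lengths after adapter trimming. Below is a minimal sketch, assuming plain-text FASTQ with exactly four lines per record. The function names and the 50 bp cutoff for "short" are arbitrary choices for illustration, not an established threshold.

```python
import statistics


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # second line of each record is the sequence
            yield len(line.strip())


def summarize_lengths(lengths, short_cutoff=50):
    """Summarize read lengths; short_cutoff is a hypothetical threshold."""
    lengths = list(lengths)
    if not lengths:
        return None
    short = sum(1 for length in lengths if length < short_cutoff)
    return {
        "n_reads": len(lengths),
        "median_length": statistics.median(lengths),
        "fraction_short": short / len(lengths),
    }
```

For a gzipped file, the lines could come from `gzip.open(path, "rt")`. A high fraction of short reads would be in line with a degraded sample, but on its own it does not prove the DNA is old.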

You are somewhat lucky that you have 20% mappable DNA left. You should contact an expert in the field.

Rice, coconut, and sugarcane have all been cultivated in India for millennia, therefore it is not surprising to find traces. This finding could be relevant, but only if you could prove that these contaminants are contemporary with the bone sample. Not sure how this could be evaluated, though.

What you do next depends on what you want to know about the specimen's genotype.

I have edited the Title and Tags to reflect this and bumped the question.

Thank you, Sir. You made my day. That is a lot of helpful information you have given. Great learning from you.

Can I go for publication with these BLAST results?

Possibly. It depends on whether you can formulate an interesting research question and build a story around it, but you need to show that your analyses follow the state of the art, which I don't quite see yet. I am not an expert in this field, so I recommend getting local support from someone who knows the field and how to write papers about the topic, and also reading a lot of papers on it. Otherwise, trying to publish as a novice could quickly become very frustrating. Think about a good title that summarizes your findings in a single sentence. Once you have at least a draft manuscript, it is much easier to evaluate publication options. Start writing the Methods and Results sections; if there is enough substance, this should be quite straightforward.

I get your point, sir. I will definitely work on it.

@Aruna you left out the most vital piece of information in the original post. Since very few labs work on paleo-genomics, you must already be familiar with them. As @Michael pointed out, you will need to use specialized data pipelines if you are interested in analyzing hominin/human/other nucleic acids. Hopefully you took all the necessary precautions when preparing the sample from the bone to avoid contamination from extant nucleic acids.

Yes sir, we did take great care during sample prep. I will explore further in this direction. This means a lot.