Very low coverage when mapping genomic DNA
0
0
17 months ago
Lila M ★ 1.2k

Hi all, I'm having serious trouble mapping single-end reads to the human genome (GRCh38). The sample is genomic DNA, but the library was prepared with an RNA library kit to preserve strand specificity. I first tried STAR and Kallisto, and the coverage was very low. I then tried bowtie2, since the experiment is more like ChIP-seq, and the coverage is still very poor (see the samtools flagstat output below):

41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)
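
For anyone who wants to check the mapping rate from a flagstat report programmatically, here is a minimal sketch. It assumes the plain-text flagstat layout shown above; the function name `parse_flagstat` is made up for illustration and is not part of samtools.

```python
import re

def parse_flagstat(text):
    """Return (total, mapped) QC-passed read counts from flagstat text."""
    total = mapped = None
    for line in text.splitlines():
        # Each line looks like: "<passed> + <failed> <label ...>"
        m = re.match(r"(\d+) \+ (\d+) (.*)", line.strip())
        if not m:
            continue
        passed, label = int(m.group(1)), m.group(3)
        if label.startswith("in total"):
            total = passed
        elif label.startswith("mapped"):
            mapped = passed
    return total, mapped

flagstat_text = """41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)"""

total, mapped = parse_flagstat(flagstat_text)
print(f"{mapped} of {total} reads mapped ({100 * mapped / total:.2f}%)")
```

In other words, roughly 1 read in 400 aligned.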


mapping coverage DNA genomic • 1.4k views
2

Have you confirmed that the data is actually from the human genome by taking a few reads and blasting them at NCBI? It would not be the first time that someone received a dataset that was not what they thought it was.
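
One possible way to pull a few reads for a BLAST check is sketched below. It assumes standard four-line FASTQ records; the function name `sample_fastq_to_fasta` and the demo reads are made up for illustration. It uses reservoir sampling so the whole file never has to sit in memory.

```python
import random

def sample_fastq_to_fasta(fastq_lines, n=10, seed=0):
    """Reservoir-sample n reads from FASTQ lines and return FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    lines = iter(fastq_lines)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(lines).strip()
        next(lines)  # '+' separator line
        next(lines)  # quality line
        seen += 1
        record = (header.strip()[1:], seq)  # drop the leading '@'
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            # Replace an existing pick with probability n / seen
            j = rng.randrange(seen)
            if j < n:
                reservoir[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)

# Tiny in-memory demo with two made-up reads
demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "TTGA\n", "+\n", "IIII\n"]
print(sample_fastq_to_fasta(demo, n=2), end="")
```

For a real gzipped file, something like `gzip.open(path, "rt")` can be passed in place of `demo`, and the resulting FASTA pasted into the NCBI BLAST web form.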

0

There are no similarities with the human genome when I run blastn... so I guess that means the sequences in the samples are entirely artifacts, right?

0

Please note whether you ran it against an RNA database. Also try trimming the reads manually (take only the middle of each read) and run blastn again.
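
A minimal sketch of the "take only the middle of the read" idea, in case it helps. The function name `middle_of_read` and the default of 50 bases are arbitrary choices for illustration, not a recommendation from any tool.

```python
def middle_of_read(seq, keep=50):
    """Return the central `keep` bases of seq (the whole read if it is shorter)."""
    if len(seq) <= keep:
        return seq
    start = (len(seq) - keep) // 2
    return seq[start:start + keep]

# Made-up 12-base read, keeping the central 6 bases
print(middle_of_read("AAACCCGGGTTT", keep=6))
```

Dropping both ends this way removes any leftover adapter or primer sequence at the read boundaries before the BLAST search.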

0

Have you tried blasting not only against human but against a general database, to see which species the reads might match?

0

Yes, and nothing comes up.

2

So it's more evidence that the library prep failed and you sequenced artifacts.

0

I also think so now.

I did not know that was possible (the worst I've seen was around a 20% mapping rate), but apparently it is.

1

It is genomic DNA but was prepared by using a RNA library kit to preserve strand specificity.

Can you elaborate on the theory behind that? Genomic DNA is double-stranded and as such does not really have the concept of strand specificity the way cDNA synthesized from mRNA does. I guess this unconventional library prep is the reason the mapping is so poor, i.e. it did not work and you sequenced library prep artifacts.

0

I did not prepare the samples, and for now this is all the information I have. It is a novel experiment, so the person who did it doesn't really know whether it will work or not. I agree that it could be a bit messy, but as this is what I have, I would like to know if someone has a clue about how to deal with this kind of data/experiment.

1

To be frank, a 0.25% alignment rate is not "a bit messy"; it is more an indicator of a failed experiment. I think this is not a bioinformatics problem. It seems your team is trying to develop a new experimental technique, but this is the wrong community for that. The wet-lab scientist should discuss this (if they want to do it online) in a Reddit group for molecular biology and NGS; I guess that is where they would reach the largest audience these days. If you want to know what these reads are, then maybe try blasting a good subset of them against the NCBI nucleotide collection. Still, with failed library preps it is not unusual to simply have cryptic reads that are odd ligation or PCR artifacts with no matches at all.

0

Looking at the raw data file and blasting some sequences should help show whether the reads are library prep artifacts or good reads.

0

Maybe you forgot to trim adapters?

I don't know what happened, but IMO even if you generate random reads, some of them will map to a genome as large as the human one.

0

Good point. Run fastqc in case you haven't, and maybe bowtie2 with --very-fast-local, which allows lenient local alignments with soft-clipping of non-matching parts, and then see whether the result looks strikingly different.
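
As a rough sketch of the bowtie2 suggestion, the snippet below only assembles the command line for a single-end local alignment. The index prefix, file names, and thread count are placeholders I made up; `--very-fast-local`, `-p`, `-x`, `-U`, and `-S` are standard bowtie2 options.

```python
import shlex

def bowtie2_local_cmd(index_prefix, fastq, out_sam, threads=4):
    """Build a bowtie2 command for lenient local alignment of single-end reads."""
    return [
        "bowtie2",
        "--very-fast-local",   # local mode preset: read ends may be soft-clipped
        "-p", str(threads),    # number of alignment threads
        "-x", index_prefix,    # prefix of the bowtie2 index
        "-U", fastq,           # unpaired (single-end) reads
        "-S", out_sam,         # SAM output file
    ]

# Placeholder names; substitute your own index and FASTQ paths
cmd = bowtie2_local_cmd("GRCh38_index", "reads.trimmed.fastq.gz", "local.sam")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`, and the mapped percentage from flagstat on the new output compared with the end-to-end run.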

0

Yes, I've trimmed them with BBDuk.