Question

How to identify long reads coming from cDNAs ligated before sequencing?

5.7 years ago

Florian Bernard ▴ 90

We have generated sequencing data with the direct-cDNA sequencing kit from ONT. During the library prep, prior to adding the adapters on each cDNA, we performed a ligation step to concatenate cDNAs together.

The aim was to see whether we could increase the total number of cDNAs sequenced without increasing the number of molecules sequenced (i.e. an increase in the total number of bases sequenced and in mean read length).

However, I'm having trouble identifying reads that come from such concatenated molecules, and I'm not sure what the most efficient way to find them would be.

I tried mapping my reads to the genome and then looking at the alignments. Correct me if I'm wrong, but I assumed I would see an increase in the number of chimeric reads versus the primary reads? However, I see no such thing and, even worse, I came to realize that I always have 13-15% chimeric reads in datasets generated by direct-cDNA sequencing. Could it be that some of those chimeric reads are not relevant? If so, how would one filter them out while still retaining reads that actually come from 'real' cDNAs? Since Nanopore reads are quite noisy, I believe this makes the analysis even more complicated.

Otherwise, would there be a way to split my reads before alignment, or to identify reads that result from concatenated cDNAs?

I thank you for your cooperation.

minimap2 ONT nanopore alignment cDNA • 1.3k views

ADD COMMENT • link 5.7 years ago by Florian Bernard ▴ 90

Correct me if I'm wrong, but I assumed I would see an increase in the number of chimeric reads versus the primary reads?

Just thinking aloud here. Are you sure the ligation strategy actually worked? Or, if it did, perhaps it generated concatemers that were very long and were not actually sequenced?

ADD REPLY • link 5.7 years ago by GenoMax 146k

Yes, you are right: I'm actually trying to see whether it worked (and hence whether there is any point in doing that extra step). But being a novice at bioinformatics, I'm not sure whether what I'm observing comes from an erroneous way of analyzing my dataset, or whether there is indeed no real difference from the other datasets, which were generated following the standard protocol (no extra ligation step).

Also: I didn't do the library prep myself, but if I recall correctly we saw an increase in fragment length (by running some of the library on an agarose gel). But we didn't quantify the efficiency of the ligation, and it might concern a smaller fraction of the reads than we expected. For some reason, shorter reads might also be easier to sequence, resulting in those longer reads being found less often.

However, I'm still surprised to find that many chimeric reads in my other datasets. So either reads reported as chimeric are not always relevant (and are therefore not an accurate marker of cDNA concatenation), or ligation between cDNAs already happens during a normal library prep.

ADD REPLY • link 5.7 years ago by Florian Bernard ▴ 90

Just to clarify: by chimeric reads, you mean that one part of the read maps to gene A (an exon therein) and the other part to gene B? And these were present in the datasets even when no ligation was tried?
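One way to check what kind of chimeric reads are present is to look at where each aligned segment of a read lands. The sketch below is only an illustration: it assumes the aligner writes an SA tag on the primary alignment (minimap2 does this for split alignments), and it reads plain SAM text lines. The function names and the `min_distance` threshold are hypothetical choices, not part of any tool. It does not assign segments to genes, which would additionally require an annotation.

```python
def read_segments(sam_line):
    """Return [(chrom, pos, strand), ...] for a primary SAM line and its SA tag."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    strand = "-" if flag & 0x10 else "+"
    segments = [(fields[2], int(fields[3]), strand)]
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            # SA entries look like: rname,pos,strand,CIGAR,mapQ,NM;
            for entry in tag[5:].split(";"):
                if entry:
                    rname, pos, sa_strand = entry.split(",")[:3]
                    segments.append((rname, int(pos), sa_strand))
    return segments


def classify(sam_line, min_distance=100_000):
    """Label a read by how its segments are spread over the reference.

    min_distance is an arbitrary, hypothetical cut-off (in bases) used to
    separate nearby from distant segments on the same chromosome.
    """
    segments = read_segments(sam_line)
    if len(segments) == 1:
        return "linear"
    if len({chrom for chrom, _, _ in segments}) > 1:
        return "inter-chromosomal"
    positions = [pos for _, pos, _ in segments]
    if max(positions) - min(positions) >= min_distance:
        return "intra-chromosomal, distant"
    return "intra-chromosomal, nearby"
```

Applied to the primary alignments only (for example the output of samtools view -F 0x904), this gives a rough breakdown of chimeric reads into those spanning different chromosomes and those whose segments lie on the same chromosome.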

ADD REPLY • link 5.7 years ago by GenoMax 146k

Yes, based on the definition of chimeric reads given here: A: definition of chimeric vs multiple-mapping (SAM)

Correct, a chimeric (or "non-linear") alignment occurs when non-overlapping portions of it map to (A) different portions of the same chromosome in a manner not normally biologically supported or (B) to different chromosomes. Multimapping occurs commonly in repeat regions or where there are paralogs. Chimeric alignments occur when there are structural rearrangements, such as with cancer (or something weird happened during sample preparation).

Since it seems that only chimeric reads are flagged as supplementary alignments, I used samtools view -c -f 0x800 to count the number of chimeric reads I have in each dataset. Across all 4 direct-cDNA datasets (1 with the ligation step, 3 without), the percentage of chimeric reads stays the same.
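Note that samtools view -c counts alignment records, so a read split into three segments contributes two supplementary records. For a concatenation experiment, the number of aligned segments per read may be the more informative quantity. Here is a minimal sketch of that count, assuming SAM text lines as input; the function names are hypothetical and this is not a validated pipeline.

```python
from collections import Counter


def segments_per_read(sam_lines):
    """Map each read name to its number of aligned segments."""
    segments = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4 or flag & 0x100:  # skip unmapped and secondary records
            continue
        # What remains is either the primary record or a supplementary (0x800) one.
        segments[name] += 1
    return segments


def chimeric_summary(sam_lines):
    """Return (mapped reads, chimeric reads, chimeric fraction, segment histogram)."""
    segments = segments_per_read(sam_lines)
    mapped = len(segments)
    chimeric = sum(1 for n in segments.values() if n > 1)
    fraction = chimeric / mapped if mapped else 0.0
    histogram = dict(Counter(segments.values()))
    return mapped, chimeric, fraction, histogram
```

The functions accept any iterable of SAM lines, for instance sys.stdin when the output of samtools view is piped in. If the ligation step worked, one would expect the histogram for that dataset to shift towards reads with two or more segments compared with the three control datasets.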

ADD REPLY • link 5.7 years ago by Florian Bernard ▴ 90