Question: Multiply aligning reads in RNA-seq hitting against targeted KO-first, conditional ready, lacZ-tagged
2.2 years ago by
United States
apt.university (70) wrote:

I am analyzing some RNA-seq data generated from a treatment/control experiment on the NG108-15 neuroblastoma-glioma hybrid cell line. The sequencing was done using Illumina SE 100 bp reads.

When I align the reads against the reference (I tried both HISAT2 and TopHat, against mm10 and GRCm38), approximately 20-30% of the reads align to multiple loci.
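For anyone who wants to reproduce the multi-mapping figure, here is a minimal sketch that counts reads whose alignments carry an `NH` tag greater than 1 in SAM text. It assumes single-end reads (so a read name identifies one read) and that the aligner writes the optional `NH:i:` tag; the function name `multimapper_fraction` and the file name in the usage note are hypothetical.

```python
def multimapper_fraction(sam_lines):
    """Return (n_multi, n_aligned, fraction) for single-end SAM records.

    Assumes the aligner writes the optional NH:i:<n> tag (number of
    reported alignments for the read). Reads are tracked by name so that
    secondary records of the same read are not counted twice.
    """
    aligned = set()
    multi = set()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        aligned.add(name)
        for tag in fields[11:]:
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                multi.add(name)
                break
    fraction = len(multi) / len(aligned) if aligned else 0.0
    return len(multi), len(aligned), fraction
```

A call could look like `multimapper_fraction(open("sample.sam"))`. Holding read names in memory is fine for a subsample but may be heavy for a full library.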

Upon further inspection, it looks like a large fraction of the multiply aligning reads hit repeats of a "targeted KO-first, conditional ready, lacZ-tagged mutant allele." I also subsampled 1M reads from my samples and aligned them with BLAST against accession JN958699.1. Counting only perfect BLAST matches, about 12% of the 1M subsampled reads align perfectly to JN958699.1.
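As an illustration of how such a count could be made, the sketch below tallies reads with a perfect hit in BLAST tabular output. It assumes the default 12-column `-outfmt 6` layout, and it assumes that "perfect" means 100% identity with no mismatches or gap openings over the full 100 bp read; neither the output format nor that definition is stated in the post, and the function name is hypothetical.

```python
def count_perfect_hits(blast_lines, read_length=100):
    """Count distinct query reads with at least one perfect hit.

    Expects default BLAST tabular columns: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore. "Perfect" here is an assumption: full-length alignment,
    100% identity, no mismatches, no gap openings.
    """
    perfect = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        pident = float(cols[2])
        length = int(cols[3])
        mismatch = int(cols[4])
        gapopen = int(cols[5])
        if (pident == 100.0 and mismatch == 0 and gapopen == 0
                and length == read_length):
            perfect.add(cols[0])          # count each read only once
    return len(perfect)
```

Dividing the result by the number of subsampled reads (1,000,000 here) gives the fraction quoted above.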

Both the mm10 and GRCm38 references seem to contain hundreds of paralogs of that lacZ-tagged mutant allele.

Does anyone know what the repeat "targeted KO-first, conditional ready, lacZ-tagged mutant allele" is involved in, and what would lead to its enrichment in an RNA-seq experiment? Note that the multiply aligning reads are as abundant in the control as in the treatment.

Thank you so much for any suggestions or hints you might be able to offer.


The question becomes where in that construct they align. My guess would be the polyadenylation site, which would be understandable for RNA-seq data.

Always actually look at your data in something like IGV.
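Alongside a browser such as IGV, a quick numeric check of where the hits fall can help. The sketch below bins BLAST hits by their midpoint along the subject sequence. It assumes default 12-column `-outfmt 6` input, a construct length of roughly 35,000 bp, and a 1 kb bin size; the function name and both defaults are illustrative choices, not taken from the thread.

```python
def hits_per_bin(blast_lines, subject_length=35000, bin_size=1000):
    """Histogram of BLAST hit midpoints along the subject sequence.

    Uses columns 9 and 10 (sstart, send) of default tabular output.
    Subject coordinates are 1-based, and sstart > send for minus-strand
    hits, so the midpoint is used to stay strand-agnostic.
    """
    n_bins = (subject_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        sstart, send = int(cols[8]), int(cols[9])
        midpoint = (sstart + send) // 2
        # Clamp so that out-of-range coordinates land in the edge bins.
        index = max(0, min((midpoint - 1) // bin_size, n_bins - 1))
        counts[index] += 1
    return counts
```

A pile-up in one or two bins would point to a specific element of the construct, while a flat profile would suggest the whole sequence is covered.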

written 2.2 years ago by Devon Ryan (90k)

Thanks for your answer, Devon. The hits from the 1M-read test dataset are distributed across the complete 35 kbp construct sequence. Why would it be understandable in RNA-seq for such a large portion of the data to hit the polyadenylation site of this particular repeat family? Could you please elaborate?

modified 2.2 years ago • written 2.2 years ago by apt.university (70)

Typical RNA-seq library preparation includes polyA enrichment, so one generally expects to see a bunch of random polyA signal due to that.
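One rough way to gauge how much polyA signal a set of reads carries is to count reads that contain a long run of A (or T, for reads from the opposite strand). This is only a sketch: the 20-base threshold is an arbitrary assumption, and the function name is hypothetical.

```python
import re


def fraction_with_homopolymer(reads, min_run=20):
    """Fraction of read sequences containing a run of >= min_run A's or T's.

    T runs are included because a read sequenced from the opposite
    strand shows a polyA tail as poly-T. min_run=20 is an arbitrary
    illustrative threshold.
    """
    pattern = re.compile("A{%d,}|T{%d,}" % (min_run, min_run))
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if pattern.search(seq.upper()))
    return hits / len(reads)
```

Running this on the reads that hit the construct and on the rest of the library would show whether the construct hits are unusually rich in polyA.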

written 2.2 years ago by Devon Ryan (90k)

Yes, of course... in which case, it shouldn't be specific to this particular repeat family. Thanks again, Devon.

written 2.2 years ago by apt.university (70)