Question

Low mappability in single-cell RNA-seq data?

6.2 years ago

ccagg ▴ 60

Hi,

Using the STAR aligner, I am getting a very low mapping rate (5-10%) for my single-cell RNA-seq data. STAR reports the majority of my reads (>90%) as unmapped because they are "too short". My current parameters are:

STAR --genomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --outReadsUnmapped Fastx --outSAMstrandField intronMotif --readFilesCommand zcat --readFilesIn *.fq.gz --runThreadN 6
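For context, STAR counts a read as "too short" when the aligned portion falls below the fraction of the read length set by --outFilterMatchNminOverLread (or the score equivalent), not when the read itself is short. A minimal sketch for pulling these categories out of STAR's Log.final.out so they can be compared across cells or samples. The function names are hypothetical and the numbers in the example are made up for illustration:

```python
def parse_star_log(text):
    """Return {field name: value string} for every 'name | value' line
    in the text of a STAR Log.final.out file."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a value such as '92.00%' to the float 92.0."""
    return float(stats[key].rstrip("%"))


if __name__ == "__main__":
    # Made-up values, laid out like Log.final.out (name, ' |', tab, value).
    example = "\n".join([
        "                          Number of input reads |\t1000000",
        "                        Uniquely mapped reads % |\t6.50%",
        "                 % of reads unmapped: too short |\t92.00%",
        "                     % of reads unmapped: other |\t0.40%",
    ])
    stats = parse_star_log(example)
    for key in ("Uniquely mapped reads %", "% of reads unmapped: too short"):
        print(key, percent(stats, key))
```

In practice the text would come from reading the Log.final.out file that STAR writes next to its other output.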

I am also trimming the reads with Trim Galore as follows:

trim_galore $R2_file --trim-n -a AAAAAAAA -clip_R1 9 -o $dir_name

Is there a likely explanation for such a low percentage of mapped reads? I am particularly interested in assessing contamination. Is there a good tool for quickly checking whether my samples could be contaminated? I have no good idea what they might be contaminated with.
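One quick, tool-agnostic check is to take a small random sample of the unmapped reads (the command above already writes them out via --outReadsUnmapped Fastx) and search them against a broad database such as BLAST nt to see which organisms come up. A minimal sketch of the sampling step, assuming plain 4-line FASTQ records; the function names and the sample size are hypothetical choices, not part of any tool:

```python
import random


def read_fastq(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header.strip()[1:].split()[0], seq


def sample_reads(records, n, seed=0):
    """Reservoir-sample up to n records without loading the whole file."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    # Assumed file name: STAR writes unmapped mate 1 to Unmapped.out.mate1.
    with open("Unmapped.out.mate1") as fh:
        print(to_fasta(sample_reads(read_fastq(fh), n=100)), end="")
```

A few hundred reads are usually enough to see whether the hits are dominated by the expected species, by rRNA, or by something unexpected.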

Thanks!

RNA-Seq contamination alignment • 2.4k views


One should never trim the two reads of a pair independently (if you have paired-end data). You are also not scanning for or removing Illumina adapters.

ADD REPLY • link 6.2 years ago by GenoMax 141k

My presumption is that this is something like CEL-Seq2 data and OP is trying to remove the poly(A) stretch from read 2 (if it's still there then it'll get soft-clipped, so I think that's excess effort). If that's the case, read 1 is mostly polyA plus UMI/cell barcode, which I imagine is causing mapping issues.
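A rough way to test this presumption is to measure what fraction of the reads in each file is dominated by a long A or T homopolymer. The sketch below assumes the sequences have already been read into a list; the function names and the 15-base threshold are hypothetical choices for illustration:

```python
def longest_run(seq, base):
    """Length of the longest uninterrupted run of `base` in `seq`."""
    best = run = 0
    for ch in seq.upper():
        run = run + 1 if ch == base else 0
        best = max(best, run)
    return best


def homopolymer_fraction(seqs, min_run=15):
    """Fraction of reads containing an A or T run of at least min_run bases."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(
        1 for s in seqs
        if max(longest_run(s, "A"), longest_run(s, "T")) >= min_run
    )
    return hits / len(seqs)


if __name__ == "__main__":
    # Toy reads, made up for illustration only.
    reads = ["ACGT" * 10, "ACGTGG" + "T" * 20, "A" * 30, "ACGTACGT"]
    print(homopolymer_fraction(reads))
```

If one file of the pair shows a much higher fraction than the other, that file is probably the barcode/UMI read and should not be aligned as if it were cDNA.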

ADD REPLY • link 6.2 years ago by Devon Ryan 104k