Determining strandedness of RNAseq samples from count data and understanding counts in sex genes
7 weeks ago
sr41489 • 0

I know how to check strandedness in IGV. However, I am trying to determine it for hundreds of samples prepared by ~5 different groups, and using IGV from the command line has been extremely difficult on my work computer. I suspect that some of these samples are unstranded while most are stranded: in a spot check of 5 CRAM files in the IGV GUI, 4 were stranded and 1 was not. With that in mind, I generated htseq counts for all the samples, in both unstranded and stranded mode.

I thought it would make sense to use sex genes to see whether samples were prepared with a stranded or unstranded protocol. I organized the samples by sex and isolated the counts for the following sex genes: XIST, RPS4Y1, RPS4Y2, DDX3Y, and USP9Y. The 4 spot-checked samples that were stranded (3 females, 1 male) have counts in the XIST gene, which makes sense, but the females also have a few counts in some of the Y-chromosome genes. The unstranded counts for these 4 samples look more nonsensical, so I feel that I am at least on the right track regarding the library construction protocol. I'm not sure, however, why there are any counts at all in the Y-chromosome genes for these 3 females.

Anyway, it's a long-winded, two-part question: does anyone have a good way to determine strandedness for a large set of samples using raw counts, and how can I make sense of the counts I'm seeing in some of the females? Is this perhaps contamination? Thanks for the help!
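For the count-based part of the question, one possible heuristic is sketched below. It assumes each sample has two htseq-count output files (one from a stranded run, one from an unstranded run) in the usual two-column, tab-separated layout with `__`-prefixed summary rows at the end. The function names and the 0.25/0.75 cutoffs are illustrative assumptions, not validated thresholds: the idea is only that a stranded library counted in the matching orientation should keep most of its unstranded total, an unstranded library roughly half, and a library of the opposite orientation very little.

```python
import csv


def read_htseq_counts(path):
    """Read an htseq-count file into {gene_id: count}, skipping '__' summary rows."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or row[0].startswith("__"):
                continue
            counts[row[0]] = int(row[1])
    return counts


def stranded_fraction(stranded_counts, unstranded_counts):
    """Total gene-assigned reads in stranded mode divided by the unstranded total."""
    total_unstranded = sum(unstranded_counts.values())
    if total_unstranded == 0:
        return None
    return sum(stranded_counts.values()) / total_unstranded


def guess_protocol(fraction, low=0.25, high=0.75):
    """Classify a sample from its fraction; the cutoffs are assumed, tune them on known samples."""
    if fraction is None:
        return "no data"
    if fraction >= high:
        return "stranded (same orientation as the stranded run)"
    if fraction <= low:
        return "stranded (opposite orientation)"
    return "unstranded"
```

A usage sketch, with hypothetical file names: `guess_protocol(stranded_fraction(read_htseq_counts("s1.stranded.txt"), read_htseq_counts("s1.unstranded.txt")))`. Calibrating the cutoffs against the samples already checked in IGV would be a sensible first step, and the fraction can slightly exceed 1 because reads that are ambiguous in unstranded mode may resolve to one gene in stranded mode.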

strandedness htseq RNAseq • 214 views

If you want to do that efficiently and systematically, then I would take these CRAMs, convert some reads (maybe 1 million or so) back to FASTQ, and quantify them against a transcriptome with salmon, which has an automated mode to determine strandedness. Check its documentation and its -l A argument, as well as previous posts on strandedness inference. Just using raw counts appears cumbersome.
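Once salmon has been run with `-l A` on each subsample, the inferred library types can be collected in bulk. The sketch below assumes that each salmon output directory contains a `lib_format_counts.json` file with an `expected_format` field (and a `compatible_fragment_ratio` field), which is how salmon reports the detected type as far as I know; check this against the salmon version in use. The helper names are hypothetical.

```python
import json
from pathlib import Path


def salmon_library_type(quant_dir):
    """Return (expected_format, compatible_fragment_ratio) from a salmon output directory."""
    with open(Path(quant_dir) / "lib_format_counts.json") as handle:
        info = json.load(handle)
    # .get() returns None if a field is absent, e.g. in a different salmon version
    return info.get("expected_format"), info.get("compatible_fragment_ratio")


def is_stranded(library_type):
    """Salmon type strings such as ISR, ISF, SR, SF contain 'S'; IU and U do not."""
    return library_type is not None and "S" in library_type


def summarize(quant_dirs):
    """Map each output directory name to (library type, stranded?) for a quick table."""
    summary = {}
    for quant_dir in quant_dirs:
        library_type, _ratio = salmon_library_type(quant_dir)
        summary[Path(quant_dir).name] = (library_type, is_stranded(library_type))
    return summary
```

With one output directory per sample, `summarize(sorted(Path("salmon_out").iterdir()))` (directory name assumed) would give a per-sample table that can be compared against the IGV spot checks.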
