How to extract sense and antisense read counts from mapping file (.bam) for RNA-seq data set
1
0
Entering edit mode
6.7 years ago
M K ▴ 540

Hi All,

I mapped my specific strand RNA-seq data to human genome hg19, and I want to extract sense and antisense read counts from mapping file (.bam) . Can any one help me to do that.

RNA-Seq next-gen R • 5.0k views
2
Entering edit mode
6.7 years ago

Use either featureCounts or htseq-count. Running them twice on the same dataset with the strand setting reversed (e.g., -s yes and then -s reverse) will produce what you want.

0
Entering edit mode

so how can I compare the results with the annotation file GTF

0
Entering edit mode

Huh? Why would you need to and what would that even mean?

0
Entering edit mode

I want to detect antisense using  R package called NASTI-seq and they using modified GTF file to compare the results with the mapped file .bam to count reads

1
Entering edit mode

Always mention in the beginning when you intend to use a particular package, especially when it's an obscure one.

If you look at the NASTIseq documentation, you'll see that the input is a list with the following 4 items:

1. smat: A matrix of sense counts (these counts can be produced via my answer above).
2. asmat: A matrix of antisense counts (these counts can be produced via my answer above).
3. genepos: Your GTF file, presumably as a GRanges object or maybe a data.frame, but they don't specify this. It appears that they just need gene bounds (this is actually poorly thought out, but doable with GenomicFeatures).
4. pospairs: A 2-column data.frame with known sense-antisense pairs. The values should match row.names from genepos.
0
Entering edit mode

Thanks Ryan for helping me, and I am working on featureCounts to get sense and antisense matrix, but I didn't understand how they get genepos file so do you have any idea about that.

0
Entering edit mode

The documentation for that package is absolutely terrible (please feel encouraged to contact the author and complain...when a paper over an R/other software package is in review, we really need to include the documentation/tutorial in the review process), so I'm surprised you're stumped. It turns out that you have to look through the source code to see what's actually used from genepos and what things need to be named as.

Assuming you have a reasonably well formatted GTF for your organism (i.e., not from UCSC), then something like the following R code should work to create genepos.

library(rtracklayer)
gtf <- import.gff("Mus_musculus.GRCm38.71.gtf", format="gtf")

stupidNASTIseqFunction <- function(gtf) {
#compact each gene to a single entry
grl <- split(gtf, gtf$gene_id) #This won't work on the weird GTF files from UCSC res <- lapply(grl, function(x) { start(x)[1] = min(start(x)) end(x)[1] = max(end(x)) x[1,] }) #format the output data frame df <- data.frame( seqname = factor(sapply(res, function(x) as.character(seqnames(x)))), start = sapply(res, start), end = sapply(res, end), strand = factor(sapply(res, function(x) as.character(strand(x)))), attributes=sapply(res, function(x) x$gene_id))
#reorder things, since grl is ordered be gene_id
o <- match(unique(gtf$gene_id), df$attribute)
df[o,]
}

genepos <- stupidNASTIseqFunction(gtf)



You'll need to change the import.gff(...) part to match your own GTF file. For what it's worth, this method is rather slow with large GTF files, but you'll only need to do it once.

0
Entering edit mode

Dear Devon, I am trying to extract sense and antisense read percentage for globin genes. I used htseq-count to obtain stats. I have posted a question on comparing strand setting as yes, no and reverse How to interpret the difference among these three options in strandedness from HTSeq-count. I need some help on interpreting the results can you help.