Question: Antisense transcription from strand specific RNA-seq
gravatar for sorrymouse
12 days ago by
sorrymouse70 wrote:

I have a number of libraries prepared with the Illumina Tru-seq strand specific library prep kit. I want to detect sense and antisense transcription of each gene.

I mapped the reads using hisat2 with the --rna-strandedness RF option.

The recommendations I can find for separating the bam files by strand, so that I can essentially have a sense .bam and an antisense .bam, are to use samtools -F 0x10 and -f 0x10, or similar flags using 16 and 32.

Whatever method I take, I get the same general output, but I can't shake this nagging feeling that I am not actually separating them based on sense and antisense transcription but rather some aspect of the library that is sense + something and antisense+ something.

I know that down the line I will need to check into the directionality of each gene, and that just separating by sense and antisense will include genes in the opposite orientation in each file.

But to begin with, how would one go about examining sense and antisense transcription by separating the .bam files?

I know this is not the first post on this, but many posts do not seem to elaborate whether they are interested only in separating the files by strand or sense and antisense transcription, and I don't think they are entirely the same thing.

rna-seq • 240 views
ADD COMMENTlink modified 10 days ago • written 12 days ago by sorrymouse70


According to the Hisat manual, the option just add the XS-tag to each alignment. To check the sence/antisense mappings, you might want to look at RSeQC infer experiment. There the read-orientation is analysed in regard of the transcripts' orientation.

ADD REPLYlink written 11 days ago by michael.ante2.7k

I agree that there is probably a simpler way to do this rather than splitting the files, but it was reassuring to make sure at each step that what should be happening was happening, by checking in IGV.

I will say that running stringtie separately on the negative and positive reads increases the number of predicted features by A LOT and I don't have any sense of the 'rightness' of this.

ADD REPLYlink written 10 days ago by sorrymouse70
gravatar for sorrymouse
11 days ago by
sorrymouse70 wrote:

What I ended up doing is somewhat redundant I think, but its as follows:

map the reads with hisat2

./hisat2 --phred33 --rna-strandness RF -x reference.fa -1 reads1.fq -2 reads2.fq -S out.sam

convert and sort with samtools

samtools view -Su out.sam | samtools sort - out.sorted

first read in pair maps to reverse strand, read is mapped in proper pair, read is paired

samtools view -f 83 -b out.sorted.bam > out.sorted.83.bam
samtools view -f 163 -b out.bam > out.sorted.163.bam

second read in pair maps to reverse strand, read is paired, read mapped to proper pair

samtools view -f 99 -b out.sorted.bam > out.sorted.99.bam
samtools view -f 147 -b out.bam > out.sorted.147.bam

merge files

samtools merge out.sorted.83.163.bam out.sorted.83.bam out.sorted.163.bam
samtools merge out.sorted.99.147.bam out.sorted.99.bam out.sorted.147.bam

stringtie to create a gtf reference - this is a non model organism

stringtie out.sorted.99.147.bam -o out.99.147.stringtie.gtf -p 3 --rf -m 50
stringtie out.sorted.83.163.bam -o out.83.163.stringtie.gtf -p 3 --rf -m 50
stringtie merge out.83.163.stringtie.gtf out.99.147.stringtie.gtf -o stringtie_merged.gtf

extract features that are on the + or - strand from stringtie

cat stringtie.merged.gtf | awk '$7 == "+" { print $0 }' > forward_cds.gff
cat stringtie.merged.gtf | awk '$7 == "-" { print $0 }' > reverse_cds.gff

separate out reads mapping to positive and negative features from the stranded read files to get sense and antisense transcription

bedtools intersect -b forward_cds.gff -abam out.sorted.83.163.bam > out.sorted.83.163.sense.bam
bedtools intersect -b reverse_cds.gff -abam out.sorted.83.163.bam > out.sorted.83.163.antisense.bam
bedtools intersect -b forward_cds.gff -abam out.sorted.99.147.bam > out.sorted.99.147.antisense.bam
bedtools intersect -b reverse_cds.gff -abam out.sorted.99.147.bam > out.sorted.99.147.sense.bam

merge the two sense and antisense files

samtools merge out_flags_sense.bam out.sorted.83.163.sense.bam out.sorted.99.147.sense.bam
samtools merge out_flags_antisense.bam out.sorted.83.163.antisense.bam out.99.147.antisense.bam

those bam files can be used in bedtools coverage, or I put them back into stringtie so that they can input into Deseq2

stringtie -eB -G stringtie_merged.gtf out_hisat_flags_sense.bam -o out_flags_sense.gtf
stringtie -eB -G stringtie_merged.gtf out_hisat_flags_antisense.bam -o out_flags_antisense.gtf

If I'm doing this wrong, I sure would love to know!! If anyone sees a flaw with this code, please point it out to me.

ADD COMMENTlink modified 11 days ago by h.mon21k • written 11 days ago by sorrymouse70
gravatar for h.mon
11 days ago by
h.mon21k wrote:

samtools flags for strand / reverse strand refers to the positive / negative strand of the reference genome / transcriptome used to build the index. So if you mapped to a reference genome, these flags says nothing about sense / antisense transcription. However, if you used a "correctly stranded" transcriptome as reference, then you can use sam flags to filter sense / antisense reads.

As michael.ante said, the hisat2 --rna-strandedness RF option adds XS attribute to each read, then if you want to filter sense / antisense mapped reads, you can use this XS attribute. You could also use bedtools intersect with -s or -S.

ADD COMMENTlink modified 11 days ago • written 11 days ago by h.mon21k
gravatar for Carlo Yague
10 days ago by
Carlo Yague4.3k
Carlo Yague4.3k wrote:

I want to detect sense and antisense transcription of each gene.

To do that, you don't actually have to split your bam into fwd and reverse and I think that what you are trying to do in your answer is overly complicated. An approach I used with success is to simply count the reads mapping in forward and reverse of the annotated genes with featurecounts on the original bam file. You simply have to change the "-s" option the either count the reads that map sense or anti-sense of your genes. Note that featurecounts can handle paire-end data smartly and that the count data can be directly inputed into DESeq2.

ADD COMMENTlink modified 10 days ago • written 10 days ago by Carlo Yague4.3k

This is a good point - but this is more of a final solution that one aspires to once they get a firm understanding of the processes that take place. I will make this into a recipe as well.

I will say that in general one should start out by splitting the files, visualizing and understanding what each data contains - so that they are confident about what tools do. It is way too easy to slip up and misspell -s or -S or even a sam flag, and ending up with neither sense nor antisense but just "nonsense" (pun intended :-) counts that later are more difficult to identify.

ADD REPLYlink written 10 days ago by Istvan Albert ♦♦ 78k
gravatar for Istvan Albert
11 days ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

I am creating a recipe called "Anti Sense Transcripts" as a teaching aid:

As of today (November 8, 2018) I am still exploring the best way to make the point but I think it is already useful.

Look at the code for the guidance of what each step does. Also the results will allow you to visualize the outputs that have already been created.

ADD COMMENTlink modified 11 days ago • written 11 days ago by Istvan Albert ♦♦ 78k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1241 users visited in the last hour