Question: Counting sense reads in bacterial paired-end RNA-seq data
1
gravatar for biotech
4.7 years ago by
biotech510
United States
biotech510 wrote:

Hi,

I'm trying to count reads mapping to sense strand. I have doubts which counts file I should chose from this pipeline. I think is "plate_R.counts" because has more reads counted in total. Am I right?

Library creation kit -> E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®

I would also appreciate a nice tutorial to understand Illumina paired-end library preparation, alignment, counting...

Thanks!

P.S I read previous post asking similar questions but still I have doubts!

How To Separate Illumina Based Strand Specific Rna-Seq Alignments By Strand

#################################################
#BWA HT.seq bacterial paired-end RNA-seq pipeline
#################################################

# Get the genome file from the command line
genome_file=$1
# Get the fastq file from the command line
fastq_file_R1=$2
# Get the fastq file from the command line
fastq_file_R2=$3
#get gff
GFF=$6

#BWA index (default settings)
bwa index $genome_file
#BWA align
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam.gz

#Flagstat
#Convert .sam to .bam to input to Flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam

#Count reads mapped with htseq-count
samtools sort -n P_S1_L001_aln-pe.bam plate.sorted
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s yes plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_F.counts
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_R.counts

 

rna-seq bacteria paired-end htseq • 3.3k views
ADD COMMENTlink modified 3.1 years ago by Biostar ♦♦ 20 • written 4.7 years ago by biotech510

how about this:

samtools view -f 16 your.bam | wc -l
ADD REPLYlink written 4.7 years ago by Phil S.660

What I was trying to say is that I need the number of reads mapping to sense and the number of reads mapping antisense to the annotated genes, not the the original sequence.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by biotech510

Try bedtools intersect (full documentation of options at https://bedtools.readthedocs.org/en/latest/content/tools/intersect.html)

Your command might look something like this (not tested):

bedtools intersect -S -c -a genes.bed -b mapped_reads.bam

That will output the count of reads overlapping on the anti-sense strand with each gene. To get the total across all genes, sum the counts.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by David Fredman980
2
gravatar for David Fredman
4.7 years ago by
David Fredman980
University of Bergen, Norway
David Fredman980 wrote:

Phil's suggestion is on the right track. You can use samtools view filters to select reads mapping to the sense or anti-sense strand of your reference sequence. However, the -f flag extracts reads mapping anti-sense. To get the mapping locations of reads mapping to the sense strand, use the -F (filter) option

samtools view -F 16 mapped_reads.bam 

To count unique reads (not mapping locations) if you have allowed multiple mapping locations, you may have to make the list of read identifiers non-redundant first:

samtools view -F 16 mapped_reads.bam | cut -f1 | sort | uniq  | wc -l 

See this gist for those (and other) examples

ADD COMMENTlink written 4.7 years ago by David Fredman980

How can we tell the read counts on plus strand (-F 16)?

what if we set the "--library-type fr-firststrand" or "--library-type fr-secondstrand"? what is the difference?

ADD REPLYlink written 3.9 years ago by wm110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 782 users visited in the last hour