Question

Counting sense reads in bacterial paired-end RNA-seq data

1

Entering edit mode

9.9 years ago

biotech ▴ 570

Hi,

I'm trying to count reads mapping to sense strand. I have doubts which counts file I should chose from this pipeline. I think is "plate_R.counts" because has more reads counted in total. Am I right?

Library creation kit -> E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®

I would also appreciate a nice tutorial to understand Illumina paired-end library preparation, alignment, counting...

Thanks!

P.S I read a previous post asking similar questions but still I have doubts!

#################################################
#BWA HT.seq bacterial paired-end RNA-seq pipeline
#################################################

# Get the genome file from the command line
genome_file=$1
# Get the fastq file from the command line
fastq_file_R1=$2
# Get the fastq file from the command line
fastq_file_R2=$3
#get gff
GFF=$6

#BWA index (default settings)
bwa index $genome_file
#BWA align
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam.gz

#Flagstat
#Convert .sam to .bam to input to Flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam

#Count reads mapped with htseq-count
samtools sort -n P_S1_L001_aln-pe.bam plate.sorted
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s yes plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_F.counts
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_R.counts

RNA-seq HTSeq bacteria paired-end • 5.3k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.9 years ago by biotech ▴ 570

0

Entering edit mode

how about this:

samtools view -f 16 your.bam | wc -l

ADD REPLY • link 9.8 years ago by Phil S. ▴ 700

0

Entering edit mode

What I was trying to say is that I need the number of reads mapping to sense and the number of reads mapping antisense to the annotated genes, not the the original sequence.

ADD REPLY • link 9.3 years ago by biotech ▴ 570

0

Entering edit mode

Try bedtools intersect (full documentation of options at https://bedtools.readthedocs.org/en/latest/content/tools/intersect.html)

Your command might look something like this (not tested):

bedtools intersect -S -c -a genes.bed -b mapped_reads.bam

That will output the count of reads overlapping on the anti-sense strand with each gene. To get the total across all genes, sum the counts.

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by David Fredman ★ 1.1k

Ram · Answer 1 · 2014-07-28

3

Entering edit mode

9.8 years ago

David Fredman ★ 1.1k

Phil's suggestion is on the right track. You can use samtools view filters to select reads mapping to the sense or anti-sense strand of your reference sequence. However, the -f flag extracts reads mapping anti-sense. To get the mapping locations of reads mapping to the sense strand, use the -F (filter) option

samtools view -F 16 mapped_reads.bam

To count unique reads (not mapping locations) if you have allowed multiple mapping locations, you may have to make the list of read identifiers non-redundant first:

samtools view -F 16 mapped_reads.bam | cut -f1 | sort | uniq | wc -l

See this gist for those (and other) examples

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.8 years ago by David Fredman ★ 1.1k

0

Entering edit mode

How can we tell the read counts on plus strand (-F 16)?

what if we set the "--library-type fr-firststrand" or "--library-type fr-secondstrand"? what is the difference?

ADD REPLY • link 9.0 years ago by wm ▴ 560