Zero small RNA counts on both featureCounts and HTseq
0
0
Entering edit mode
3.3 years ago

Hey folks, I'm new with RNAseq analysis and would appreciate your assistance in figuring this out.

I have single end reads from RNAseq data with the goal of profiling all available sncRNAs in the tissue (n = 96 samples). Post-trimming Fastqc distribution ranged between 16 - 50nt. Here are my steps:

  1. Mapped against all available human sncRNAs (which includes miRNA,piRNA,tRNAs,rRNAs, snoRNA, snRNAs and mitRNAs) downloaded from RNAcentral fasta file with Bowtie1 (bowtie -p 10 -n 2 -k 1 --best -x index -q trimmed.fq.gz -S | samtools view -ShbF4 | samtools sort -o mapped.sorted.bam), intended to randomly assign each of the multi-mapped reads to the best quality region (result:87% with at least 1 unique with only ~5% unique reads). I'm not exactly sure if this (-k 1 --best) is the appropriate way for random assignment of multi-mapped reads but please correct me on this.
  2. Indexed (Samtools index mapped.sorted.bam).
  3. I also downloaded .gff3 file from RNAcentral .gff3 file and then converted to .gtf file using gffread. Please see below for few lines of my gtf and sorted bam files. .gtf. mapped.sorted.bam
  4. I tried both the HTseq ( $htseq-count -f bam -m union -s no mapped.sorted.bam -t exon --idattr=transcript_id homo_sapiens.GRCh38.gtf > counts.txt) and featureCounts ($featureCounts -T 10 -s 0 -t exon -g transcript_id -a homo_sapiens.GRCh38.gtf -o counts.txt mapped.sorted.bam) using the following commands but weirdly got zero 0%, alignment in both featureCounts result snapshot, suggesting a potential issue with my previous steps I can't seem to figure out.

Any help will be appreciated!

rna-seq alignment sequencing • 1.2k views
ADD COMMENT
0
Entering edit mode

You cannot link images from your local PC, you need to upload them to an image hoster and then paste the full link incl. the suffix (e.g. png) into the field that pops up when using the image button.

As for the question, is this standard RNA-seq or smallRNA-seq?

ADD REPLY
0
Entering edit mode

The is smallRNA-seq data.

My apologies about the images. here are the links to the three images now using googledrive image hoster.

.gtf mapped.sorted.bam featureCounts snapshot

ADD REPLY
0
Entering edit mode

Please use the directions here (rather than google drive links): How to add images to a Biostars post

ADD REPLY
0
Entering edit mode

Thanks GenoMax, I used imgbb as recommended in the link. Please see the links below:

.gtf mapped.sorted.bam featureCounts output

ADD REPLY
0
Entering edit mode

For some reason, the image icon above appeared not to work. Here are the links again:

gtf: https://ibb.co/19sv9WV

mapped.sorted.bam: https://ibb.co/0cxmB8w

featureCounts output: https://ibb.co/hWmxq8s

ADD REPLY
0
Entering edit mode

You have a mismatch with reference identifiers. These need to match in all locations. In your GTF they are 1 (which I assume is chromosome 1). In your BAM they seem to be just URS* (hard to tell).

Looks like you aligned to transcripts but are trying to use a GTF file from the genome.

ADD REPLY
0
Entering edit mode

@GenoMax, thanks for a prompt response. Still confused, please are you referring to a mismatch in my gtf file (columns 3 & 9) or my .bam file. Please how do I make them match in all locations? The URS* in column 3 of the bam file seems to be the RNAcentral ID and I wonder if that needs to change also. Please what must I do to fix all of these issues? Thanks

ADD REPLY
3
Entering edit mode

As you see in the alignment/annotation snippets below the chromosome (reference name) needs to match for featureCounts or htseq-count to be able to do its work.

NS501234:19:H2HLYAFXX:2:21101:9972:15618        99      **chr1**    12618   3       69=1X6= =       12743   201     TGGTGATGCCAGGCATGGCTCGATCGCGCGATCGCGATCGCGCGAGCAGAAGACGACGGCCGACTTGGCTCACAC   AAAAA.FFFA)FFFFFFFFFFFFFFFFFFAAFFFFFFFFFFF<FFAFFFFFFFAFFFFFFFFFFFFFAF.FF.FFF  XT:A:R   NM:i:1  AM:i:3
NS501234:19:H2HLYAFXX:2:21101:9972:15618        147     **chr1**    12743   3       76=     =       12618   -201    AGTGGGAGTGGCGTCGCCCCTAGGACGCTAGCTAGCTAGCTCGATCGTGTCTCCTGGAGAGGCTTCGATGCCCCTCC   <FFF.FFF.FFFF)FFAFFFFFAFFF)FFFFFFFF<FFFFF)FFFFFF.FFFFFFFFFFFFF.AFFFFFFFAAAAA  XT:A:R   NM:i:0  AM:i:3

**chr1**    BestRefSeq      exon    11874   12227   .       +       .       gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "rna0"; tss_id "TSS31672";
**chr1**    BestRefSeq      exon    12613   12721   .       +       .       gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "rna0"; tss_id "TSS31672";

You should realign to the genome and then use the genome based GTF file for counting. It would be messy to try and rename your GTF entries.

You may simply be able to run samtools idxstats on your BAM and use the counts since you are aligning to small RNA's.

ADD REPLY
0
Entering edit mode

@GenoMax, thank you so much for your help. I did not know it was possible to count small RNA sorted bam files directly and easily with samtools idxstats without either HTseq or featurecounts. That was really helpful.

Also, this may sound naive but I wonder what may be the pros and cons of using samtools idxstats as opposed to the "common" HTSeq and featureCounts. Is this applicable to other sequencing data (mRNA-seq or chip-seq).

Presumably my last question: Please what do you think about Bowtie1 random mapping of multi-mapped reads from my initial post: Bowtie1 (bowtie -p 10 -n 2 -k 1 --best -x index -q trimmed.fq.gz -S | samtools view -ShbF4 | samtools sort -o mapped.sorted.bam), intended to randomly assign each of the multi-mapped reads to the best quality region (result:87% with at least 1 unique with only ~5% unique reads). I'm not exactly sure if this (-k 1 --best) is the appropriate way for random assignment of multi-mapped reads but please correct me on this.

Thanks once again!

ADD REPLY

Login before adding your answer.

Traffic: 2184 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6