3.3 years ago
ooluwayiose
Hey folks, I'm new to RNA-seq analysis and would appreciate your help in figuring this out.
I have single-end RNA-seq reads (n = 96 samples), and my goal is to profile all sncRNAs present in the tissue. After trimming, FastQC showed read lengths ranging from 16 to 50 nt. Here are my steps:
- Mapped the reads with Bowtie1 against all available human sncRNA sequences (miRNAs, piRNAs, tRNAs, rRNAs, snoRNAs, snRNAs and mitRNAs), using a FASTA file downloaded from RNAcentral (bowtie -p 10 -n 2 -k 1 --best -x index -q trimmed.fq.gz -S | samtools view -ShbF4 | samtools sort -o mapped.sorted.bam). The intent was to randomly assign each multi-mapped read to its best-quality alignment (result: 87% of reads aligned at least once, with only ~5% unique reads). I'm not sure whether -k 1 --best is the right way to randomly assign multi-mapped reads, so please correct me if it isn't.
- Indexed the BAM file (samtools index mapped.sorted.bam).
- Downloaded the GFF3 annotation file from RNAcentral and converted it to GTF using gffread. A few lines of my GTF and sorted BAM files are shown below.
- Tried both HTSeq (htseq-count -f bam -m union -s no mapped.sorted.bam -t exon --idattr=transcript_id homo_sapiens.GRCh38.gtf > counts.txt) and featureCounts (featureCounts -T 10 -s 0 -t exon -g transcript_id -a homo_sapiens.GRCh38.gtf -o counts.txt mapped.sorted.bam), but strangely both assigned 0% of the alignments. This suggests a problem in one of my earlier steps that I can't pin down.
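Both counters only assign a read when its reference name (BAM RNAME) matches a GTF seqname (column 1) and the coordinates overlap a feature. The toy Python sketch below is a heavily simplified, hypothetical model of that logic, not how either tool is actually implemented (strandedness and multi-mapper handling are ignored), and the feature IDs and coordinates are made up. It shows why reads aligned to transcript sequences end up with 0 assigned reads against a genome-based GTF.

```python
from collections import defaultdict

def count_overlaps(features, reads):
    """Toy overlap counter (hypothetical, greatly simplified).

    features: list of (seqname, start, end, feature_id), 1-based inclusive
    reads:    list of (rname, start, end), 1-based inclusive
    Returns (counts per feature_id, number of unassigned reads).
    """
    # Index features by reference name.
    by_seq = defaultdict(list)
    for seqname, start, end, fid in features:
        by_seq[seqname].append((start, end, fid))

    counts = {fid: 0 for _, _, _, fid in features}
    unassigned = 0
    for rname, rstart, rend in reads:
        # Only features on the SAME reference name are considered.
        hits = {fid for start, end, fid in by_seq.get(rname, [])
                if rstart <= end and rend >= start}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            # No overlapping feature (e.g. reference-name mismatch)
            # or an ambiguous overlap with several features.
            unassigned += 1
    return counts, unassigned

# Made-up genome-based annotation: seqname "1", as in a GRCh38 GTF.
gtf_features = [("1", 1000, 1022, "tx_A"), ("1", 5000, 5030, "tx_B")]

# Reads aligned to transcript sequences: RNAME is the transcript ID.
reads_vs_transcripts = [("tx_A", 1, 22), ("tx_B", 3, 25)]
# The same reads aligned to the genome: RNAME is the chromosome.
reads_vs_genome = [("1", 1001, 1020), ("1", 5005, 5025)]

print(count_overlaps(gtf_features, reads_vs_transcripts))
print(count_overlaps(gtf_features, reads_vs_genome))
```

In the first call every read is unassigned because no GTF feature sits on a reference called tx_A or tx_B; in the second, both reads are assigned.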
Any help will be appreciated!
You cannot link images from your local PC, you need to upload them to an image hoster and then paste the full link incl. the suffix (e.g. png) into the field that pops up when using the image button.
As for the question, is this standard RNA-seq or smallRNA-seq?
This is small RNA-seq data.
My apologies about the images. Here are the links to the three images, hosted on Google Drive.
.gtf mapped.sorted.bam featureCounts snapshot
Please use the directions here (rather than google drive links): How to add images to a Biostars post
Thanks GenoMax, I used imgbb as recommended in the link. Please see the links below:
For some reason, the image button above did not seem to work. Here are the links again:
gtf:
mapped.sorted.bam:
featureCounts output:
You have a mismatch in reference identifiers. These need to match in all locations. In your GTF they are "1" (which I assume is chromosome 1). In your BAM they seem to be just URS* (hard to tell). It looks like you aligned to transcripts but are trying to use a GTF file from the genome.
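One quick way to confirm this kind of mismatch is to compare the set of seqnames in GTF column 1 with the @SQ SN: names in the BAM header (as printed by samtools view -H mapped.sorted.bam). The Python sketch below assumes both are available as lists of text lines; the example lines and the URS_EXAMPLE name are hypothetical.

```python
def gtf_seqnames(gtf_lines):
    """Collect reference names from column 1 of a GTF, skipping comments."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names

def bam_header_refs(header_lines):
    """Collect reference names from @SQ lines (SN:<name>) of a SAM header."""
    refs = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    refs.add(field[3:])
    return refs

def compare_refs(gtf_lines, header_lines):
    """Report which reference names are shared or unique to each file."""
    gtf = gtf_seqnames(gtf_lines)
    bam = bam_header_refs(header_lines)
    return {"shared": gtf & bam, "gtf_only": gtf - bam, "bam_only": bam - gtf}

# Hypothetical example lines (real files would be read with open()).
gtf_example = [
    "#comment",
    '1\tRNAcentral\texon\t1000\t1022\t.\t+\t.\ttranscript_id "URS_EXAMPLE";',
]
header_example = ["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:URS_EXAMPLE\tLN:22"]

result = compare_refs(gtf_example, header_example)
print(result)  # an empty "shared" set means no read can be counted
```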
@GenoMax, thanks for the prompt response. I'm still confused: are you referring to a mismatch in my GTF file (columns 3 & 9) or in my BAM file? How do I make them match in all locations? The URS* in column 3 of the BAM file seems to be the RNAcentral ID, and I wonder whether that needs to change too. What must I do to fix these issues? Thanks
As you can see in the alignment/annotation snippets below, the chromosome (reference name) needs to match for featureCounts or htseq-count to be able to do its work. You should realign to the genome and then use the genome-based GTF file for counting. It would be messy to try to rename your GTF entries.
You may simply be able to run samtools idxstats on your BAM and use the counts, since you are aligning to small RNAs.
@GenoMax, thank you so much for your help. I did not know it was possible to count reads in a sorted small RNA BAM file directly with samtools idxstats, without HTSeq or featureCounts. That was really helpful.
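samtools idxstats prints one tab-separated line per reference sequence (name, length, mapped read-segments, unmapped read-segments), plus a final * line for unmapped reads with no reference. Because each reference here is a single sncRNA sequence, the mapped column can serve as that RNA's count; note that the samtools manual warns this may count reads multiple times if they are mapped more than once. A minimal Python sketch, assuming each sample's idxstats output has been saved as text (sample and reference names are hypothetical), that builds a count matrix across samples:

```python
def parse_idxstats(text):
    """Return {reference_name: mapped_count} from samtools idxstats output."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # unmapped reads without a reference, skip
            continue
        counts[name] = int(mapped)
    return counts

def merge_samples(per_sample):
    """Combine {sample: {ref: count}} into a reference x sample matrix.

    References missing from a sample get a count of 0.
    """
    samples = list(per_sample)
    refs = sorted(set().union(*per_sample.values()))
    matrix = {ref: [per_sample[s].get(ref, 0) for s in samples] for ref in refs}
    return samples, matrix

def to_tsv(samples, matrix):
    """Render the matrix as a tab-separated table with a header row."""
    lines = ["reference\t" + "\t".join(samples)]
    for ref, row in matrix.items():
        lines.append(ref + "\t" + "\t".join(str(c) for c in row))
    return "\n".join(lines) + "\n"

# Hypothetical idxstats outputs for two samples.
sample1 = "URS_A\t22\t10\t0\nURS_B\t30\t3\t0\n*\t0\t0\t5\n"
sample2 = "URS_A\t22\t4\t0\nURS_B\t30\t0\t0\n*\t0\t0\t2\n"

samples, matrix = merge_samples({
    "sample1": parse_idxstats(sample1),
    "sample2": parse_idxstats(sample2),
})
print(to_tsv(samples, matrix), end="")
```

For the 96 samples, each text would come from running samtools idxstats on that sample's indexed, sorted BAM and saving the output.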
Also, this may sound naive, but what are the pros and cons of using samtools idxstats instead of the more common HTSeq and featureCounts? Is this approach applicable to other sequencing data (mRNA-seq or ChIP-seq)?
Presumably my last question: what do you think about the Bowtie1 handling of multi-mapped reads in my initial post (-k 1 --best)? My intent was to randomly assign each multi-mapped read to its best-quality alignment (87% of reads aligned at least once, with only ~5% unique reads), and I'm not sure -k 1 --best does that, so please correct me if it doesn't.
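To make the intended behaviour concrete: for each multi-mapped read, keep only its best alignments (fewest mismatches) and pick one of the tied hits at random. The Python sketch below illustrates that idea on made-up hits. It is a conceptual illustration only, not Bowtie1's internal algorithm; whether -k 1 --best reproduces it exactly (including how ties are broken and how -n mode defines the best stratum) should be checked against the Bowtie1 manual.

```python
import random

def assign_best_random(alignments, rng):
    """Pick one alignment for a read: best stratum first, then random tie-break.

    alignments: list of (reference, mismatches) reported for one read.
    rng: a random.Random instance (seeded for reproducibility).
    Returns the chosen reference, or None if the read did not align.
    """
    if not alignments:
        return None
    best = min(mm for _, mm in alignments)
    ties = [ref for ref, mm in alignments if mm == best]
    return rng.choice(ties)

rng = random.Random(42)
# Hypothetical read with two perfect hits and one 1-mismatch hit.
hits = [("URS_A", 0), ("URS_B", 0), ("URS_C", 1)]
print(assign_best_random(hits, rng))  # URS_A or URS_B, never URS_C
```

Applied read by read, this spreads reads with equally good hits across those references instead of always favouring the first one, while worse hits are never chosen.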
Thanks once again!