Very low number of micro RNA reads
0
0
Entering edit mode
8 weeks ago
aranyak111 • 0

I am new to microRNA analysis. I have got sequences Homo_sapiens.GRCh38.miRNA.fasta representing the latest build of genome index few of whose first lines look like this.

 >hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p UGAGGUAGUAGGUUGUAUAGUU
 >hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CUAUACAAUCUACUGUCUUUC
 >hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p CUGUACAGCCUCCUAGCUUUCC
 >hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p UGAGGUAGUAGGUUGUGUGGUU
 >hsa-let-7b-3p MIMAT0004482 Homo sapiens let-7b-3p CUAUACAACCUACUGCCUUCCC

I converted all the Us to Ts using sed and awk scripts

The first few lines of the converted file look like this.

hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p TGAGGTAGTAGGTTGTATAGTT hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CTATACAATCTACTGTCTTTC.

I have built the reference genome based on the bowtie script

/gpfs/ycga/scratch60/nicoli/ag2646/Bowtie
2.4.2/bowtie2-2.4.2-linux-x86_64/bowtie2-build Homo_sappeins.GRCH38_DNA.fasta Homo_sapiens_microRNA_sub_ref

I now want to align my trimmed FASTQ file of human microRNAs to this indexed genome . The first few lines of my trimmed FASTQ file looks like this

@SRR8248787.1601 HWI-D00306:1090:HKVGMBCX2:1:1101:8925:2469/1 TCACAGTGAACCGGTCTCTTTAGATCGGAAGATCACACGTCT
+ DDADDHHHEDF<E?CHHIEHIIIIFEE@ECCG@GHIHHD<<G @SRR8248787.1602 HWI-D00306:1090:HKVGMBCX2:1:1101:9104:2269/1 TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCT
+ DDDDDIIIIIIIHIIIHIIIIIHIHIIIIIIIHIIIIIIHHII @SRR8248787.1603 HWI-D00306:1090:HKVGMBCX2:1:1101:9222:2274/1

I am using Bowtie2 default scripts to align the reads that have been trimmed using BBDuk

The bowtie scripts look like this

module load  Bowtie2; bowtie2  -x Homo_sapiens_microRNA_sub_ref -U /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1.bbdukfastq
-S /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1_Gr38trimmed.sam

The log file that I get after alignment is

29980128 reads; of these:
  29980128 (100.00%) were unpaired; of these:
    29979332 (100.00%) aligned 0 times
    546 (0.00%) aligned exactly 1 time
    250 (0.00%) aligned >1 times
0.00% overall alignment rate

The scripts used to do adapter trimming was

bbduk.sh in=3kPa_HDF_miRNA_1.fastq out=3kPa_HDF_miRNA_1bbduk.fastq ref=polyA.fa.gz,truseq_rna.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=r trimq=10 minlength

Any help will be useful.

Genomics • 248 views
ADD COMMENT
0
Entering edit mode

I've cleaned up your post this time, but please put some effort into formatting your posts better. I am sure this has been requested of you multiple times already.

ADD REPLY
0
Entering edit mode

Simply using bbduk.sh may not be enough since this data may have a specific adapter on 3'-end that you will need to remove.

You should use bowtie v.1.x since with miRNA you need to do ungapped alignments.

Please use a pipeline such as https://scilifelab.github.io/courses/rnaseq/labs/smallRNA-lab

ADD REPLY
0
Entering edit mode

i would align to the mirbase hairpins fasta. then do some manual alignment just using your text editor as a positive control, or sneak a sequence you know is a perfect match into your query. then do your bowtie alignment and get back to us.

ADD REPLY

Login before adding your answer.

Traffic: 1393 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6