I recently read a paper: Tiedt, S., et al. (2017). RNA-seq identifies circulating miR-125a-5p, miR-125b-5p and miR-143-3p as potential biomarkers for acute ischemic stroke. Circulation Research, CIRCRESAHA-117. Details of the paper: PMID: 28724745; DOI: 10.1161/CIRCRESAHA.117.311572; PubMed GEO database, SRA: SRP133275.
I want to obtain the miRNA expression matrix for human circulating blood after stroke. I downloaded the files (SRA format) from the PubMed GEO database, trimmed them with Trimmomatic, and aligned the reads to the genome with HISAT2. However, the alignment rate is extremely low, as shown below.
6644136 reads; of these:
  6644136 (100.00%) were unpaired; of these:
    6631500 (99.81%) aligned 0 times
    2981 (0.04%) aligned exactly 1 time
    9655 (0.15%) aligned >1 times
0.19% overall alignment rate
Here is the shell script:
hisat2 -p 4 --dta -x ./indexes/genome_tran -U ./samples/SRR6761159.fastq -S ./temp/SRR6761159.sam
The index files are “genome_tran.[1-8].ht2”.
The alignment rate is far too low. Does anyone have suggestions on how to address this problem? Thank you.
If this is miRNA data then you should not be using HISAT2 for alignments. You would want ungapped alignments, and bowtie v.1 would be more appropriate.
Thank you. I will try it now.
Hi, genomax. Thanks for your help. Over the last two days I tried bowtie, but the result was similar to the one above. Was the index file appropriate? It was downloaded from ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/GRCh38_no_alt.zip Is there an index file made specifically for miRNA alignment? I also went to the miRBase website, downloaded the hairpin.fa and mature.fa files, and built *.ebwt index files from them with bowtie-build. Then I repeated the alignment. It didn’t work either. I think this may be because these .fa files are not human-only reference files.
Have you cleaned the data downloaded from SRA? It may still have adapter sequences in it. Sequence data from miRBase generally has U bases, which have to be changed to T before you can do the alignments. Have you done that before creating the bowtie indexes?
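The U-to-T conversion mentioned above, together with restricting the miRBase file to human entries, can be sketched as follows. This is a minimal sketch, not a tested pipeline: the function name `mirbase_to_dna_hsa` and the output file names are hypothetical, and it assumes miRBase headers for human sequences start with `>hsa-`.

```shell
# Hypothetical helper: keep only human (hsa-) records from a miRBase FASTA
# file and convert RNA bases (U) to DNA bases (T) on sequence lines only.
# Assumption: human record headers begin with ">hsa-".
mirbase_to_dna_hsa() {
    awk '
        /^>/ { keep = ($0 ~ /^>hsa-/) }   # header line: decide whether to keep this record
        keep {
            if ($0 !~ /^>/) {             # sequence line: U -> T (and u -> t)
                gsub(/U/, "T")
                gsub(/u/, "t")
            }
            print
        }
    ' "$1"
}

# Example usage (file and index names are placeholders):
#   mirbase_to_dna_hsa mature.fa > mature.hsa.dna.fa
#   bowtie-build mature.hsa.dna.fa mature_hsa
```

Header lines are left untouched so that names such as "Homo sapiens" are not altered; only the sequence lines are rewritten.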
Hi, genomax. Thanks for your help. Do you know how to find the adapter sequences? I took some from the FastQC report, trimmed the reads with Trimmomatic, and aligned the trimmed reads to the genome. Here is the report.
The alignment rate increased, but it was still too low.
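One way to check whether a candidate adapter is really present, and whether trimming left reads of a plausible miRNA length (roughly 20 to 24 nt), is sketched below. Both function names are hypothetical, the FASTQ file is assumed to use the standard four-lines-per-record layout, and the adapter shown in the usage comment is the Illumina small RNA 3' adapter, which is only an assumption about how this library was prepared.

```shell
# Hypothetical helper: count reads that contain the first 12 bases of a
# candidate adapter sequence. Assumes 4-line FASTQ records.
count_adapter_hits() {
    awk -v adapter="$2" '
        NR % 4 == 2 {
            total++
            if (index($0, substr(adapter, 1, 12)) > 0) hits++
        }
        END { printf "%d of %d reads contain the adapter prefix\n", hits, total }
    ' "$1"
}

# Hypothetical helper: print "length<TAB>count" for all reads, sorted by length.
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l "\t" n[l] }' "$1" | sort -n
}

# Example usage (the adapter is an assumption, not taken from the paper):
#   count_adapter_hits SRR6761159.fastq TGGAATTCTCGGGTGCCAAGG
#   read_length_histogram SRR6761159.trimmed.fastq
```

If most raw reads contain the adapter but the trimmed reads are still near full length, the adapter given to the trimmer is probably not the one used in the library.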
You know that a .gtf file is not a genome, right?
Thank you. That was a typo, and I have corrected it. The index files are in .ht2 format.
99% aligned zero times (not aligned at all)? How sure are you about the data?
This result was reported by HISAT2. It seems that these reads did not map to the genome at all. I am not sure about the data. The files were downloaded from the PubMed GEO database.
Which Hisat2 version are you using?
HISAT2 version 2.1.0 by Daehwan Kim (firstname.lastname@example.org, www.ccb.jhu.edu/people/infphilo). Operating system: Manjaro Linux 64 bit. The software works fine when I process another RNA-seq dataset from mice with the mouse index files.
That sounds really strange. Now I almost think there is a mixup with the samples. Could you try selecting a couple of random reads and blasting the first say 30-40 nucleotides (online tool here) just to make sure there is not a mixup with the samples?
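Picking a few random reads and cutting them down to their first 30-40 nucleotides for BLAST could be done along these lines. This is a rough sketch: the function name `sample_reads_for_blast` and its optional seed argument are hypothetical, and the FASTQ file is assumed to use four lines per record.

```shell
# Hypothetical helper: pick K random reads from a FASTQ file and print the
# first N bases of each as FASTA, ready to paste into a BLAST web form.
# Arguments: <fastq> <K reads> <N bases> [random seed, default 42]
sample_reads_for_blast() {
    awk -v k="$2" -v n="$3" -v seed="${4:-42}" '
        BEGIN { srand(seed) }
        NR % 4 == 1 { name = substr($0, 2) }            # read name without the leading "@"
        NR % 4 == 2 {
            total++
            if (total <= k) slot = total                # fill the reservoir first
            else slot = int(rand() * total) + 1         # then replace with probability k/total
            if (slot <= k) { names[slot] = name; seqs[slot] = substr($0, 1, n) }
        }
        END {
            m = (total < k) ? total : k
            for (i = 1; i <= m; i++) print ">" names[i] "\n" seqs[i]
        }
    ' "$1"
}

# Example usage: 5 random reads, first 35 bases each
#   sample_reads_for_blast SRR6761159.fastq 5 35 > blast_query.fa
```

Reservoir sampling is used so the file is read once and memory stays small; changing the seed gives a different selection of reads.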
Have you tried using a miRNA reference, i.e. a reference containing only the miRNA sequences? You can download this data from miRBase: http://www.mirbase.org/ftp.shtml