Question: Raw Read Quality Filtering using FASTX-Toolkit of miRNA-Seq data
3.2 years ago, modihardikv wrote:

I am analyzing miRNA-Seq data for differential expression of miRNAs. As the first step, I am filtering the raw reads by quality with FASTX-Toolkit to remove poor-quality reads, using the following settings:

  1. the minimum quality score for each base = 20;
  2. the minimum percentage of bases that must have at least that quality score = 95% (FASTX-Toolkit version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html).

I used the following command to perform the quality filtering:

fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
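To make the effect of `-q 20 -p 95` concrete, here is a minimal sketch of the rule those two options describe, applied to a single quality string. It is not FASTX-Toolkit's own code: the function name `passes_filter` is hypothetical, and the sketch assumes Phred+33 encoding (matching `-Q 33`) and that a read passes when at least `p` percent of its bases have quality `q` or higher.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "pass" or "fail" for one quality string.
# Usage: passes_filter QUAL_STRING MIN_QUALITY MIN_PERCENT
passes_filter() {
  printf '%s\n' "$1" | awk -v q="$2" -v p="$3" '
    BEGIN {
      # Lookup table from printable ASCII character to its code.
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
      n = length($0); good = 0
      for (i = 1; i <= n; i++)
        if (ord[substr($0, i, 1)] - 33 >= q) good++   # Phred+33 decoding
      # Pass when good/n >= p/100 (written without division).
      if (n > 0 && good * 100 >= p * n) print "pass"; else print "fail"
    }'
}

passes_filter 'IIIIIIIIIIIIIIIIIIIII#' 20 95   # 21 of 22 bases at Q40, one at Q2
passes_filter 'IIIIIIIIIIIIIIIIIIII##' 20 95   # 20 of 22 bases at Q40, two at Q2
```

For a 22-nt read, one base below Q20 still leaves 21/22 = 95.5% and passes, while two such bases leave 20/22 = 90.9% and fail, so short reads have very little tolerance under this setting.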

The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292 and were already free of adapter sequences.

SampleId    TotalReads  ReadsPassingFilter  GoodQualityReads (% of total)
SRR1542714  1866654     962422              51.56
SRR1542715  1842228     955859              51.89
SRR1542716  2777542     1976509             71.16
SRR1542717  1324705     318259              24.02
SRR1542718  3085962     1830745             59.32
SRR1542719  1937831     619794              31.98
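The percentage column can be recomputed from the two read counts. A minimal sketch, assuming uncompressed FASTQ with four lines per record; the helper names `count_reads` and `pct_retained` are hypothetical and not part of FASTX-Toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of FASTX-Toolkit.

# count_reads FILE: number of records in an uncompressed 4-line-per-record FASTQ.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# pct_retained TOTAL KEPT: percentage of reads kept, with two decimals.
pct_retained() {
  awk -v total="$1" -v kept="$2" 'BEGIN { printf "%.2f\n", 100 * kept / total }'
}

pct_retained 1866654 962422   # counts for SRR1542714 from the table
```

With real files this would be called as `pct_retained "$(count_reads input.fastq)" "$(count_reads output.fastq)"`, where both file names are placeholders.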

Usually all these samples should produce >95% of good quality reads after quality filtering. This is a huge variation, and it seems like I am doing something wrong.

So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason I am not able to reproduce the result?

I would really appreciate it if somebody could guide me.

rna-seq mirna-seq next-gen • 1.4k views

"Usually all these samples should produce >95% of good quality reads after quality filtering"?

Are you sure about that, for miRNA expression profiles? More importantly, I see the sequencer is an Ion Torrent PGM (Homo sapiens). I have not used data from that platform before, and I am very interested in it. Give me a week; I will process these data and then answer your question.

written 3.2 years ago by jimmy_zeng

Thanks, Jimmy. Please post your answer.

written 3.2 years ago by modihardikv

We got the same results.

But I am more concerned about the mapping rate:

~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U SRR1542714_clean.fq.gz -S tmp.sam

1520320 reads; of these:
  1520320 (100.00%) were unpaired; of these:
    1365259 (89.80%) aligned 0 times
    25303 (1.66%) aligned exactly 1 time
    129758 (8.53%) aligned >1 times
10.20% overall alignment rate

Is it right to align the reads to miRBase?

modified 3.2 years ago • written 3.2 years ago by jimmy_zeng

There are two things I am not sure about: which alignment tool and which reference should I choose?

Below is my code:

## step 5: align to miRBase v21 with bowtie2 (hairpin.human.fa / mature.human.fa)
## build both indexes inside bowtie2_index (the FASTA files are expected one level up)
mkdir bowtie2_index && cd bowtie2_index
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../hairpin.human.fa hairpin_human
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../mature.human.fa  mature_human

## run the alignments from the directory that holds the *_clean.fq.gz files and miRBase/
for id in *_clean.fq.gz ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U "$id" -S "${id%%.*}.hairpin.sam" ; done
## overall alignment rate:  10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95%
for id in *_clean.fq.gz ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human  -U "$id" -S "${id%%.*}.mature.sam" ; done
## overall alignment rate:  6.67% / 3.78% / 6.70% / 2.80% / 6.55% / 3.23%
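The per-sample rates listed in the comments above can be collected automatically rather than copied by hand. A minimal sketch, assuming each bowtie2 summary was saved to a log file (bowtie2 prints the summary to stderr); the helper name `overall_rate` and the `.bowtie2.log` suffix are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall alignment rate from a saved bowtie2 summary.
# The summary goes to stderr, so it can be captured with, for example:
#   bowtie2 -x INDEX -U reads.fq.gz -S out.sam 2> sample.bowtie2.log
overall_rate() {
  awk '/overall alignment rate/ { print $1 }' "$1"
}

# Example loop over saved logs (file naming is an assumption):
#   for log in *.bowtie2.log ; do printf '%s\t%s\n' "$log" "$(overall_rate "$log")" ; done
```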
modified 3.2 years ago by genomax • written 3.2 years ago by jimmy_zeng
  1. Yes, it is right to use miRBase as the reference. However, we have decided to use v20.
  2. Are these alignments based on total reads or on collapsed reads?
  3. We are also evaluating alignment tools (Novoalign, Bowtie, BWA). Thanks for your quick response.
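On point 2, "collapsed" means that identical read sequences are merged into a single entry with a count, so alignment rates on collapsed reads weight each distinct sequence once. A minimal sketch of that idea, assuming uncompressed 4-line FASTQ input; `collapse_reads` is a hypothetical name, and FASTX-Toolkit ships its own `fastx_collapser` for this purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse identical reads into "count<TAB>sequence" lines,
# most abundant first. Reads FASTQ from the given files or from standard input.
collapse_reads() {
  awk 'NR % 4 == 2 { n[$0]++ }                       # line 2 of each record is the sequence
       END { for (s in n) printf "%d\t%s\n", n[s], s }' "$@" |
    sort -k1,1nr -k2,2
}
```

Comparing the number of output lines with the total read count shows how much of the library consists of repeated sequences.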
written 3.2 years ago by modihardikv

Use the ADD REPLY button below the relevant post to provide additional information. SUBMIT ANSWER should only be used for valid answers to the original question.

written 3.2 years ago by genomax

Sorry for my mistake:

In fact, the overall mapping rate is OK with bowtie2. The only problem was that I forgot to change U to T in the sequences downloaded from miRBase.

ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id -S ${id%%.*}.hairpin.sam ; done

overall alignment rate: 10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95% (before converting U to T)

overall alignment rate: 51.77% / 70.38% / 51.45% / 61.14% / 52.20% / 65.85% (after converting U to T)
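The U-to-T conversion itself can be done before building the index. A minimal sketch, assuming a plain FASTA file in which only the sequence lines should change; the helper name `rna_to_dna` and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert RNA (U) to DNA (T) in FASTA sequence lines only.
# Header lines (starting with ">") are left untouched.
rna_to_dna() {
  sed '/^>/!y/Uu/Tt/' "$@"
}

# Example (output file name is an assumption):
#   rna_to_dna hairpin.human.fa > hairpin.human.dna.fa
#   bowtie2-build hairpin.human.dna.fa hairpin_human
```

Restricting the substitution to non-header lines keeps identifiers that happen to contain a "U" intact.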

written 3.1 years ago by jimmy_zeng