Question

Raw Read Quality Filtering using FASTX-Toolkit of miRNA-Seq data

9.1 years ago

modihardikv ▴ 10

I am analyzing miRNA-Seq data for differential expression of miRNAs. As the first step, I am filtering the raw reads by quality with FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html) to remove poor-quality reads, using the following settings:

minimum quality score for each base: 20 (-q 20);
minimum percentage of bases that must reach that quality score: 95% (-p 95).

I used the following command to perform the quality filtering:

fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
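To make the two thresholds concrete, here is a minimal sketch of the rule that -q 20 -p 95 describes: a read is kept only if at least 95% of its bases have quality 20 or higher. It assumes 4-line FASTQ records and Phred+33 encoding (the -Q 33 case). The function name qfilter_sketch is hypothetical, and this is not the FASTX-Toolkit implementation, so rounding at the boundary may differ from the real tool.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the rule behind `fastq_quality_filter -q Q -p P`.
# Assumes 4-line FASTQ records and Phred+33 qualities.
# qfilter_sketch is a hypothetical helper, not part of FASTX-Toolkit.
qfilter_sketch() {
  local min_q="$1" min_pct="$2"
  awk -v q="$min_q" -v p="$min_pct" '
    # lookup table: printable ASCII character -> numeric code
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    # buffer the current record: header, sequence, separator, quality
    { rec[NR % 4] = $0 }
    NR % 4 == 0 {
      n = length($0); good = 0
      for (i = 1; i <= n; i++)
        if (ord[substr($0, i, 1)] - 33 >= q) good++
      # keep the read when at least p percent of its bases reach quality q
      if (n > 0 && good * 100 >= p * n)
        printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
    }'
}

# Usage (reads FASTQ on stdin, writes passing reads to stdout):
#   qfilter_sketch 20 95 < input.fastq > output.fastq
```

With -p 95, a 20-base read is dropped as soon as two of its bases fall below Q20, so short reads with a few low-quality positions are removed quickly.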

The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292; the reads were already free of adapter sequences.

SampleId    TotalReads  ReadsPassingFilter  %OfTotalReadsPassingFilter
SRR1542714  1866654     962422              51.56%
SRR1542715  1842228     955859              51.89%
SRR1542716  2777542     1976509             71.16%
SRR1542717  1324705     318259              24.02%
SRR1542718  3085962     1830745             59.32%
SRR1542719  1937831     619794              31.98%

Usually, all of these samples should yield >95% good-quality reads after quality filtering. The variation here is huge, so it seems I am doing something wrong.

So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason I cannot reproduce the expected result?

I would really appreciate it if somebody could guide me.

miRNA-Seq RNA-Seq next-gen • 3.8k views

ADD COMMENT • link 9.0 years ago by modihardikv ▴ 10

"Usually all these samples should produce >95% of good quality reads after quality filtering"?

Are you sure about that, for miRNA expression profiles? More importantly, I see the sequencer is an Ion Torrent PGM (Homo sapiens). I have not used data from that platform before, and I am very interested in it. Give me a week; I will process these data and then answer your question.

ADD REPLY • link 9.1 years ago by jimmy_zeng ▴ 90

Thanks, Jimmy. Please post your answer.

ADD REPLY • link 9.1 years ago by modihardikv ▴ 10

We got the same results.

But I am more concerned about the mapping rate:

~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U SRR1542714_clean.fq.gz -S tmp.sam

1520320 reads; of these:
  1520320 (100.00%) were unpaired; of these:
    1365259 (89.80%) aligned 0 times
    25303 (1.66%) aligned exactly 1 time
    129758 (8.53%) aligned >1 times
10.20% overall alignment rate
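For reference, the overall alignment rate in this summary is the reads aligned exactly once plus the reads aligned more than once, divided by the total. A small sketch of that arithmetic, using the counts from the summary above (overall_rate is a hypothetical helper name, not a bowtie2 command):

```shell
# Sketch: recompute the overall alignment rate from the three counts.
# overall_rate is a hypothetical helper, not part of bowtie2.
overall_rate() {
  local total="$1" once="$2" multi="$3"
  awk -v t="$total" -v a="$once" -v m="$multi" \
    'BEGIN { printf "%.2f%%\n", (a + m) * 100 / t }'
}

# Counts taken from the SRR1542714 summary.
overall_rate 1520320 25303 129758   # prints 10.20%
```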

Is it right to align the reads to miRBase?

ADD REPLY • link 9.1 years ago by jimmy_zeng ▴ 90

There are two things I am not sure about: which alignment tool and which reference should I choose?

Below is my code:

## step5: alignment to miRBase v21 with bowtie2 (hairpin.human.fa / mature.human.fa)
## build one bowtie2 index per reference
mkdir bowtie2_index && cd bowtie2_index
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../hairpin.human.fa hairpin_human
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../mature.human.fa mature_human

## align every cleaned sample to the hairpin index
ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U "$id" -S "${id%%.*}.hairpin.sam" ; done
## overall alignment rate: 10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95%
## align every cleaned sample to the mature index
ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human -U "$id" -S "${id%%.*}.mature.sam" ; done
## overall alignment rate: 6.67% / 3.78% / 6.70% / 2.80% / 6.55% / 3.23%

ADD REPLY • link updated 9.0 years ago by GenoMax 152k • written 9.1 years ago by jimmy_zeng ▴ 90

Yes, it is right to use miRBase as the reference. However, we have decided to use v20.
Are these alignments on total reads or on collapsed reads?
We are also working on optimizing the alignment tool (Novoalign, bowtie, bwa). Thanks for your quick response.

ADD REPLY • link 9.0 years ago by modihardikv ▴ 10

Use ADD REPLY button below relevant posts to provide additional information. SUBMIT ANSWER should only be used for valid answers for the original question.

ADD REPLY • link 9.0 years ago by GenoMax 152k

Sorry for my mistake:

In fact, the overall mapping rate is OK with bowtie2; the only problem was that I forgot to change U to T in the sequences downloaded from miRBase.

ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id -S ${id%%.*}.hairpin.sam ; done

overall alignment rate: 10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95% (before converting U to T)

overall alignment rate: 51.77% / 70.38% / 51.45% / 61.14% / 52.20% / 65.85% (after converting U to T)
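For anyone hitting the same problem, the U-to-T conversion can be sketched as below. It rewrites only the sequence lines of a FASTA file and leaves the ">" header lines alone. The function name rna_to_dna_fasta and the output file name in the usage comment are hypothetical; hairpin.human.fa is the file name used earlier in this thread.

```shell
# Sketch: convert an RNA-alphabet FASTA (U) to a DNA-alphabet FASTA (T).
# rna_to_dna_fasta is a hypothetical helper name.
rna_to_dna_fasta() {
  # On every line that does NOT start with ">", translate U/u to T/t.
  sed '/^>/!y/Uu/Tt/'
}

# Usage (output file name is only an example):
#   rna_to_dna_fasta < hairpin.human.fa > hairpin.human.dna.fa
```

The bowtie2 index then has to be rebuilt from the converted file with bowtie2-build before aligning again.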

ADD REPLY • link 9.0 years ago by jimmy_zeng ▴ 90