How to calculate the percentage of a repetitive motif in a FASTQ file

3.9 years ago

lkianmehr ▴ 100

I need to calculate the percentage of a repetitive motif (TTAGGG/CCCTAA) in my RNA-seq data (FASTQ files). I used BBDuk with this command:

bbduk.sh in1=R3_L002_1.fastq.gz in2=R3_L002_2.fastq.gz literal=TAACCCTAACCCTAACCCTAACCC k=24 mm=f int=f

Below is the output I got.

D1_L001:

Input:                      65975862 reads      6554014910 bases.
Contaminants:               195232 reads (0.30%)    19519262 bases (0.30%)
Total Removed:              1040136 reads (1.58%)   61775988 bases (0.94%)
Result:                     64935726 reads (98.42%)     6492238922 bases (99.06%)

Time:                           133.851 seconds.
Reads Processed:      65975k    492.91k reads/sec
Bases Processed:       6554m    48.97m bases/sec

I am treating the reads removed as matching the literal sequence as the percentage of that repetitive sequence in the FASTQ files. Is that a reliable method? If so, is any normalization required for comparison with other samples?
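To make the arithmetic behind the question concrete, here is a minimal sketch that recomputes the percentage from the report and also scales the count per million input reads, which is one simple way to compare libraries of different depth. The function name `percent_and_rpm` is hypothetical, and reading the "Contaminants" line as the number of reads flagged by the k-mer match is an assumption about the report, not something the output states.

```shell
# percent_and_rpm MATCHED TOTAL  (hypothetical helper, for illustration only)
# Prints MATCHED as a percentage of TOTAL and as a count per million reads.
percent_and_rpm() {
    awk -v m="$1" -v n="$2" 'BEGIN {
        printf "%.2f%%\t%.1f per million\n", 100 * m / n, 1e6 * m / n
    }'
}

# Numbers copied from the BBDuk report above (D1_L001):
# 195232 reads on the "Contaminants" line (assumed to be the k-mer matches),
# 65975862 reads on the "Input" line.
percent_and_rpm 195232 65975862
```

The percentage is already relative to the input read count, so it is depth-normalized in that sense; the per-million figure is the same ratio on a different scale.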

Thank you for your help

RNA-Seq sequence fastq bbtools • 832 views

updated 3.9 years ago by Asaf 10k • written 3.9 years ago by lkianmehr ▴ 100

Try the seqkit locate function. If you can post example input and expected output, someone here can help you out.
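As a rough illustration of what a motif search over reads involves, here is a minimal sketch in plain awk. It is a stand-in, not seqkit: it counts reads that contain a run of consecutive TTAGGG or CCCTAA copies. The function name `motif_read_fraction`, the `MIN_REPEATS` variable, and the default of 4 copies are all assumptions made for the example, and it assumes strict 4-line FASTQ records.

```shell
# motif_read_fraction  (hypothetical helper, for illustration only)
# Reads 4-line FASTQ records on stdin and prints three tab-separated columns:
# reads with the motif, total reads, percentage of reads with the motif.
# A read counts as a hit if it contains MIN_REPEATS (default 4, an assumption)
# consecutive copies of TTAGGG or of its reverse complement CCCTAA.
motif_read_fraction() {
    awk -v k="${MIN_REPEATS:-4}" '
    BEGIN {
        fwd = ""; rev = ""
        for (i = 0; i < k; i++) { fwd = fwd "TTAGGG"; rev = rev "CCCTAA" }
    }
    NR % 4 == 2 {                      # sequence line of each record
        total++
        seq = toupper($0)
        if (index(seq, fwd) || index(seq, rev)) hits++
    }
    END {
        if (total == 0) { print "no reads"; exit 1 }
        printf "%d\t%d\t%.2f\n", hits, total, 100 * hits / total
    }'
}

# Tiny in-line demo: one read with four TTAGGG copies, one without.
printf '@r1\nAATTAGGGTTAGGGTTAGGGTTAGGGAA\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' \
    | motif_read_fraction

# On real data, something like:
#   gzip -dc R3_L002_1.fastq.gz | motif_read_fraction
```

This counts each mate file separately and ignores mismatches, so its numbers need not agree with a k-mer based filter run on read pairs.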

3.9 years ago by cpad0112 21k
