How to calculate the percentage of a repetitive motif in a FASTQ file
3.9 years ago
lkianmehr ▴ 100

I need to calculate the percentage of a repetitive motif (TTAGGG/CCCTAA) in my RNA-seq data (FASTQ files). I used BBDuk with this command:

bbduk.sh in1=R3_L002_1.fastq.gz in2=R3_L002_2.fastq.gz literal=TAACCCTAACCCTAACCCTAACCC k=24 mm=f int=f

Below is the output that I got.

D1_L001:

Input:                      65975862 reads      6554014910 bases.
Contaminants:               195232 reads (0.30%)    19519262 bases (0.30%)
Total Removed:              1040136 reads (1.58%)   61775988 bases (0.94%)
Result:                     64935726 reads (98.42%)     6492238922 bases (99.06%)

Time:                           133.851 seconds.
Reads Processed:      65975k    492.91k reads/sec
Bases Processed:       6554m    48.97m bases/sec

I am treating the reads removed for matching the literal sequence as the percentage of that repetitive sequence in the FASTQ files. Is that a reliable method? If so, is any normalization required for comparison with other samples?

Thank you for your help

RNA-Seq sequence fastq bbtools • 832 views

Try the seqkit locate function. If you can post example input and the expected output, someone here can help you out.
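
For a quick cross-check that does not depend on any particular tool, the percentage can also be computed directly from the FASTQ sequence lines. The following is a minimal Python sketch, not the BBDuk or seqkit method: the function names and the `min_repeats` parameter are hypothetical, and requiring four tandem copies (24 bases, mirroring the k=24 literal in the question) is an assumption. Exact substring matching in a single repeat phase is a simplification, so the counts may differ somewhat from what BBDuk reports.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# Motif and its reverse complement, as given in the question.
MOTIFS = ("TTAGGG", "CCCTAA")


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of every record is the sequence
                yield line.strip().upper()


def motif_read_percentage(
    seqs: Iterable[str], min_repeats: int = 4
) -> Tuple[int, int, float]:
    """Return (matching reads, total reads, percent matching).

    A read matches if it contains `min_repeats` tandem copies of either
    motif. min_repeats=4 is an assumed threshold, not a recommendation.
    """
    targets = [motif * min_repeats for motif in MOTIFS]
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if any(target in seq for target in targets):
            hits += 1
    # Dividing by the total read count normalizes for library size.
    percent = 100.0 * hits / total if total else 0.0
    return hits, total, percent
```

Something like `motif_read_percentage(read_sequences("R3_L002_1.fastq.gz"))` would then give a per-file percentage. Because the result is already divided by the total number of reads, it is normalized for sequencing depth, in the same way that the 0.30% in the BBDuk output is 195232 matching reads out of 65975862 input reads.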
