3.9 years ago
lkianmehr
I need to calculate the percentage of a repetitive motif (TTAGGG/CCCTAA) in my RNA-seq data (FASTQ files). I used BBDuk with this command:
bbduk.sh in1=R3_L002_1.fastq.gz in2=R3_L002_2.fastq.gz literal=TAACCCTAACCCTAACCCTAACCC k=24 mm=f int=f
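For reference, here is a minimal sketch of the per-read test this command amounts to. It assumes BBDuk's defaults of no mismatches (hdist=0) and reverse-complement matching (rcomp=t), so the CCCTAA literal also catches reads carrying TTAGGG repeats. With k=24 and a 24-base literal there is exactly one k-mer to look for, and mm=f turns off middle-base masking, so all 24 bases must match. This only mimics the matching rule, not BBDuk's implementation; the function names are hypothetical.

```python
# Sketch of the exact k-mer match implied by `literal=... k=24 mm=f`,
# assuming BBDuk defaults hdist=0 (no mismatches) and rcomp=t (both strands).
# Not BBDuk's actual implementation.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_matches_literal(read: str, literal: str, k: int = 24) -> bool:
    """True if any k-mer of `literal` or of its reverse complement occurs in `read`."""
    kmers = set()
    for s in (literal, reverse_complement(literal)):
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])
    read = read.upper()
    # Slide a k-wide window over the read and test each window exactly.
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))


literal = "TAACCCTAACCCTAACCCTAACCC"
print(read_matches_literal("GG" + "TTAGGG" * 5, literal))  # True, via the reverse complement
print(read_matches_literal("ACGT" * 10, literal))          # False
```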
Below is the output I got:
D1_L001:
Input: 65975862 reads 6554014910 bases.
Contaminants: 195232 reads (0.30%) 19519262 bases (0.30%)
Total Removed: 1040136 reads (1.58%) 61775988 bases (0.94%)
Result: 64935726 reads (98.42%) 6492238922 bases (99.06%)
Time: 133.851 seconds.
Reads Processed: 65975k 492.91k reads/sec
Bases Processed: 6554m 48.97m bases/sec
I am treating the total removed (literal-matching) sequences as the percentage of that repetitive sequence in the FASTQ files. Is that a reliable method? If so, is any normalization required to compare it with other samples?
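The percentages BBDuk prints are already each count divided by the input read count (for example 195232 / 65975862 ≈ 0.30%), which is a library-size normalization. If you want to compute the fraction yourself, a minimal sketch is below. It is not a validated pipeline: it assumes gzipped 4-line FASTQ files with mates in the same order, counts a pair once if either mate contains the exact literal or its reverse complement, and the helper names (`fastq_sequences`, `has_motif`, `motif_fraction`) are hypothetical.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Tuple

# The BBDuk literal from the question and its reverse complement
# (assumes exact matching on either strand, as with BBDuk's defaults).
LITERAL = "TAACCCTAACCCTAACCCTAACCC"
LITERAL_RC = "GGGTTAGGGTTAGGGTTAGGGTTA"


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record in a gzipped 4-line FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            yield record[1].strip().upper()


def has_motif(seq: str) -> bool:
    """True if the read contains the literal on either strand."""
    return LITERAL in seq or LITERAL_RC in seq


def motif_fraction(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, float, float]:
    """Count pairs where either mate carries the motif.

    Returns (matched_pairs, total_pairs, percent, per_million). Dividing by the
    total is what makes samples with different sequencing depths comparable.
    """
    matched = total = 0
    for r1, r2 in pairs:
        total += 1
        if has_motif(r1) or has_motif(r2):
            matched += 1
    fraction = matched / total if total else 0.0
    return matched, total, 100 * fraction, 1e6 * fraction


# Usage with the file names from the command above:
# pairs = zip(fastq_sequences("R3_L002_1.fastq.gz"),
#             fastq_sequences("R3_L002_2.fastq.gz"))
# print(motif_fraction(pairs))
```

Counting pairs rather than individual reads avoids counting the same fragment twice when both mates fall inside the repeat; if you compare samples with different read lengths, note that longer reads have more chances to contain the 24-mer.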
Thank you for your help.
Try the seqkit locate function. If you can post example input and the expected output, someone here can help you out.
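To illustrate what locating a motif means here (every hit with its strand and position, rather than a yes/no per read), a minimal Python sketch follows. It is a stand-in, not seqkit's implementation or output format: `locate` is a hypothetical helper, coordinates are 1-based and inclusive on the forward strand, overlapping hits are reported via a zero-width lookahead, and minus-strand hits are found by searching for the motif's reverse complement.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def locate(seq: str, motif: str) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for every occurrence of `motif` in `seq`.

    Coordinates are 1-based, inclusive, on the forward strand. Overlapping
    hits are kept by matching with a zero-width lookahead.
    """
    hits = []
    rc = motif.translate(COMPLEMENT)[::-1]
    for strand, pattern in (("+", motif), ("-", rc)):
        for m in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.append((strand, m.start() + 1, m.start() + len(pattern)))
    return hits


read = "AAC" + "TTAGGG" * 3
for hit in locate(read, "CCCTAA"):
    print(hit)
```

Counting hits per read this way also lets you distinguish reads with a single motif copy from reads dominated by long tandem arrays.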