I am working with paired-end fastq.gz files from RNA-seq; each compressed file is around 1 GB. I used Cutadapt 4.6 to trim the Illumina universal adapters from my sample with the command
cutadapt -q 20 -a AGATCGGAAGAG -A AGATCGGAAGAG -o R196-11.out.1.fastq.gz -p R196-11.out.2.fastq.gz R196-11_S39_L002_R1_001.fastq.gz R196-11_S39_L002_R2_001.fastq.gz
and got the summary statistics below:
Total read pairs processed: 17,584,226
  Read 1 with adapter: 13,027,493 (74.1%)
  Read 2 with adapter: 12,007,537 (68.3%)
Pairs written (passing filters): 17,584,226 (100.0%)
Total basepairs processed: 5,310,436,252 bp
  Read 1: 2,655,218,126 bp
  Read 2: 2,655,218,126 bp
Quality-trimmed: 89,487,524 bp (1.7%)
  Read 1: 64,207,700 bp
  Read 2: 25,279,824 bp
Total written (filtered): 4,162,879,762 bp (78.4%)
  Read 1: 2,002,617,872 bp
  Read 2: 2,160,261,890 bp
I then ran sortmerna on the output files to filter out the rRNA and got the trace below. The program didn't finish running. I previously ran sortmerna successfully on some smaller test samples (raw FASTQ files whose adapters had not been trimmed) and got a total read count for both EOF FWD and EOF REV. What could cause sortmerna to report no read count for EOF REV, and to display "EOF FWD reached. Total reads: 1" during the alignment step?
(sortmerna_env) root@WayLT2210:~# sortmerna --ref rRNA_databases_v4/smr_v4.3_default_db.fasta --ref rRNA_databases_v4/smr_v4.3_fast_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta --reads R196-11.out.1.fastq.gz --reads R196-11.out.2.fastq.gz --paired_out --fastx --threads 8
[process:1393] === Options processing starts ... ===
Found value: sortmerna
Found flag: --ref
Found value: rRNA_databases_v4/smr_v4.3_default_db.fasta of previous flag: --ref
Found flag: --ref
Found value: rRNA_databases_v4/smr_v4.3_fast_db.fasta of previous flag: --ref
Found flag: --ref
Found value: rRNA_databases_v4/smr_v4.3_sensitive_db.fasta of previous flag: --ref
Found flag: --ref
Found value: rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta of previous flag: --ref
Found flag: --reads
Found value: R196-11.out.1.fastq.gz of previous flag: --reads
Found flag: --reads
Found value: R196-11.out.2.fastq.gz of previous flag: --reads
Found flag: --paired_out
Previous flag: --paired_out is Boolean. Setting to True
Found flag: --fastx
Previous flag: --fastx is Boolean. Setting to True
Found flag: --threads
Found value: 8 of previous flag: --threads
[process:1483] Processing option: fastx with value:
[process:1483] Processing option: paired_out with value:
[process:1483] Processing option: reads with value: R196-11.out.1.fastq.gz
[opt_reads:98] Processing reads file [1] out of total [2] files
[process:1483] Processing option: reads with value: R196-11.out.2.fastq.gz
[opt_reads:98] Processing reads file [2] out of total [2] files
[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_default_db.fasta
[opt_ref:158] Processing reference [1] out of total [4] references
[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_default_db.fasta" exists and is readable
[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_fast_db.fasta
[opt_ref:158] Processing reference [2] out of total [4] references
[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_fast_db.fasta" exists and is readable
[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_sensitive_db.fasta
[opt_ref:158] Processing reference [3] out of total [4] references
[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_sensitive_db.fasta" exists and is readable
[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta
[opt_ref:158] Processing reference [4] out of total [4] references
[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta" exists and is readable
[process:1483] Processing option: threads with value: 8
[process:1503] === Options processing done ===
[process:1504] Alignment type: [best:1 num_alignments:1 min_lis:2 seeds:2]
[validate_kvdbdir:1242] 'workdir' option was not provided. Using USERDIR to set the working directory: ""
[validate_kvdbdir:1248] Key-value DB location "/root/sortmerna/run/kvdb"
[validate_kvdbdir:1284] Creating KVDB directory: "/root/sortmerna/run/kvdb"
[validate_idxdir:1214] Using index directory: "/root/sortmerna/run/idx"
[validate_idxdir:1230] IDX directory: "/root/sortmerna/run/idx" exists and is not empty
[validate_readb_dir:1306] Using split reads directory : "/root/sortmerna/run/readb"
[validate_readb_dir:1322] split reads directory : "/root/sortmerna/run/readb" exists and is not empty
[main:62] Running command:
sortmerna --ref rRNA_databases_v4/smr_v4.3_default_db.fasta --ref rRNA_databases_v4/smr_v4.3_fast_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta --reads R196-11.out.1.fastq.gz --reads R196-11.out.2.fastq.gz --paired_out --fastx --threads 8
[Index:102] Found 16 non-empty index files. Skipping indexing.
[init:108] Readfeed init started
[define_format:881] file: "R196-11.out.1.fastq.gz" is FASTQ gzipped
[define_format:881] file: "R196-11.out.2.fastq.gz" is FASTQ gzipped
[count_reads:915] started count ...
[next:322] EOF FWD reached. Total reads: 17033568
[count_reads:945] done count. Elapsed time: 88.6886 sec. Total reads: 34067136
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_0.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_0.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_1.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_1.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_2.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_2.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_3.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_3.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_4.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_4.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_5.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_5.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_6.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_6.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_7.fq.gz
[init_split_files:967] added file: /root/sortmerna/run/readb/rev_7.fq.gz
[is_split_ready:726] found existing readfeed descriptor /root/sortmerna/run/readb/readfeed
[split:605] start splitting. Using number of splits equals number of processing threads: 8
[clean:1098] found descriptor /root/sortmerna/run/readb/readfeed
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_0.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_0.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_1.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_1.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_2.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_2.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_3.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_3.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_4.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_4.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_5.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_5.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_6.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_6.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_7.fq.gz
[clean:1142] removing split file: /root/sortmerna/run/readb/rev_7.fq.gz
[next:322] EOF FWD reached. Total reads: 17033568
[split:717] Done splitting. Reads count: 34067136 Runtime sec: 1043.51
[init:135] Readfeed init done in sec [1132.2]
[store_to_db:292] Stored Reads statistics to DB:
all_reads_count= 34067136 all_reads_len= 4098461419 min_read_len= 1 max_read_len= 151 total_aligned= 0 total_aligned_id= 0 total_aligned_cov= 0 total_aligned_id_cov= 0 total_denovo= 0 num_short= 0 reads_matched_per_db= TODO is_stats_calc= 0 is_total_reads_mapped_cov= 0
[align:143] ==== Starting alignment ====
[align:146] Number of cores: 12
[align:163] Using number of Processor threads: 8
[Refstats:60] Index Statistics calculation starts ... done in: 1.14442 sec
[align:185] Loading index: 0 part: 1/1 Memory KB: 19 ...
[align:190] done in [6.53606] sec Memory KB: 3197
[align:193] Loading references ...
[align:197] done in [0.525161] sec. Memory KB: 3351
[align2:70] Processor 1 thread 140642311198464 started
[align2:70] Processor 2 thread 140642319591168 started
[align2:70] Processor 0 thread 140642302805760 started
[align2:70] Processor 4 thread 140642327983872 started
[align2:70] Processor 6 thread 140641682933504 started
[align2:70] Processor 7 thread 140641674540800 started
[align2:70] Processor 5 thread 140641691326208 started
[align2:70] Processor 3 thread 140642764175104 started
[next:455] EOF FWD reached. Total reads: 1
[next:455] EOF FWD reached. Total reads: 1
[next:455] EOF FWD reached. Total reads: 1
[next:455] EOF FWD reached. Total reads: 1
[next:455] EOF REV reached. Total reads: 1
[align2:133] Processor 4 thread 140642327983872 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 23.7849
[align2:133] Processor 2 thread 140642319591168 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 23.869
[align2:133] Processor 3 thread 140642764175104 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 24.0053
[align2:133] Processor 7 thread 140641674540800 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 24.0421
[align2:133] Processor 1 thread 140642311198464 done. Processed 1 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 1 Runtime sec: 24.2443
Can you check that your FASTQ files are not corrupt by running one of the tools mentioned here: Checking fastq is valid
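If none of the linked tools is at hand, a rough check can be sketched in Python. This is only a minimal sketch and not a replacement for a dedicated validator: it assumes plain four-line FASTQ records, and the function names read_lengths and summarize_pair are made up for illustration. It counts the records in both files, fails if the counts differ, and reports zero-length reads, which Cutadapt keeps unless a minimum length (-m) is set. The counts are worth a look here because Cutadapt reported 17,584,226 pairs written, while sortmerna counted 17,033,568 forward reads.

```python
import gzip
from itertools import zip_longest


def read_lengths(path):
    """Yield the sequence length of each four-line FASTQ record in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return  # clean end of file
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            # A record must start with '@', have a '+' separator line,
            # and end with a quality line (which may be empty but must exist).
            if not header.startswith("@") or not plus.startswith("+") or not qual:
                raise ValueError(f"malformed or truncated record in {path}: {header!r}")
            seq = seq.rstrip("\r\n")
            qual = qual.rstrip("\r\n")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {path}: {header!r}")
            yield len(seq)


def summarize_pair(fwd_path, rev_path):
    """Count read pairs and zero-length reads; raise if the files differ in record count."""
    pairs = 0
    empty_fwd = 0
    empty_rev = 0
    lengths = zip_longest(read_lengths(fwd_path), read_lengths(rev_path))
    for fwd_len, rev_len in lengths:
        # zip_longest pads the shorter file with None, which exposes a count mismatch.
        if fwd_len is None or rev_len is None:
            raise ValueError("forward and reverse files contain different numbers of records")
        pairs += 1
        empty_fwd += fwd_len == 0
        empty_rev += rev_len == 0
    return {"pairs": pairs, "empty_fwd": empty_fwd, "empty_rev": empty_rev}
```

Calling summarize_pair("R196-11.out.1.fastq.gz", "R196-11.out.2.fastq.gz") returns a small dictionary of counts, or raises ValueError at the first malformed record. Any non-zero empty_fwd or empty_rev value would point at reads trimmed down to nothing.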
I downloaded fq and ran fq lint on both files:
fq lint R196-11.out.1.fastq.gz R196-11.out.2.fastq.gz
I got the following: