Question

sortmerna not counting the EOF REV file, not working on paired end reads

0

Entering edit mode

16 months ago

jway • 0

I am working with paired-end fastq.gz files from RNA-seq, each compressed file is around 1 GB. I used Cutadapt 4.6 to trim the Illumina universal adapters from my sample using the command cutadapt -q 20 -a AGATCGGAAGAG -A AGATCGGAAGAG -o R196-11.out.1.fastq.gz -p R196-11.out.2.fastq.gz R1 96-11_S39_L002_R1_001.fastq.gz R196-11_S39_L002_R2_001.fastq.gz and got the below summary statistics:

Total read pairs processed:         17,584,226
  Read 1 with adapter:              13,027,493 (74.1%)
  Read 2 with adapter:              12,007,537 (68.3%)
Pairs written (passing filters):    17,584,226 (100.0%)

Total basepairs processed: 5,310,436,252 bp
  Read 1: 2,655,218,126 bp
  Read 2: 2,655,218,126 bp
Quality-trimmed:              89,487,524 bp (1.7%)
  Read 1:    64,207,700 bp
  Read 2:    25,279,824 bp
Total written (filtered):  4,162,879,762 bp (78.4%)
  Read 1: 2,002,617,872 bp
  Read 2: 2,160,261,890 bp

I then tried to use sortmerna on the output files to filter out the rRNA, and got the below trace. The program didn't finish running. I previously ran sortmerna successfully on some smaller test samples (raw FASTQ files that did not have their adapters trimmed), and was able to get a total read count for both EOF FWD and EOF REV. What could be causing sortmerna to not get a read count for EOF REV, and for it to display "EOF FWD reached. Total reads: 1" during the alignment step?

(sortmerna_env) root@WayLT2210:~# sortmerna --ref rRNA_databases_v4/smr_v4.3_default_db.fasta --ref rRNA_databases_v4/smr_v4.3_fast_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta --reads R196-11.out.1.fastq.gz --reads R196-11.out.2.fastq.gz --paired_out --fastx --threads 8

[process:1393] === Options processing starts ... ===

Found value: sortmerna

Found flag: --ref

Found value: rRNA_databases_v4/smr_v4.3_default_db.fasta of previous flag: --ref

Found flag: --ref

Found value: rRNA_databases_v4/smr_v4.3_fast_db.fasta of previous flag: --ref

Found flag: --ref

Found value: rRNA_databases_v4/smr_v4.3_sensitive_db.fasta of previous flag: --ref

Found flag: --ref

Found value: rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta of previous flag: --ref

Found flag: --reads

Found value: R196-11.out.1.fastq.gz of previous flag: --reads

Found flag: --reads

Found value: R196-11.out.2.fastq.gz of previous flag: --reads

Found flag: --paired_out

Previous flag: --paired_out is Boolean. Setting to True

Found flag: --fastx

Previous flag: --fastx is Boolean. Setting to True

Found flag: --threads

Found value: 8 of previous flag: --threads

[process:1483] Processing option: fastx with value:

[process:1483] Processing option: paired_out with value:

[process:1483] Processing option: reads with value: R196-11.out.1.fastq.gz

[opt_reads:98] Processing reads file [1] out of total [2] files

[process:1483] Processing option: reads with value: R196-11.out.2.fastq.gz

[opt_reads:98] Processing reads file [2] out of total [2] files

[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_default_db.fasta

[opt_ref:158] Processing reference [1] out of total [4] references

[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_default_db.fasta" exists and is readable

[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_fast_db.fasta

[opt_ref:158] Processing reference [2] out of total [4] references

[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_fast_db.fasta" exists and is readable

[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_sensitive_db.fasta

[opt_ref:158] Processing reference [3] out of total [4] references

[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_sensitive_db.fasta" exists and is readable

[process:1483] Processing option: ref with value: rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta

[opt_ref:158] Processing reference [4] out of total [4] references

[opt_ref:206] File "/root/rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta" exists and is readable

[process:1483] Processing option: threads with value: 8

[process:1503] === Options processing done ===

[process:1504] Alignment type: [best:1 num_alignments:1 min_lis:2 seeds:2]

[validate_kvdbdir:1242] 'workdir' option was not provided. Using USERDIR to set the working directory: ""

[validate_kvdbdir:1248] Key-value DB location "/root/sortmerna/run/kvdb"

[validate_kvdbdir:1284] Creating KVDB directory: "/root/sortmerna/run/kvdb"

[validate_idxdir:1214] Using index directory: "/root/sortmerna/run/idx"

[validate_idxdir:1230] IDX directory: "/root/sortmerna/run/idx" exists and is not empty

[validate_readb_dir:1306] Using split reads directory : "/root/sortmerna/run/readb"

[validate_readb_dir:1322] split reads directory : "/root/sortmerna/run/readb" exists and is not empty

[main:62] Running command:

sortmerna --ref rRNA_databases_v4/smr_v4.3_default_db.fasta --ref rRNA_databases_v4/smr_v4.3_fast_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db.fasta --ref rRNA_databases_v4/smr_v4.3_sensitive_db_rfam_seeds.fasta --reads R196-11.out.1.fastq.gz --reads R196-11.out.2.fastq.gz --paired_out --fastx --threads 8

[Index:102] Found 16 non-empty index files. Skipping indexing.

[init:108] Readfeed init started

[define_format:881] file: "R196-11.out.1.fastq.gz" is FASTQ gzipped

[define_format:881] file: "R196-11.out.2.fastq.gz" is FASTQ gzipped

[count_reads:915] started count  ...

[next:322] EOF FWD reached. Total reads: 17033568

[count_reads:945] done count. Elapsed time: 88.6886 sec. Total reads: 34067136

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_0.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_0.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_1.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_1.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_2.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_2.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_3.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_3.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_4.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_4.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_5.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_5.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_6.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_6.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/fwd_7.fq.gz

[init_split_files:967] added file: /root/sortmerna/run/readb/rev_7.fq.gz

[is_split_ready:726] found existing readfeed descriptor /root/sortmerna/run/readb/readfeed

[split:605] start splitting. Using number of splits equals number of processing threads: 8

[clean:1098] found descriptor /root/sortmerna/run/readb/readfeed

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_0.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_0.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_1.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_1.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_2.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_2.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_3.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_3.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_4.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_4.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_5.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_5.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_6.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_6.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/fwd_7.fq.gz

[clean:1142] removing split file: /root/sortmerna/run/readb/rev_7.fq.gz

[next:322] EOF FWD reached. Total reads: 17033568

[split:717] Done splitting. Reads count: 34067136 Runtime sec: 1043.51



[init:135] Readfeed init done in sec [1132.2]

[store_to_db:292] Stored Reads statistics to DB:

all_reads_count= 34067136 all_reads_len= 4098461419 min_read_len= 1 max_read_len= 151 total_aligned= 0 total_aligned_id= 0 total_aligned_cov= 0 total_aligned_id_cov= 0 total_denovo= 0 num_short= 0 reads_matched_per_db= TODO is_stats_calc= 0 is_total_reads_mapped_cov= 0

[align:143] ==== Starting alignment ====

[align:146] Number of cores: 12

[align:163] Using number of Processor threads: 8

[Refstats:60] Index Statistics calculation starts ... done in: 1.14442 sec

[align:185] Loading index: 0 part: 1/1 Memory KB: 19 ...

[align:190] done in [6.53606] sec Memory KB: 3197

[align:193] Loading references ...

[align:197] done in [0.525161] sec. Memory KB: 3351

[align2:70] Processor 1 thread 140642311198464 started

[align2:70] Processor 2 thread 140642319591168 started

[align2:70] Processor 0 thread 140642302805760 started

[align2:70] Processor 4 thread 140642327983872 started

[align2:70] Processor 6 thread 140641682933504 started

[align2:70] Processor 7 thread 140641674540800 started

[align2:70] Processor 5 thread 140641691326208 started

[align2:70] Processor 3 thread 140642764175104 started

[next:455] EOF FWD reached. Total reads: 1

[next:455] EOF FWD reached. Total reads: 1

[next:455] EOF FWD reached. Total reads: 1

[next:455] EOF FWD reached. Total reads: 1

[next:455] EOF REV reached. Total reads: 1

[align2:133] Processor 4 thread 140642327983872 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 23.7849

[align2:133] Processor 2 thread 140642319591168 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 23.869

[align2:133] Processor 3 thread 140642764175104 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 24.0053

[align2:133] Processor 7 thread 140641674540800 done. Processed 0 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 0 Runtime sec: 24.0421

[align2:133] Processor 1 thread 140642311198464 done. Processed 1 reads. Skipped already processed: 0 reads Aligned reads (passing E-value): 1 Runtime sec: 24.2443

sortmerna cutadapt • 1.3k views

ADD COMMENT • link updated 16 months ago by GenoMax 152k • written 16 months ago by jway • 0

0

Entering edit mode

Can you try to validate that you have non-corrupt fastq files by using one of the tools mentioned here: Checking fastq is valid

ADD REPLY • link 16 months ago by GenoMax 152k

0

Entering edit mode

I downloaded fq lint and ran fq lint R196-11.out.1.fastq.gz R196-11.out.2.fastq.gz, and got the following:

2024-03-01T23:15:11.373488Z  INFO fq::commands::lint: fq-lint start

2024-03-01T23:15:11.373543Z  INFO fq::commands::lint: validating paired end reads

2024-03-01T23:15:11.373566Z  INFO fq::validators: disabled validators: []

2024-03-01T23:15:11.373569Z  INFO fq::validators: enabled single read validators: ["[S003] NameValidator", "[S004] CompleteValidator", "[S002] AlphabetValidator", "[S001] PlusLineValidator", "[S005] ConsistentSeqQualValidator", "[S006] QualityStringValidator"]

2024-03-01T23:15:11.373577Z  INFO fq::validators: enabled paired read validators: ["[P001] NamesValidator"]

2024-03-01T23:15:11.373584Z  INFO fq::commands::lint: enabled special validators: ["[S007] DuplicateNameValidator"]

2024-03-01T23:15:11.373585Z  INFO fq::commands::lint: starting validation (pass 1)
R196-11.out.1.fastq.gz:22:1: [S004] CompleteValidator: empty sequence

ADD REPLY • link updated 16 months ago by GenoMax 152k • written 16 months ago by jway • 0

GenoMax · Answer 1 · 2024-03-01

0

Entering edit mode

16 months ago

GenoMax 152k

Looks like one of the reads was completely trimmed but the 0 length fastq record was left in place. I am not a cutadapt user but you should look for and add an option to make sure reads remain a certain length (say 25 bp, smaller reads are not useful for alignments) and discard those that fail the criteria.

With bbduk.sh you would normally do this with minlength=25 option. If one read fails this criterial then it and its mate will be removed from the files.

ADD COMMENT • link 16 months ago by GenoMax 152k

0

Entering edit mode

Thanks for the help, I ran cutadapt -q 15 -a AGATCGGAAGAG -A AGATCGGAAGAG -m 35 -o R196-11.out.1.fastq.gz -p R196-11.out.2.fastq .gz R196-11_S39_L002_R1_001.fastq.gz R196-11_S39_L002_R2_001.fastq.gz to remove reads less than 35 bp long and I think it worked, I got the following after running fq lint again:

 (base) root@WayLT2210:~# fq lint R196-11.out.1.fastq.gz R196-11.out.2.fastq.gz

2024-03-04T18:32:01.759423Z  INFO fq::commands::lint: fq-lint start

2024-03-04T18:32:01.759676Z  INFO fq::commands::lint: validating paired end reads

2024-03-04T18:32:01.759725Z  INFO fq::validators: disabled validators: []

2024-03-04T18:32:01.759731Z  INFO fq::validators: enabled single read validators: ["[S003] NameValidator", "[S004] CompleteValidator", "[S002] AlphabetValidator", "[S001] PlusLineValidator", "[S005] ConsistentSeqQualValidator", "[S006] QualityStringValidator"]

2024-03-04T18:32:01.759738Z  INFO fq::validators: enabled paired read validators: ["[P001] NamesValidator"]

2024-03-04T18:32:01.759751Z  INFO fq::commands::lint: enabled special validators: ["[S007] DuplicateNameValidator"]

2024-03-04T18:32:01.759767Z  INFO fq::commands::lint: starting validation (pass 1)

2024-03-04T18:32:34.244239Z  INFO fq::commands::lint: read 16394279 * 2 records

2024-03-04T18:32:34.244285Z  INFO fq::commands::lint: starting validation (pass 2)

2024-03-04T18:32:44.033220Z  INFO fq::commands::lint: read 16394279 records

2024-03-04T18:32:44.035599Z  INFO fq::commands::lint: fq-lint end

ADD REPLY • link updated 16 months ago by GenoMax 152k • written 16 months ago by jway • 0