SRA-obtained metagenomic reads appear to be corrupt - cannot run SingleM on them
7 weeks ago
rfour92 • 0

Hello,

I am trying to run SingleM on data obtained using sratoolkit (2.10.7).

I used prefetch SRR2103020 to download the run and fastq-dump --outdir ./fastq --split-e ./SRR2103020/SRR2103020.sra to split the reads.

When I check the file with head SRR2103020_1.fastq, I see:

@SRR2103020.1 D7RS0RN1:177:C12Y0ACXX:3:1101:1404:2079 length=93
AATGTGGACAGCGCCGTCTTCAAACAGGCGCTGTCCAGCTAGCAGCTCAACGCTCCGCGCCGCCGTCTTCGCCGTCTTCAGGCAGGGGGAGAA
+SRR2103020.1 D7RS0RN1:177:C12Y0ACXX:3:1101:1404:2079 length=93
@BCFFFDDHHHHHJJJGIJIBHHIJJCH?HBG<GIG@HEGIGFIFG@>CGHHHFFFDD@BB::BD@B??CDBDD@BD@DA@CCDDBD######
@SRR2103020.2 D7RS0RN1:177:C12Y0ACXX:3:1101:1440:2113 length=93
GGTATAAGTTCTATGTGTAATGAACCACAGAGTTATCAAAAAACTCAAGATCTGTCTCTTATACACATCTGACGCTGCCGACGAGCGATCTAG
+SRR2103020.2 D7RS0RN1:177:C12Y0ACXX:3:1101:1440:2113 length=93
@C?DDFFFHHHHHIIGIIIIJ<IHHIJJJIHIFFEIIIIJJIJIJJIIJIGHIJJJJIJJJGJJIJDHIJJJJJHHHFFFDDBD#########
@SRR2103020.3 D7RS0RN1:177:C12Y0ACXX:3:1101:1386:2229 length=93
GAATGAATCAAGGATGCTAAGTCTCCATCTACAAAATTATTTGTTTGAACAGATAAGTTTAACCGACTTTAAAGTCTATTCAGTTATCTACAC

I used a fastq repair tool (fastqwiper) using: fastqwiper --fastq_in SRR2103020_1.fastq --fastq_out SRR_test_wiped_1.fastq

and the file appears to be repaired:

@SRR2103020.20 D7RS0RN1:177:C12Y0ACXX:3:1101:3313:2100 length=93
GCTCAATTCCCACACTTGAACACTTTCAATACATTCATCCCAAATAGGTTTTGCTATCGGATTATTACCTGAAGCCAATTCTTTCAAACCTTC
+
CCCFFFFFGHHHHJJJJIIJIIIJJJGIJFIJEIJJJJJJJIJJGIIIBFGHHJEIIJJJGIJIIJIJJJJIHGHHEFBEFFFEDDECCDC<>
@SRR2103020.47 D7RS0RN1:177:C12Y0ACXX:3:1101:5867:2138 length=93
CCTTAATTCAAACTCAGTTCTACGGACAACAACTTCATGCCTGAAATCCACAAAATGAGTTAAAACATCTTTCAGGGGCATAATCTTTGGAAC
+
CCCFFFFFHHHHHJJIJHHJIHJJJJJJJJIIJJJJJIIJIIJGIJIJIIFHIDGFHIJHIIJJJJJJJJJJHHHHEFFDCEEEDCDDCDDCA
@SRR2103020.52 D7RS0RN1:177:C12Y0ACXX:3:1101:6059:2078 length=93
ATGCTCCTCCAACCATTACATCTGTTGAATTTGCAACTTGTACAATTAAACTTCCAGTTTTCGTAATTGAATTGAAAATTTCAAAAGTTGCAC

However, when I run SingleM using: singlem pipe --forward ./SRR_test_wiped_1.fastq --otu_table singlem/sampe01_F_otu --threads 20

the job fails and I get this error message:

/lib/python3.6/site-packages/singlem/data/S1.6.ribosomal_protein_L14b_L23e_rplN.gpkg.spkg/S1.6.ribosomal_protein_L14b_L23e_rplN/graftmAiWqW8_search.hmm -) | hmmsearch --domE 1e-05 --cpu 1 -o /dev/null --noali --domtblout /dev/shm/tmp43rja84p/graftm_protein_search/SRR_test_wiped_1_b/graftmIkMbHN_search_SRR_test_wiped_1_b.hmmout.txt /ibex/scratch/alamourt/conda_singlem_env/lib/python3.6/site-packages/singlem/data/S1.6.ribosomal_protein_L14b_L23e_rplN.gpkg.spkg/S1.6.ribosomal_protein_L14b_L23e_rplN/graftmIkMbHN_search.hmm - returned non-zero exit status 1.\nSTDERR was: b\'\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\'STDOUT was: b\'\'\n'STDOUT was: b''

Can you please assist? Is there anything wrong in my workflow, or is there a better tool to repair FASTQ files? Please keep in mind that I ran the same SingleM package on another metagenome and it works properly: that job has not finished yet, but it has been running for 45 minutes, whereas with this dataset the job fails after one minute. So I am assuming it is an SRA-related issue. Thank you.

SingleM SRAtoolkit • 247 views
7 weeks ago
Mensur Dlakic ★ 20k

Not sure why: 1) you are splitting reads; 2) you are fixing reads when they seem fine. I suggest something like this instead after prefetch:

fastq-dump --outdir ./ ./SRR2103020/SRR2103020.sra
singlem pipe --sequences SRR2103020.fastq --otu_table singlem/sampe01_F_otu --threads 20
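
To confirm that a FASTQ file is structurally sound before handing it to SingleM, something like the following could be used. This is only a minimal sketch, assuming standard four-line records (no wrapped sequence lines); fastq_check is a hypothetical helper name, not part of sratoolkit, fastqwiper, or SingleM. It checks records by line position rather than by leading character, because a quality line may legitimately begin with @, as several do in the head output above.

```shell
# Hypothetical helper (not part of any tool mentioned in this thread):
# verifies that a FASTQ file is made of well-formed 4-line records and
# prints the record count. Assumes unwrapped, 4-line-per-read FASTQ.
fastq_check() {
    awk '
        # Line 1 of each record: header must start with "@"
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR
            bad = 1; exit
        }
        # Line 2: remember the sequence length
        NR % 4 == 2 { seqlen = length($0) }
        # Line 3: separator must start with "+"
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR
            bad = 1; exit
        }
        # Line 4: quality string must be as long as the sequence
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR
            bad = 1; exit
        }
        END {
            if (bad) exit 1
            if (NR == 0) { print "file is empty"; exit 1 }
            if (NR % 4 != 0) { print "truncated final record"; exit 1 }
            printf "%d records OK\n", NR / 4
        }
    ' "$1"
}
```

Running fastq_check on both SRR2103020_1.fastq and SRR_test_wiped_1.fastq and comparing the two record counts would also show whether the repair step discarded reads.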