SRA-obtained metagenomic reads appear to be corrupt - cannot run SingleM on them
22 months ago
rfour92 • 0

Hello,

I am trying to run SingleM on data obtained with sratoolkit (2.10.7).

I used prefetch SRR2103020 to download the run and fastq-dump --outdir ./fastq --split-e ./SRR2103020/SRR2103020.sra to split it.

When I check the file with head SRR2103020_1.fastq, I get:

@SRR2103020.1 D7RS0RN1:177:C12Y0ACXX:3:1101:1404:2079 length=93
AATGTGGACAGCGCCGTCTTCAAACAGGCGCTGTCCAGCTAGCAGCTCAACGCTCCGCGCCGCCGTCTTCGCCGTCTTCAGGCAGGGGGAGAA
+SRR2103020.1 D7RS0RN1:177:C12Y0ACXX:3:1101:1404:2079 length=93
@BCFFFDDHHHHHJJJGIJIBHHIJJCH?HBG<GIG@HEGIGFIFG@>CGHHHFFFDD@BB::BD@B??CDBDD@BD@DA@CCDDBD######
@SRR2103020.2 D7RS0RN1:177:C12Y0ACXX:3:1101:1440:2113 length=93
GGTATAAGTTCTATGTGTAATGAACCACAGAGTTATCAAAAAACTCAAGATCTGTCTCTTATACACATCTGACGCTGCCGACGAGCGATCTAG
+SRR2103020.2 D7RS0RN1:177:C12Y0ACXX:3:1101:1440:2113 length=93
@C?DDFFFHHHHHIIGIIIIJ<IHHIJJJIHIFFEIIIIJJIJIJJIIJIGHIJJJJIJJJGJJIJDHIJJJJJHHHFFFDDBD#########
@SRR2103020.3 D7RS0RN1:177:C12Y0ACXX:3:1101:1386:2229 length=93
GAATGAATCAAGGATGCTAAGTCTCCATCTACAAAATTATTTGTTTGAACAGATAAGTTTAACCGACTTTAAAGTCTATTCAGTTATCTACAC

I then ran a FASTQ repair tool (fastqwiper) with: fastqwiper --fastq_in SRR2103020_1.fastq --fastq_out SRR_test_wiped_1.fastq

and the file appears to be repaired:

@SRR2103020.20 D7RS0RN1:177:C12Y0ACXX:3:1101:3313:2100 length=93
GCTCAATTCCCACACTTGAACACTTTCAATACATTCATCCCAAATAGGTTTTGCTATCGGATTATTACCTGAAGCCAATTCTTTCAAACCTTC
+
CCCFFFFFGHHHHJJJJIIJIIIJJJGIJFIJEIJJJJJJJIJJGIIIBFGHHJEIIJJJGIJIIJIJJJJIHGHHEFBEFFFEDDECCDC<>
@SRR2103020.47 D7RS0RN1:177:C12Y0ACXX:3:1101:5867:2138 length=93
CCTTAATTCAAACTCAGTTCTACGGACAACAACTTCATGCCTGAAATCCACAAAATGAGTTAAAACATCTTTCAGGGGCATAATCTTTGGAAC
+
CCCFFFFFHHHHHJJIJHHJIHJJJJJJJJIIJJJJJIIJIIJGIJIJIIFHIDGFHIJHIIJJJJJJJJJJHHHHEFFDCEEEDCDDCDDCA
@SRR2103020.52 D7RS0RN1:177:C12Y0ACXX:3:1101:6059:2078 length=93
ATGCTCCTCCAACCATTACATCTGTTGAATTTGCAACTTGTACAATTAAACTTCCAGTTTTCGTAATTGAATTGAAAATTTCAAAAGTTGCAC

However, when I run SingleM with: singlem pipe --forward ./SRR_test_wiped_1.fastq --otu_table singlem/sampe01_F_otu --threads 20

the job fails with this error message:

/lib/python3.6/site-packages/singlem/data/S1.6.ribosomal_protein_L14b_L23e_rplN.gpkg.spkg/S1.6.ribosomal_protein_L14b_L23e_rplN/graftmAiWqW8_search.hmm -) | hmmsearch --domE 1e-05 --cpu 1 -o /dev/null --noali --domtblout /dev/shm/tmp43rja84p/graftm_protein_search/SRR_test_wiped_1_b/graftmIkMbHN_search_SRR_test_wiped_1_b.hmmout.txt /ibex/scratch/alamourt/conda_singlem_env/lib/python3.6/site-packages/singlem/data/S1.6.ribosomal_protein_L14b_L23e_rplN.gpkg.spkg/S1.6.ribosomal_protein_L14b_L23e_rplN/graftmIkMbHN_search.hmm - returned non-zero exit status 1.\nSTDERR was: b\'\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\\nError: Sequence file - is empty or misformatted\\n\\n\'STDOUT was: b\'\'\n'STDOUT was: b''

Could you please help? Is there anything wrong with my workflow, or is there a better tool for repairing FASTQ files? Please keep in mind that I ran the same SingleM installation on another metagenome and it works properly: that job has not finished yet, but it has been running for 45 minutes, whereas with this dataset the job fails after one minute. So I am assuming it is an SRA-related issue. Thank you.

22 months ago
Mensur Dlakic ★ 27k

Not sure why: 1) you are splitting reads; 2) you are fixing reads when they seem fine. I suggest something like this instead after prefetch:

fastq-dump --outdir ./ ./SRR2103020/SRR2103020.sra
singlem pipe --sequences SRR2103020.fastq --otu_table singlem/sampe01_F_otu --threads 20
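
One way to check whether reads really need fixing is to test the four-line FASTQ structure directly. The following is a minimal sketch, not part of sratoolkit or SingleM; the function name check_fastq is hypothetical, and it assumes unwrapped four-line records. Note that a quality line may legitimately begin with "@", as several do in the head output above, so a leading "@" alone does not indicate a broken record.

```python
def check_fastq(lines):
    """Return (complete_records, problems) for an iterable of FASTQ text lines.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    problems = []
    records = 0
    it = iter(lines)
    for header in it:
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            # The file ended in the middle of a record.
            problems.append(f"record {records + 1}: truncated (fewer than 4 lines)")
            break
        header, seq, plus, qual = (s.rstrip("\r\n") for s in (header, seq, plus, qual))
        records += 1
        if not header.startswith("@"):
            problems.append(f"record {records}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {records}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {records}: sequence and quality lengths differ")
    return records, problems


# In-memory demo: the quality string starts with '@', which is still valid.
demo = ["@read1\n", "ACGT\n", "+read1\n", "@BCF\n"]
print(check_fastq(demo))
```

To run it on a real file, pass an open text handle, for example check_fastq(open("SRR2103020_1.fastq")). Running head on a file always cuts the output at ten lines, so an incomplete third record in that output is expected.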
8 months ago
Tommaso • 0

If you think the FASTQ file is corrupted, the easiest way to use FastqWiper is through Docker:

docker pull mazzalab/fastqwiper

Once the image has been downloaded, you can run:

docker run --rm -ti --name fastqwiper -v "YOUR_STATIC_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper paired 8 sample

where:

  1. YOUR_STATIC_PATH_TO_DATA_FOLDER is the path to the folder where you have the fastq.gz files to be wiped
  2. paired triggers the cleaning of R1 and R2 together, while single cleans individual files;
  3. 8 is the number of cores to be used;
  4. sample is part of the names of the files to be wiped. In this regard, remember that:

    • for paired-end files (e.g., "sample_R1.fastq.gz" and "sample_R2.fastq.gz"), your file names must end with "_R1.fastq.gz" and "_R2.fastq.gz". The text to pass is everything before that suffix, sample in this case.
    • for single-end files (e.g., "excerpt_R1_001.fastq.gz"), your file name must end with ".fastq.gz"; all the preceding text, i.e., excerpt_R1_001, is the text to pass to the command above:

      docker run --rm -ti --name fastqwiper -v "YOUR_STATIC_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper single 8 excerpt_R1_001
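
The naming rules above can be sketched as a small helper that derives the text to pass as the last argument. This is only an illustration of the rules as stated in this answer; the function wiper_sample_token is hypothetical and is not part of FastqWiper.

```python
from pathlib import Path

# Suffixes taken from the naming rules described above.
PAIRED_SUFFIXES = ("_R1.fastq.gz", "_R2.fastq.gz")
SINGLE_SUFFIX = ".fastq.gz"


def wiper_sample_token(filename, mode):
    """Return the part of the file name that precedes the required suffix.

    mode is "paired" or "single"; a ValueError is raised when the file
    name does not follow the corresponding rule.
    """
    name = Path(filename).name
    if mode == "paired":
        for suffix in PAIRED_SUFFIXES:
            if name.endswith(suffix):
                return name[: -len(suffix)]
        raise ValueError(f"{name} must end with _R1.fastq.gz or _R2.fastq.gz")
    if mode == "single":
        if name.endswith(SINGLE_SUFFIX):
            return name[: -len(SINGLE_SUFFIX)]
        raise ValueError(f"{name} must end with .fastq.gz")
    raise ValueError("mode must be 'paired' or 'single'")


print(wiper_sample_token("sample_R1.fastq.gz", "paired"))
print(wiper_sample_token("excerpt_R1_001.fastq.gz", "single"))
```

Under these rules, an uncompressed file named like SRR2103020_1.fastq would be rejected in either mode, so files written by fastq-dump would presumably need to be compressed (and, for paired mode, renamed) first.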

8 months ago
predeus ★ 1.9k

Just get them from ENA - they come as fastq.gz which is convenient: https://www.ebi.ac.uk/ena/browser/view/SRR2103020
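
For scripted downloads, the fastq.gz locations can probably be looked up rather than copied from the browser page. The sketch below assumes the ENA Portal API filereport endpoint and its fastq_ftp field behave as I remember them; treat the endpoint, the parameter names, and the semicolon-separated, scheme-less link format as assumptions to verify against the ENA documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(run_accession):
    """Build a file-report query URL for one run accession."""
    query = urlencode({
        "accession": run_accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_links(tsv_text):
    """Extract FASTQ links from a TSV report that has a fastq_ftp column.

    Assumes links are separated by ';' and lack a scheme, so 'ftp://' is added.
    """
    lines = tsv_text.strip().splitlines()
    if len(lines) < 2:
        return []
    column = lines[0].split("\t").index("fastq_ftp")
    links = []
    for row in lines[1:]:
        cells = row.split("\t")
        if column < len(cells):
            links.extend("ftp://" + part for part in cells[column].split(";") if part)
    return links


print(filereport_url("SRR2103020"))
```

The report itself could then be fetched with urllib.request.urlopen(filereport_url("SRR2103020")) and its decoded text passed to parse_fastq_links.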
