How to use BBMap's reformat.sh to correctly interleave reads?
Asked by O.rka, 5.1 years ago

The output file is empty when I run this:

(jespinoz_env) -bash-4.1$ reformat.sh in=mapped_interleaved.fastq out=mapped_interleaved_validated.fastq vpair
java -ea -Xmx200m -cp /usr/local/devel/ANNOTATION/jespinoz/anaconda/envs/mage_env/opt/bbmap-38.22-1/current/ jgi.ReformatReads in=mapped_interleaved.fastq out=mapped_interleaved_validated.fastq vpair
Executing jgi.ReformatReads [in=mapped_interleaved.fastq, out=mapped_interleaved_validated.fastq, vpair]

Input is being processed as paired
Writing interleaved.
Names do not appear to be correctly paired.
NS500647:155:H2MFYBGX2:1:11101:17655:15675:N:0:CACGCAAT#0/1
NS500647:155:H2MFYBGX2:1:11101:7822:15982:N:0:CACGCAAT#0/2

(jespinoz_env) -bash-4.1$ ls
mapped_interleaved.fastq  mapped_interleaved.stats.txt  mapped_interleaved_validated.fastq
(jespinoz_env) -bash-4.1$ ls -lhtr
total 12M
-rw-r--r-- 1 jespinoz tigr  10M Mar 27 01:25 mapped_interleaved.fastq
-rw-r--r-- 1 jespinoz tigr 2.1K Mar 27 01:25 mapped_interleaved.stats.txt
-rw-r--r-- 1 jespinoz tigr    0 Mar 27 14:14 mapped_interleaved_validated.fastq

There are definitely paired reads in here when I check in Python:

In [1]: import pandas as pd; from collections import defaultdict

In [2]: data = defaultdict(int)
   ...: with open("./mapped_interleaved.fastq", "r") as f:
   ...:     for line in f.readlines():
   ...:         if line.startswith("@"):
   ...:             line = line.strip()
   ...:             id_read = line[:-2]
   ...:             data[id_read] += 1
   ...: pd.Series(data).value_counts().sort_values()
Out[2]:
2     3571
1    22170
dtype: int64
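The check above treats every line that starts with "@" as a header, but in Phred+33 FASTQ a quality line can also begin with "@" (it encodes Q31). A minimal sketch that avoids this by stepping through the file four lines at a time is below. It assumes unwrapped 4-line records and read names ending in /1 or /2, as in this file; read_fastq, base_name and mate_count_histogram are hypothetical helper names, not part of BBTools.

```python
from collections import Counter
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from unwrapped 4-line FASTQ records.

    Assumption: every record is exactly four lines. Stepping four lines at a
    time means a quality line that starts with '@' is never taken for a header.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def base_name(read_id):
    """Keep the first whitespace-separated token and drop a '/1' or '/2' suffix."""
    parts = read_id.split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mate_count_histogram(lines):
    """Return a Counter mapping occurrences-per-name -> number of names."""
    per_name = Counter(base_name(read_id) for read_id, _seq, _qual in read_fastq(lines))
    return Counter(per_name.values())
```

Any iterable of lines works, including an open file, e.g. with open("mapped_interleaved.fastq") as f: print(mate_count_histogram(f)). A count of 2 means both mates of a fragment are present somewhere in the file; a count of 1 means the mate is missing.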

Here's an example of my reads:

head mapped_interleaved.fastq
@NS500647:155:H2MFYBGX2:1:11105:7084:14986:N:0:CGCAACTA#0/1
ATTTTCTCCAAGTCTGTATGCTCATCTTCGATGGTTAAAGTAGCATGGCGCATGTTAGCATCTGTTAAGGCATCCATAAAACCACTTGCCCGCTCAATGCGAGTACTCAAACGACTCGTATCCGCTGTAATCAAGAGGAAATGCTCGTAAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEAEEEAEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAAE/EEEEEEE/EEEEE<EEEAEAEEEEAE6EEEEEAEEEEEEEAEEE6EEEEEE<E/EEEEEEEEEAEEEEEEEE<6<EEE
@NS500647:155:H2MFYBGX2:1:11105:12921:14384:N:0:CGCAACTA#0/2
ACTAGGAGCAGCCCCCGTCAAATCTCCAACGCCCACAGCAGATAGGGACCAAACTGTCTCACGACGTTTTAAACCCAGCTCACGTACCTCTTTAAATGGCGAACAGCCATACCCTTGGGACCGGCTACAGCCCCAGGATGAGATGAGCCG
+
AAAAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEA6AEEEEEEEEEEEEEEEEEEEEEEEAEEE/EAEAEEEA/EAEE<AEE/AEEEA<EAEA<AA<E<<AEAA
@NS500647:155:H2MFYBGX2:1:11104:26079:14394:N:0:CGCAACTA#0/1
GGTCATGAGGGGGACTCGTGTGATAAGGCAGCCTGAAATGGGATTGAGTGTTTATTTCAGGCTGCCTTGGGGGGTGTGAAGTGGGCGTGGTCATTGGATGAAGGCAGCCTGCGTAGCGAAGC

Answer by h.mon, 5.1 years ago

An interleaved FASTQ file assumes that all reads are paired and that mates are stored in alternating records: the first read in the interleaved file is the first read from R1, the second is the first read from R2, the third is the second read from R1, and so on.
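To make that definition concrete, here is a minimal Python sketch that walks a file two records at a time and reports the first adjacent pair whose names disagree, similar in spirit to the check behind reformat.sh's "Names do not appear to be correctly paired" message. It is not BBTools code: read_names and first_interleaving_error are hypothetical helpers, and the sketch assumes well-formed, unwrapped 4-line records with /1 and /2 suffixes.

```python
from itertools import islice


def read_names(lines):
    """Yield the base name of each record in an unwrapped 4-line FASTQ.

    The base name drops the leading '@', anything after whitespace, and a
    trailing '/1' or '/2' mate suffix. Assumes well-formed records.
    """
    for header in islice(lines, 0, None, 4):  # every 4th line is a header
        parts = header[1:].split()
        name = parts[0] if parts else ""
        yield name[:-2] if name.endswith(("/1", "/2")) else name


def first_interleaving_error(lines):
    """Return None if records alternate R1, R2, R1, R2... with matching names.

    Otherwise return (record_number, name_a, name_b) for the first adjacent
    pair whose names differ; record_number is 1-based and name_b is None
    when the last record has no partner.
    """
    names = read_names(lines)
    for pair_index, name_a in enumerate(names):
        name_b = next(names, None)  # the record that should be the mate
        if name_b != name_a:
            return 2 * pair_index + 1, name_a, name_b
    return None
```

On a properly interleaved file this returns None; on the file in the question it would already fail at record 1, because the first /1 and /2 records come from different clusters.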

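Both the reformat.sh error and the output of head mapped_interleaved.fastq show that your FASTQ is not properly interleaved: adjacent /1 and /2 records come from different clusters (for example, 11105:7084:14986 /1 is followed by 11105:12921:14384 /2). Your Python count points the same way: only 3571 read names occur twice, while 22170 occur only once, so most reads have no mate in the file at all.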

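You can use repair.sh from BBTools/BBMap to fix the interleaved FASTQ file. It re-pairs reads by name and can write reads whose mate is missing to a separate file, for example: repair.sh in=mapped_interleaved.fastq out=fixed.fastq outs=singletons.fastq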
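Conceptually, re-pairing by name means holding each read until its mate turns up and then writing the two together; reads whose mate never appears are singletons. The Python sketch below illustrates that idea in memory under the same assumptions as above (unwrapped 4-line records, /1 and /2 suffixes at the end of the header). It is a hypothetical illustration with made-up helper names, not repair.sh's actual implementation; for real data, repair.sh is the tool to use.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, _plus, qual = record
        yield header[1:], seq, qual


def base_name(read_id):
    """Keep the first whitespace-separated token and drop a '/1' or '/2' suffix."""
    parts = read_id.split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def repair_by_name(lines):
    """Re-pair records by base name.

    Returns (pairs, singletons): pairs is a list of (read1, read2) records and
    singletons holds records whose mate never appeared in the input.
    """
    waiting = {}  # base name -> first-seen record still waiting for its mate
    pairs = []
    for record in read_fastq(lines):
        name = base_name(record[0])
        mate = waiting.pop(name, None)
        if mate is None:
            waiting[name] = record
        else:
            # Put the '/1' read first (False sorts before True);
            # without suffixes, first-seen order is kept.
            first, second = sorted((mate, record), key=lambda r: r[0].endswith("/2"))
            pairs.append((first, second))
    return pairs, list(waiting.values())


def to_fastq(record):
    """Format a (read_id, sequence, quality) tuple as a 4-line FASTQ record."""
    read_id, seq, qual = record
    return f"@{read_id}\n{seq}\n+\n{qual}\n"
```

Writing every pair's two records back to back gives an interleaved file, and the singletons go to a separate file, which mirrors the role of repair.sh's outs= output. Holding unmatched reads in a dictionary keeps the logic simple but means memory grows with the number of reads still waiting for a mate.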

Answer by O.rka, 5.1 years ago

Use repair.sh instead
