Paired-end reads are merged into a single file: need to separate them
9.6 years ago
Chirag Nepal ★ 2.4k

Hi there,

I downloaded a publicly available dataset of 100 bp paired-end reads (with an insert size of 200 nt). I downloaded the FASTQ file to map it to the genome, but it seems the authors have merged the reads. I ran BLAT on a few examples, and only about 100 nt map in one continuous stretch.

Fastq example

@SRR893106.1 1 length=202
CATAGGGTGCTCCGGCTCCAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAAATGCATGGAGAGCTCCCGTGAGTGGTTAATAGGGGGAGCCTATCATATATCTCCCTACCAACAAACCTACCCACCCTTAACAGCACATAGTACATAAAGCCATTTACCGTACATAGCACATTACAGTCAAATCC
+SRR893106.1 1 length=202
@@@FF?B:CFHHHIJGIIIIIEIHGGIIIJGGGGGGDABBB8;;CDAEHHFFD:?9?BBBDB>ACA:CD:>CCDDBC<(8?>:@?B8>@?:A@ABC3>3@>?<**ACACCAC;::>:;>5
@SRR893106.2 2 length=202
AGACAGATACTGCGACATAGGGTGCTCCGGCTCCAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAAATGCATGGAGAGCTCCCGGGGGTAGCTAAAGTGAACTGTATCCGACATCTGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGA
+SRR893106.2 2 length=202
CCCFFFFFGHHHHJJJJIJJJIAHHEIIIJJIJJJJGIDDEGAA@GGGIIHHHEFF**BBDCCCDDCDCDCDEDDCBCDACDDDD@@CFBDDFHHGHHCGHHIJJJJJJJJIJIJJJJJJIIIIJJJJJJJJJIJJIJJJIJJIJJJJJJJJIHHHHF?@ECECEDCDDEDDDDDCCCDDDBD?@?

I checked the sequences with FastQC, which also suggests that the authors have merged the reads.

https://www.dropbox.com/s/clpmi7cktr2w7l1/Screen%20Shot%202014-09-18%20at%2017.35.31.png?dl=0

Are there any existing tools for this, or can you suggest how to separate the reads?

Thanks in advance!

Cheers

Paired-end-reads • 6.0k views

Did you download the sra file and then forget to use the --split-3 option?


Looks like it. FASTQ files are available for each end separately at the ENA: http://www.ebi.ac.uk/ena/data/view/SRR893106


Thanks, matted!

ENA has the correct file format, which can be found here.


This is what I used to download the SRA files:

wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP025/SRP025150/SRR893106/SRR893106.sra
wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP025/SRP025150/SRR894447/SRR894447.sra

And then I ran fastq-dump with default parameters:

# Loop over the .sra files directly instead of parsing the output of ls
for name in *.sra
do
    echo "$name"
    ~/unixTools/sratoolkit.2.3.5-2-ubuntu64/bin/fastq-dump "$name"
done

Which tool is --split-3 an option of? fastq-dump?

fastq-dump --split-3 "$name"

If you don't do that, fastq-dump writes both mates of each pair concatenated into a single record, which is the merged output you are seeing. Anyway, as matted said, it's usually easier to check first whether ENA has the FASTQ files.
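
If re-dumping or downloading from ENA is not convenient, an already merged FASTQ can also be cut in half. The following is only a minimal sketch, not a standard tool: the function name split_merged_fastq and the output file names are made up for illustration, and it assumes that every record is exactly 4 lines and that both mates have the same length (as in the length=202 records above, i.e. 101 + 101). It will give wrong results for reads of unequal length or for records whose quality line is truncated.

```shell
# Split every merged FASTQ record at its midpoint into two mates.
# Assumptions (not checked): 4 lines per record, mates of equal length.
# Usage: split_merged_fastq merged.fastq out_prefix
# Writes out_prefix_1.fastq and out_prefix_2.fastq (hypothetical names).
split_merged_fastq() {
    in_fq=$1
    prefix=$2
    awk -v out1="${prefix}_1.fastq" -v out2="${prefix}_2.fastq" '
        # Line 1 of a record: keep only the read ID, e.g. "@SRR893106.1"
        NR % 4 == 1 { split($0, f, " "); id = f[1] }
        # Line 2: the concatenated sequence of both mates
        NR % 4 == 2 { seq = $0 }
        # Line 4: the concatenated qualities; emit both mates now
        NR % 4 == 0 {
            half = int(length(seq) / 2)
            printf "%s/1\n%s\n+\n%s\n", id, substr(seq, 1, half), substr($0, 1, half) > out1
            printf "%s/2\n%s\n+\n%s\n", id, substr(seq, half + 1), substr($0, half + 1) > out2
        }
    ' "$in_fq"
}
```

For example, split_merged_fastq SRR893106.fastq SRR893106 would write SRR893106_1.fastq and SRR893106_2.fastq. Records are addressed by line position rather than by a leading "@", because a quality line may itself start with "@".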
