8.7 years ago
Chirag Nepal ★ 2.3k
I downloaded a publicly available dataset of 100 bp paired-end reads (insert size 200 nt). I downloaded the FASTQ to map to the genome, but it seems the authors have merged the reads. I ran BLAT on a few examples, and only about 100 nt of each read maps as one continuous stretch.
@SRR893106.1 1 length=202
CATAGGGTGCTCCGGCTCCAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAAATGCATGGAGAGCTCCCGTGAGTGGTTAATAGGGGGAGCCTATCATATATCTCCCTACCAACAAACCTACCCACCCTTAACAGCACATAGTACATAAAGCCATTTACCGTACATAGCACATTACAGTCAAATCC
+SRR893106.1 1 length=202
@@@FF?B:CFHHHIJGIIIIIEIHGGIIIJGGGGGGDABBB8;;CDAEHHFFD:?9?BBBDB>ACA:CD:>CCDDBC<(8?>:@?B8>@?:A@ABC3>3@>?<**ACACCAC;::>:;>5
@SRR893106.2 2 length=202
AGACAGATACTGCGACATAGGGTGCTCCGGCTCCAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAAATGCATGGAGAGCTCCCGGGGGTAGCTAAAGTGAACTGTATCCGACATCTGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGA
+SRR893106.2 2 length=202
CCCFFFFFGHHHHJJJJIJJJIAHHEIIIJJIJJJJGIDDEGAA@GGGIIHHHEFF**BBDCCCDDCDCDCDEDDCBCDACDDDD@@CFBDDFHHGHHCGHHIJJJJJJJJIJIJJJJJJIIIIJJJJJJJJJIJJIJJJIJJIJJJJJJJJIHHHHF?@ECECEDCDDEDDDDDCCCDDDBD?@?
I checked the sequences with FastQC, which also suggests that the authors have merged the reads.
Is there an existing tool, or can anyone suggest how to separate them?
Thanks in advance !
Did you download the sra file and then forget to use the
Looks like. Fastq files are available for each end separately at the ENA: http://www.ebi.ac.uk/ena/data/view/SRR893106
Thanks matted !
ENA has the correct file format, which can be found here.
This is what I used to download the SRA file:
And the default parameters of fastq-dump.
--split-3 is an option on which tool? fastq-dump?
If you don't do that you'll get merged reads like this. Anyway, as matted said, it's usually easier to see if ENA has the fastq files first.
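If only the merged FASTQ is at hand, the two mates can be recovered by cutting each record at its midpoint. The Python below is a minimal sketch, not a tested pipeline. It assumes that every record is simply mate 1 followed by mate 2, that both mates have the same length (202 nt would mean 2 x 101), and that sequence and quality strings are complete and equally long. The function names and the "/1", "/2" name suffixes are illustrative choices, not part of any tool mentioned in this thread.

```python
def split_merged_record(header, seq, qual):
    """Split one concatenated record into two mates at the midpoint.

    Assumption: the record is mate 1 followed by mate 2, equal lengths.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ: " + header)
    if len(seq) % 2 != 0:
        raise ValueError("odd read length, cannot split evenly: " + header)
    half = len(seq) // 2
    name = header.split()[0]  # e.g. "@SRR893106.1"
    mate1 = (name + "/1", seq[:half], qual[:half])
    mate2 = (name + "/2", seq[half:], qual[half:])
    return mate1, mate2


def split_fastq(lines):
    """Yield (mate1, mate2) tuples from an iterable of FASTQ lines.

    Assumes strict four-line records (header, sequence, '+', quality).
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes one record per step
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield split_merged_record(header, seq, qual)


# Tiny made-up record, only to show the shape of the output
demo = ["@r1 1 length=8\n", "ACGTTTGG\n", "+r1 1 length=8\n", "IIIIHHHH\n"]
pairs = list(split_fastq(demo))
```

To use it on a real file, pass an open file handle to split_fastq and write each mate to its own output file as four lines (name, sequence, "+", quality). Re-dumping with --split-3 or taking the per-end files from ENA, as suggested above, remains the safer route because it does not rely on the equal-length assumption.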