Hello everyone, I was going to run an analysis on sequencing data available on NCBI for various developmental stages of an organism. When I went to download the data, some of it came in twos, for instance blastula stage data 1 and blastula stage data 2. I don't know whether both files are the same thing, although they differ slightly in size (e.g. 3 GB and 3.5 GB respectively). Secondly, when I download the files they don't come as SRA but as fastq, and I assume fastq-dump takes SRA and not fastq. Is this normal, and what command can I use to split the fastq into forward and reverse reads so I can run Bowtie2 on them? Below is the command I'm using for the split, and I hope it is the correct command. Thanks
fastq-dump --split-3 blastula2.fastq
That probably has nothing to do with the sequence data. It may have to do with the actual experiment, since those two things seem to refer to two stages of blastula. Where possible, search EBI-ENA with accession IDs so you can download the fastq files directly without having to worry about SRA and fastq-dump.
But the files I downloaded are already in fastq. I just need to split them into forward-read and reverse-read files to use as input for Bowtie2. Any ideas on how I can achieve this?
Are you sure about that? If you have reads in interleaved format then reformat.sh from the BBMap suite can be used to separate the R1 and R2 reads.
So I ran this and only R2.fq was made. Does that mean my reads are not paired-end? How do I know whether the original fastq is paired-end or not, please? And if generating only R2.fq means the reads are not paired-end, could the second fastq file (named blastula1) on NCBI be the second read? It looks like this on the NCBI SRA site.
This is indeed a single-end dataset. Confirmed by a single fastq available from ENA.
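For a quick local check of whether a fastq is interleaved, something like the sketch below may help. It assumes an uncompressed file with exactly four lines per record, and it only compares the names of consecutive records after stripping an optional /1 or /2 suffix. The function name fastq_layout is made up for illustration, and the result is a hint rather than proof; the layout listed on ENA or SRA remains the authoritative answer.

```shell
# Rough heuristic: in an interleaved paired-end FASTQ, records 1 and 2,
# 3 and 4, ... carry the same read name. In a single-end file they differ.
# fastq_layout is a hypothetical helper name, not part of any toolkit.
fastq_layout() {
    # $1: path to an uncompressed FASTQ file (4 lines per record assumed)
    head -n 4000 "$1" | awk '
        BEGIN { n = 0; pairs = 0; matched = 0 }
        NR % 4 == 1 {
            name = $1                  # first whitespace-delimited field of the header
            sub(/^@/, "", name)        # drop the leading @
            sub(/\/[12]$/, "", name)   # drop a trailing /1 or /2 mate suffix
            n++
            if (n % 2 == 1) {
                first = name
            } else {
                pairs++
                if (name == first) matched++
            }
        }
        END {
            if (pairs > 0 && matched == pairs) print "interleaved"
            else print "single-end"
        }'
}
```

Calling fastq_layout blastula2.fastq prints either interleaved or single-end, based on the first 1000 records only.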
I am at somewhat of an intermediate level in this field. So I assume the second file, the one called blastula_2, would be the second or reverse read and blastula_1 would be the forward read?
Post the example SRR # for the blastula_2 data so I can check.
Looking at the project listing on ENA, they all appear to be single-end datasets. blastula_1 and blastula_2 could be biological replicates, but I doubt they are two parts of paired-end reads.
It is true that they might be replicates, because some of the developmental stages have more than two files (X_1, X_2, X_3 ...), which suggests replicates. If that is the case, can I use the single-end reads for Bowtie2 mapping? Below is the second file page:
This is the same sample page as you posted before.
My bad
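On the earlier question about mapping single-end reads: Bowtie2 accepts unpaired reads through its -U option, so a single-end run could look roughly like the sketch below. The function name, index prefix and file names are hypothetical placeholders, and the index is assumed to have been built beforehand with bowtie2-build.

```shell
# Minimal sketch of single-end mapping with Bowtie2.
# Assumes an index was built first, e.g.:
#   bowtie2-build reference.fa blastula_index
# (reference.fa and blastula_index are placeholder names.)
map_single_end() {
    index_prefix=$1   # prefix passed to bowtie2-build
    reads=$2          # single-end FASTQ file
    out_sam=$3        # output alignment file in SAM format
    # -x: index prefix, -U: unpaired reads, -S: SAM output
    bowtie2 -x "$index_prefix" -U "$reads" -S "$out_sam"
}
```

Each replicate (blastula_1, blastula_2, ...) would be mapped as its own single-end run, for example map_single_end blastula_index blastula_1.fastq blastula_1.sam.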