6.3 years ago
galud1990
Hello all,
I have a bunch of data that I downloaded from EBI-ENA. However, I only get one big fastaq file for each sample, and I think the forward and reverse reads are combined in it. I would like to get separate forward and reverse fastaq files. I've read that when getting data from ENA or NCBI, the fastaq R1 and R2 are together in one file. If anyone has suggestions on how to get fastaq R1 and R2, I would appreciate it. I am going crazy trying to figure out how to get these sequences.
This is the code I used to get my data:
#!/bin/bash
# download_ebi_fastq.sh
#
for study_accession in ERP021896 ERP020023 ERP006348
do
    count=-1
    # Fetch the run table for the study and drop its header row
    curl -s "https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=${study_accession}&result=read_run&fields=secondary_sample_accession,submitted_ftp" | grep -v "^secondary_sample_accession" > ${study_accession}.details.txt
    # Tokens alternate: even count = sample accession, odd count = file URL
    for fq in `awk '{print $1, $2}' ${study_accession}.details.txt`
    do
        ((count++))
        if [[ $(( count % 2)) -eq 0 ]]
        then
            id=$fq
            current_path=${study_accession}/${id}
            current_base=${current_path}/${id}
            if [ -d "${current_path}" ]; then
                continue
            fi
            echo "Fetching ${id}..."
            mkdir -p ${current_path}
            curl -s "http://www.ebi.ac.uk/ena/data/view/${id}&display=xml" > ${current_base}.xml &
        else
            # Skip files that were already downloaded (output is written as .fq)
            if [ -e "${current_base}.fq" ]; then
                continue
            fi
            # sed from http://stackoverflow.com/a/10359425/19741
            curl -s $fq | zcat > ${current_base}.fq &
        fi
        # Let background downloads finish after every 10 tokens
        if [[ $((count % 10)) -eq 0 ]]
        then
            wait
        fi
    done
done
wait
Off topic, but the format is a "fastq" file. Not fastaq, that does not exist. The majority of people will understand what you mean, but correct terminology facilitates communication.
yes, got it. Sorry about that!
It appears that those are single-end runs, so you are going to get only one read file per run. SRA example and ENA examples.
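One way to check this before downloading anything is to ask ENA for the library layout of each run. The following is only a minimal sketch: it assumes the filereport endpoint from the question still answers and accepts the `library_layout` and `fastq_ftp` fields, and the function name `summarize_layout` is made up for illustration.

```shell
#!/bin/bash
# Sketch: count runs per library layout (SINGLE vs PAIRED) in an ENA file report.
# Assumption: the report is tab-separated and its header row contains a
# "library_layout" column. summarize_layout is a hypothetical helper name.

summarize_layout() {
    # Reads the report on stdin, prints "<layout><TAB><count>" sorted by layout.
    awk -F'\t' '
        NR == 1 {
            # Locate the library_layout column from the header row
            for (i = 1; i <= NF; i++) if ($i == "library_layout") col = i
            next
        }
        col && $col != "" { n[$col]++ }
        END { for (k in n) printf "%s\t%d\n", k, n[k] }
    ' | sort
}

# Example call (needs network access; URL pattern taken from the question and
# may have changed since):
# curl -s "https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=ERP021896&result=read_run&fields=run_accession,library_layout,fastq_ftp" | summarize_layout
```

If every run comes back as SINGLE, there is no R2 to recover. For PAIRED runs, the `fastq_ftp` field would normally list separate files for each mate rather than one combined file.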
Yes, I realized that after I posted this. I totally missed that. Thanks for your response!