Question: I have questions about importing fastq data from ENA to qiime2
25 days ago, seok1213neo wrote:

Hi, I am new to QIIME 2 and have run into a couple of problems importing my data.

I understand that you first need to create an artifact file for the program (a .qza file), and that how you do this depends on the format of the fastq files.

My questions are:

  1. I got my fastq files from ENA, and their format does not seem to be covered in the QIIME tutorial. Each download consists of two files, which I assumed were the forward and reverse reads from paired-end sequencing. One of the files looks like this:

@SRR6664786.1 1/2
CTACTACAGGTTTTTCTTTTCCTCTTCTCTCCCCCTCCTTTCTCTCCTCCTCCTCTTTCTCTTCCCCCCCTTCCCCCTTCTCCTCCCCTTTTCCTCCCCTTCTCTTCTCCTTCCCCCTCTTCACCCCTTTTTCCTCCCTCCCCTCCCTTCCCCCCCCCCCCTCCTTTCCTTCCCCTTCTCTCCTTTTTTCCCCCCCCTTCCTCCCCCTTCCCTCTCTCCCCCCTCCTCTCCCTTTCTCCCCCCTCCCTCCCCCTCCCCCCTCCCCCCTTCCTTTTCCCCTCCTCTTTTTCCTTCCCTTCTC
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.2 2/2
TGGGACTTCTGGTGTTTCTTATCCTTTTTTCTCCCCACGCTTTCGCTCCTTTGCGTCTGTTCTTTCCCCATGCCCTGCCTTCCCCTTCTTTTTTCCTCCCCATCTCTACTCTTTTCCCCTCTACACGTGGTTTTCTACCCCTCCCTATAGTCCTCTTGCGTCCCCGTTTGTTTTTCATTTCCCTGTTTTCTCCCGCGTCTTTCCCCCCTCTCTTTCTTCTCCCCCTGCCTGCCCTTTTCCCCCCTTTTCTCCCCTTCCTCCTCTCCCCCCCCCTTTTCCCCCCCTTCTTTGCCCCTCTTTT
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.3 3/2
TCGTCTACACGCTTTTCTTTTTCTTTTTTTTTCCCCCCCTTTCTTTCTTCTCCCTCCGTTTCTTTCCTTTCACCCCCCTTCCCTCCTCCCTTTCCTTCTTCTTTCTATCTATTTCTTTCCTCCCCTCCCACTTCTTCTTCCCCCCCCCTCCCTTCCTTACGCCTCTCTCCTTTCCCTTCCCCCTCTTTTTTCCATCCCTTTTTCTCCCTCCTTCCTTCCCCCTCCACTCTCCCTTCCCTCCCCCTTCTCCCCTCTACCTCCTCCCCCCCCCTCTTTCCCTCCCCCTCCTCCATCTCGTTAT
+
#############################################################################################################################################################################################################################################################################################################
@SRR6664786.4 4/2
CGGACTACCATGGTTTCTAATCCTTTTTTTTACCCACACTTTCGATCTTCTCTGTCAGTTGCTTTCCAGTGAGCTGCCTTCTCTATCGGTTTTCTTCCTTTTATCTAAGCATTTCTCCTCTACACCACGAATTCCCCCCACCTCTACTGTCCTCAATACTGACATTATCATCTGCAATTTTACGGTTTTTCCGCAAACTTTCACACCTTACTTCCCTTTCCACCTACGCTCCCTTTAAACCCAATCACTCCGTCTAACCCTCGGATCCTCCGTATTCCCCCGGCTTCTGCCTCTGATTTCT
+
-88ACEDGGFFA9FGGAC6,CCEF,<CC++@,CCFC;7BEFFE,@,6,<6,,<,C@<,CE,,,,<9:@,C,,,:E,@BFF=,:,,996+,,BCFF??,9,,:,:9A??,?,AE?;,94,49944A,9A7++9AA?,9+4+46@?F############################################################################################################################################################

At first I thought they were fastq files with the barcodes inside the sequences, so I managed to make an artifact (multiplexed.qza) out of them. But when I tried to demultiplex them, I needed a metadata file (typically in tsv format), and for that I need to know the barcodes. Where can I find the barcodes in files like these? Could you help me work out which sequences are the barcodes?

  2. If I am wrong to interpret them as 'multiplexed sequences with barcodes in the sequence', what type of fastq file are they? If the barcodes are not in the sequences, where can I find the barcode gz files?

Looking forward to seeing your answers! Thank you

qiime fastq • 113 views
25 days ago, h.mon (Brazil) wrote:

See "What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?".

Sequencing data deposited at ENA / SRA / DDBJ is already demultiplexed: each file (or pair of paired-read files) corresponds to one sample. You have to search the BioProject and BioSample pages for the metadata describing the files that belong to the experiment you are interested in.

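If the files really are one sample per pair, no barcode step is needed, and one common route into QIIME 2 is a manifest file that lists the reads for each sample. The following is only a minimal sketch under stated assumptions: the files are named `<run>_1.fastq.gz` and `<run>_2.fastq.gz`, the run accession is used as the sample ID (the real sample names would come from the BioSample metadata), and the qualities are Phred+33. The function names `manifest_rows` and `write_manifest` are made up for this illustration.

```python
import csv
import re
from pathlib import Path

# Assumption: ENA-style names, <run>_1.fastq(.gz) = forward, <run>_2.fastq(.gz) = reverse.
PAIR_PATTERN = re.compile(r"^(?P<run>[A-Za-z0-9]+)_(?P<mate>[12])\.fastq(?:\.gz)?$")

# Column names used by QIIME 2's paired-end manifest (V2) format.
HEADER = ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]


def manifest_rows(fastq_paths):
    """Return manifest rows (header first) for runs that have both mates."""
    mates = {}
    for path in map(Path, fastq_paths):
        match = PAIR_PATTERN.match(path.name)
        if match:
            # The manifest needs absolute paths.
            mates.setdefault(match["run"], {})[match["mate"]] = str(path.resolve())
    rows = [HEADER]
    for run in sorted(mates):
        if mates[run].keys() >= {"1", "2"}:  # skip runs with a missing mate
            # Assumption: the run accession doubles as the sample ID.
            rows.append([run, mates[run]["1"], mates[run]["2"]])
    return rows


def write_manifest(rows, manifest_path):
    """Write the rows as a tab-separated manifest file."""
    with open(manifest_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


# Hypothetical usage:
#   write_manifest(manifest_rows(Path("fastq").iterdir()), "manifest.tsv")
# followed by an import along the lines of:
#   qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-format PairedEndFastqManifestPhred33V2 \
#     --input-path manifest.tsv --output-path paired-end-demux.qza
```

Runs with only one of the two mates are left out of the manifest on purpose, so they can be checked against the metadata first.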

and those files are not in Casava1.8 format right? And if each download contains two fastq files, are they the forward and reverse reads from paired-end sequencing? But some of the data I got have only one fastq file, and others even have three. I am so confused. Please help me.

(reply by seok1213neo, 25 days ago)

and those files are not in Casava1.8 format right?

No, these files are usually compressed fastq files, or sra files, which is a format created by the NCBI. Regarding the number of fastq files, again, you will have to read the available metadata to figure out what is happening. Possibilities include incorrect submissions, barcode files, single-end sequencing, and so on.

(reply by h.mon, 25 days ago)
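Before reading the metadata, it can help to list how many fastq files each run has. The sketch below only groups file names by run accession and gives a tentative guess. It assumes names of the form `<run>.fastq.gz`, `<run>_1.fastq.gz` and `<run>_2.fastq.gz`, and the helpers `group_by_run` and `guess_layout` are hypothetical. The guess is not a substitute for the BioProject / BioSample metadata.

```python
import re
from collections import defaultdict

# Assumption: <run>.fastq(.gz) or <run>_<n>.fastq(.gz); anything else is ignored.
RUN_FILE = re.compile(r"^(?P<run>[A-Za-z0-9]+)(?:_(?P<mate>\d+))?\.fastq(?:\.gz)?$")


def group_by_run(filenames):
    """Map each run accession to the sorted list of file suffixes found for it."""
    groups = defaultdict(list)
    for name in filenames:
        match = RUN_FILE.match(name)
        if match:
            groups[match["run"]].append(match["mate"] or "unsuffixed")
    return {run: sorted(labels) for run, labels in groups.items()}


def guess_layout(labels):
    """Return a tentative description; the run metadata has the final word."""
    if labels == ["unsuffixed"]:
        return "one file: possibly single-end"
    if labels == ["1", "2"]:
        return "two files: possibly paired-end"
    return "unexpected set of files: check the run metadata"


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    example = ["SRR1_1.fastq.gz", "SRR1_2.fastq.gz", "SRR2.fastq.gz"]
    for run, labels in sorted(group_by_run(example).items()):
        print(run, labels, guess_layout(labels))
```

Anything that is not a clean single file or a `_1`/`_2` pair is reported as unexpected rather than guessed at, since, as noted above, several different explanations are possible.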