7.5 years ago
Harry_Potter
I apologize if this question is obvious; I am new and just can't figure it out.
Is the publicly available TCGA data paired end or single end?
I'm a new student and just learning about FASTQ files. Those would be huge for TCGA downloads, so I am guessing all the data is already translated. Is that correct?
In general, if you see only an `R1` file per sample (no matching `R2`), that would be single-end sequence data. Paired-end data is present as two files per sample (`Sample_R1.fq.gz` and `Sample_R2.fq.gz`). The two files hold reads from the two ends of the same DNA fragment, and they are kept in the same order, so the Nth read in R1 and the Nth read in R2 are mates.

All raw high-throughput sequencing data is generally provided as FASTQ files. The reads are always DNA sequence in 5'-->3' orientation. There is no "translation" available (into protein sequence, if that is what you mean).
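A minimal Python sketch of both points, assuming the `<sample>_R1<suffix>` / `<sample>_R2<suffix>` naming used in the example file names. Real pipelines name files differently, so the regex and the helpers `classify_layout`, `read_fastq` and `open_fastq` are hypothetical. The first helper pairs R1/R2 files by name. The second reads FASTQ records, which are four lines each: an `@` header, the nucleotide sequence, a `+` separator, and a quality string of the same length. There is no protein sequence anywhere in the record.

```python
import gzip
import io
import re
from typing import Dict, Iterator, List, Set, TextIO, Tuple

# Hypothetical naming convention: <sample>_R1<suffix> and <sample>_R2<suffix>,
# e.g. Sample_R1.fq.gz or Sample_S1_L001_R1_001.fastq.gz.
READ_TAG = re.compile(r"^(?P<sample>.+?)_R(?P<mate>[12])(?P<suffix>.*)$")


def classify_layout(filenames: List[str]) -> Dict[str, str]:
    """Label each sample 'paired-end', 'single-end' or 'unknown' from file names alone."""
    mates: Dict[Tuple[str, str], Set[str]] = {}
    untagged: List[str] = []
    for name in filenames:
        m = READ_TAG.match(name)
        if m is None:
            untagged.append(name)
            continue
        key = (m.group("sample"), m.group("suffix"))
        mates.setdefault(key, set()).add(m.group("mate"))

    layout: Dict[str, str] = {}
    for (sample, _suffix), found in mates.items():
        if found == {"1", "2"}:
            layout[sample] = "paired-end"  # R1 and R2 share sample and suffix
        elif found == {"1"}:
            layout[sample] = "single-end"  # R1 with no R2 mate
        else:
            layout[sample] = "unknown"  # orphan R2; something is missing
    for name in untagged:
        layout[name] = "unknown"  # no R1/R2 tag; inspect the reads instead
    return layout


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-lines-per-record FASTQ stream.

    Assumes unwrapped records (one sequence line, one quality line).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].split()[0], seq, qual


if __name__ == "__main__":
    print(classify_layout(["Sample_R1.fq.gz", "Sample_R2.fq.gz", "Other_R1.fq.gz"]))
    # {'Sample': 'paired-end', 'Other': 'single-end'}
    demo = io.StringIO("@read1 1:N:0\nACGTN\n+\nIIII#\n")
    for read_id, seq, qual in read_fastq(demo):
        print(read_id, seq, set(seq) <= set("ACGTN"))  # read1 ACGTN True
```

File names are only a convention. The read headers are a more reliable check: older Illumina output ends read names with `/1` or `/2`, and newer output carries the mate number as the first field after the space (the `1` in `1:N:0`).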