Question: Non-standard TCGA FASTQ files
4.1 years ago, Daniel Gerlach wrote:

I am looking at RNA-seq data from TCGA and noticed that some nucleotides are replaced by '.' in the FASTQ files. So far, I have only encountered this encoding in repeat-masked FASTA files. Could it cause mappers or other downstream software to fail on such files, and should I replace the '.' with N?

Please see the second read in this output:

zcat TCGA/RNAseq/OV/data/246cf927-a407-4c5a-9f60-0822bf34b208/D0W8YACXX_7_TGACCA_R1.fastq.gz | head -8
@HS2_288:7:1101:1487:2168/1 1:N:0:TGACCA
@HS2_288:7:1101:1723:2143/1 1:N:0:TGACCA

Best, Daniel

3.9 years ago, EagleEye wrote:

Those are Ns. The '.' marks a base that could not be called, so it means the same thing as N.

See also the related thread: "What Do The Period . Symbols Mean In The Sequence Record Of A Fastq File"
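If you would rather normalise the files before mapping than rely on every tool accepting '.', the replacement can be done in a stream. The following is only a minimal sketch: it assumes strict four-line FASTQ records (no wrapped sequence lines), the function name `dots_to_n` is made up for illustration, and the demo record is invented rather than taken from TCGA. It edits only the sequence line of each record, because '.' is also a valid quality character and must not be changed on the quality line.

```shell
# Replace '.' with 'N' on sequence lines only (line 2 of every 4-line FASTQ record).
# Quality lines are left untouched because '.' is a valid Phred+33 quality character.
# Assumption: records are exactly four lines each (no line-wrapped sequences).
dots_to_n() {
    awk 'NR % 4 == 2 { gsub(/\./, "N") } { print }'
}

# Small made-up record to show the effect (not real TCGA data).
printf '@read1/1\nAC.GT.\n+\nII.II.\n' | dots_to_n

# On a real file this might look like (file names are placeholders):
#   zcat reads_R1.fastq.gz | dots_to_n | gzip > reads_R1.N.fastq.gz
```

A plain `sed 's/\./N/g'` over the whole file would also rewrite quality strings, which is why the sketch restricts the substitution by line position.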
