Question: Non-standard TCGA FASTQ files
Asked 4.1 years ago by Daniel Gerlach (Austria):

I am looking at RNA-seq data from TCGA and noticed that some nucleotides are replaced by '.' in the FASTQ files. So far, I have only encountered this encoding in repeat-masked FASTA files. Can it cause mappers or other downstream software to fail on such files, and should I replace the '.' with N?

Please see the second read in this output:

zcat TCGA/RNAseq/OV/data/246cf927-a407-4c5a-9f60-0822bf34b208/D0W8YACXX_7_TGACCA_R1.fastq.gz | head -8
@HS2_288:7:1101:1487:2168/1 1:N:0:TGACCA
GGTCAGTCTGCTTTCCCCCTGTTTTATAATGTTGGTGGTTTTAATCCGTATTTCTTTGCAACTTCTGTCTGGGCA
+
BC?DFFFFHHHHHJJJJJJJJJJJJEIEIHIFHGIGHIFHIJIIJIGHHGHJJHIJJJJIJJJJJJJJJJIGHHH
@HS2_288:7:1101:1723:2143/1 1:N:0:TGACCA
.CTGGTTATGTGCTCCCTTCCACAGGGCTACATGACGGCACTTTATTTTAAATCCTTTAAACAAAATACATATGG
+
#1=DFDFFHGFHHJJJJJJJJJJJJJJJJEHHJID9DHGIJJJJIJJJJIJJJJJJJIIEIIJJGHHHHHGFFFF

Best, Daniel

rna-seq tcga fastq • 1.2k views
Answered 3.9 years ago by EagleEye (Sweden):

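Those are N's: a '.' in the sequence line is a no-call, equivalent to N. Note the matching low quality character '#' at the same position in your second read.

See the related question: What Do The Period . Symbols Mean In The Sequence Record Of A Fastq File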
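
If a downstream tool does reject the '.' characters, one option is to rewrite them as N before mapping. Below is a minimal sketch, not a tested pipeline step: it assumes strict four-line FASTQ records (no wrapped sequences), and the function names `mask_no_calls` and `convert` are made up for illustration. Only the sequence line of each record is touched, because '.' is also a valid quality character and must be left alone there.

```python
import gzip


def mask_no_calls(lines):
    """Yield FASTQ lines with '.' replaced by 'N' on sequence lines only.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        # The sequence is the 2nd line of every 4-line record.
        if i % 4 == 1:
            line = line.replace(".", "N")
        yield line


def convert(src_path, dst_path):
    """Stream a gzipped FASTQ file to a new gzipped file with no-calls as N."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(mask_no_calls(src))
```

For example, `convert("reads_R1.fastq.gz", "reads_R1.N.fastq.gz")` (hypothetical file names) would write a copy in which the second read above starts with N instead of '.', while its quality string is unchanged.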
