I am looking at RNA-seq data from TCGA and noticed that some nucleotides are replaced by '.' in the FASTQ files. So far, I have only encountered this encoding in repeat-masked FASTA files. Could it cause mappers or other downstream software to fail on such files, or should I replace each '.' with 'N'?
Please see the second read in this output:
zcat TCGA/RNAseq/OV/data/246cf927-a407-4c5a-9f60-0822bf34b208/D0W8YACXX_7_TGACCA_R1.fastq.gz | head -8
@HS2_288:7:1101:1487:2168/1 1:N:0:TGACCA
GGTCAGTCTGCTTTCCCCCTGTTTTATAATGTTGGTGGTTTTAATCCGTATTTCTTTGCAACTTCTGTCTGGGCA
+
BC?DFFFFHHHHHJJJJJJJJJJJJEIEIHIFHGIGHIFHIJIIJIGHHGHJJHIJJJJIJJJJJJJJJJIGHHH
@HS2_288:7:1101:1723:2143/1 1:N:0:TGACCA
.CTGGTTATGTGCTCCCTTCCACAGGGCTACATGACGGCACTTTATTTTAAATCCTTTAAACAAAATACATATGG
+
#1=DFDFFHGFHHJJJJJJJJJJJJJJJJEHHJID9DHGIJJJJIJJJJIJJJJJJJIIEIIJJGHHHHHGFFFF
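In case replacing turns out to be the safer route, here is a minimal sketch of what I have in mind. It assumes strict four-line FASTQ records (no wrapped sequence lines), and the names `mask_dots` and `convert` are only illustrative helpers, not part of any existing tool. Only the sequence line of each record is touched, because '.' is also a valid quality character and must stay unchanged there.

```python
import gzip


def mask_dots(lines):
    """Yield FASTQ lines with '.' replaced by 'N' in sequence lines only.

    Assumption: every record is exactly four lines (header, sequence,
    '+', quality), so the sequence is the 2nd line of each record.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of the current record
            line = line.replace(".", "N")
        yield line


def convert(in_path, out_path):
    """Stream a gzipped FASTQ file and write a masked gzipped copy."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(mask_dots(src))
```

It would be called as `convert("in.fastq.gz", "out.fastq.gz")`, where both paths are placeholders. Whether this step is needed at all is exactly what I am unsure about.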