Question: What Do The Period . Symbols Mean In The Sequence Record Of A Fastq File
6.6 years ago, Wayne (United States) wrote:

Hello all, I have a funny issue with bam files where the 4th character in a lot of my reads shows up as a ".". I've never seen this before, but it's wreaking havoc with my scripts. Does anyone know what causes this or what it means?

Example:

@DHT4KXP1:3:1101:2235:2028#0/1
GAA.TACTGCCAAGTCATCCGTGTCATTGCCCACACCCAGATGCGCCTGCTTCCTCTGCGCCAGAAGAAGGCCCACCTGATGGAGATCCAGGTGAACGGAG
+DHT4KXP1:3:1101:2235:2028#0/1
_a_BS\ccgggegihhgfiiighfghiihhhhhiiiihiihifhiiiiihihhihihhh[dgeeeebddd_aacc_acccccbcccccccccbbccccacc

You might want to get a Geiger counter ;). Just to be sure, is it exclusive to the fourth position of the read? What is the provenance of the data?

Reply written 6.6 years ago by Aaronquinlan

Some versions of the SOLiD sequencer used to put dots into the colorspace sequence whenever the quality was too low and the instrument was unable to call a color. This used to break all kinds of tools.

Reply written 6.6 years ago by Istvan Albert

I was thinking this as well, except that the quality at that position ('B') works out to Q = 33 assuming Sanger scaling. Odd.

Reply written 6.6 years ago by Aaronquinlan
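
The arithmetic behind that observation can be checked with a short sketch. It assumes the two common FASTQ quality offsets, 33 (Sanger) and 64 (older Illumina pipelines); the helper name `phred_score` is made up for illustration and does not come from any library.

```python
def phred_score(char, offset):
    """Return the Phred score encoded by a single FASTQ quality character."""
    return ord(char) - offset

# 'B' is the quality character under the '.' in the example read;
# 'h' is one of the high-quality characters from the same quality line.
for q_char in ("B", "h"):
    sanger = phred_score(q_char, 33)        # Sanger scaling (Phred+33)
    old_illumina = phred_score(q_char, 64)  # older Illumina scaling (Phred+64)
    print(f"'{q_char}': Q{sanger} with offset 33, Q{old_illumina} with offset 64")
```

With offset 33 the uncalled base would carry a good score (Q33) and 'h' would decode to Q71, which is implausibly high. With offset 64 the same characters decode to Q2 and Q40, which fits a failed base call much better.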

It is not just SOLiD data. I used to see this in Illumina qseq files a few years ago (when read lengths were 75-76 bp). This was very frustrating because most tools would just die, assuming the data was improperly formatted, especially when the dots were at the beginning of the sequence. My assumption was that these were simply bases that could not be called, so I trimmed them.

Reply written 6.6 years ago by SES
6.6 years ago, swbarnes2 (United States) wrote:

The dot, together with 'B' as its quality score, indicates that it's an unknown base. (Your qualities range from 'B' to 'h', which is the older encoding scheme, where 'B' is the worst quality.) Use sed to change all the '.' to 'N'.
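
As an alternative to sed, here is a minimal Python sketch of the same substitution. It assumes strict four-line FASTQ records (no wrapped sequences) and only touches the sequence line, since a blanket replacement would also rewrite any '.' in header or quality lines. The function name `mask_uncalled_bases` and the tiny record are made up for illustration.

```python
import io

def mask_uncalled_bases(lines):
    """Yield FASTQ lines with '.' replaced by 'N' on sequence lines only."""
    for i, line in enumerate(lines):
        # In a four-line record, the sequence is the second line (index 1).
        if i % 4 == 1:
            line = line.replace(".", "N")
        yield line

# Hypothetical record: the header and quality lines also contain a '.',
# which must be left untouched.
record = io.StringIO(
    "@read.1\n"
    "GAA.TACTG\n"
    "+\n"
    "_a_BS.ccg\n"
)
fixed = list(mask_uncalled_bases(record))
print("".join(fixed), end="")
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object, with the yielded lines written to a new file.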

6.6 years ago, JC (Mexico) wrote:

My guess (as Istvan suggested before) is a failed base call where the pipeline wrote a '.' instead of an 'N', as in the SOLiD pipeline. Also, Q = 'B' is the lowest value in the Illumina 1.5+ calling pipeline. Do you know the technology/source?
