Question: What Do the Period (.) Symbols Mean in the Sequence Record of a FASTQ File?
Wayne (980), United States, wrote 5.6 years ago:

Hello all, I have a funny issue with BAM files where the 4th character in many of my reads shows up as a ".". I've never seen this before, but it's wreaking havoc with my scripts. Does anyone know what causes this or what it means? Below is an example.

example:

@DHT4KXP1:3:1101:2235:2028#0/1
GAA.TACTGCCAAGTCATCCGTGTCATTGCCCACACCCAGATGCGCCTGCTTCCTCTGCGCCAGAAGAAGGCCCACCTGATGGAGATCCAGGTGAACGGAG
+DHT4KXP1:3:1101:2235:2028#0/1
_a_BS\ccgggegihhgfiiighfghiihhhhhiiiihiihifhiiiiihihhihihhh[dgeeeebddd_aacc_acccccbcccccccccbbccccacc
Tags: bam, mapping, sequencing • 2.3k views
modified 5.6 years ago by Istvan Albert ♦♦ 77k • written 5.6 years ago by Wayne (980)

You might want to get a Geiger counter ;). Just to be sure, is it exclusive to the fourth position of the read? What is the provenance of the data?

modified 5.6 years ago • written 5.6 years ago by Aaronquinlan (10k)
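One way to answer whether the dot is confined to the fourth position is to tally dot positions across a whole file. A minimal Python sketch, assuming unwrapped four-line FASTQ records; `fastq_records` and `dot_positions` are hypothetical helper names, not part of any library:

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from an unwrapped FASTQ stream.

    Assumes every record is exactly four lines: @header, sequence,
    '+' line, quality. Blank lines between records are skipped.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line (may repeat the header); unused here
        qual = next(it, "").rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def dot_positions(lines: Iterable[str]) -> Counter:
    """Count how often '.' occurs at each 1-based read position."""
    counts: Counter = Counter()
    for _header, seq, _qual in fastq_records(lines):
        for pos, base in enumerate(seq, start=1):
            if base == ".":
                counts[pos] += 1
    return counts
```

Passing an open file handle (for a hypothetical `reads.fastq`) returns a `Counter` keyed by position; if position 4 is the only key, the problem is positional rather than random.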

Some versions of the SOLiD sequencer used to put dots into the color-space sequence whenever the quality was too low to call a color. This used to break all kinds of tools.

written 5.6 years ago by Istvan Albert ♦♦ 77k

I was thinking this as well, except that the quality at that position is Q = 33, assuming Sanger scaling. Odd.

written 5.6 years ago by Aaronquinlan (10k)
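The two readings of the quality character 'B' under the dot differ only in the ASCII offset subtracted: 33 for Sanger (Phred+33) scaling, 64 for the older Illumina 1.3+/1.5+ (Phred+64) scaling. A minimal sketch; the function and constant names are illustrative, not from any library:

```python
SANGER_OFFSET = 33       # Sanger / Phred+33
ILLUMINA_15_OFFSET = 64  # Illumina 1.3+/1.5+ / Phred+64


def phred_score(qual_char: str, offset: int) -> int:
    """Return the Phred quality encoded by a single FASTQ quality character."""
    return ord(qual_char) - offset


# 'B' is the quality character under the '.' in the example read.
print(phred_score("B", SANGER_OFFSET))       # 33
print(phred_score("B", ILLUMINA_15_OFFSET))  # 2
```

Under Phred+64 the dot carries Q2, the lowest value Illumina 1.5+ emits, which fits a failed base call far better than Q33 does.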

It is not just SOLiD data; I used to see this in Illumina qseq files a few years ago (when read lengths were 75-76 bp). This was very frustrating because most tools would just die, assuming the data were improperly formatted, especially when these dots were at the beginning of the sequence. My assumption was that they were just bases that could not be called, so I trimmed them.

modified 5.6 years ago • written 5.6 years ago by SES (8.1k)
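Trimming leading no-calls means dropping the same number of characters from the sequence and quality strings so the two stay aligned. A minimal sketch (`trim_leading_dots` is a hypothetical name); it only removes dots at the start of a read, so an internal dot such as the one at position 4 in the question is left in place:

```python
from typing import Tuple


def trim_leading_dots(seq: str, qual: str) -> Tuple[str, str]:
    """Drop leading '.' no-calls plus the matching quality characters."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    n = len(seq) - len(seq.lstrip("."))  # number of leading dots
    return seq[n:], qual[n:]
```

For example, `trim_leading_dots("..ACGT", "BBhhhh")` gives `("ACGT", "hhhh")`.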
swbarnes2 (4.0k), United States, wrote 5.6 years ago:

The dot, together with 'B' as its quality score, indicates that it's an unknown base. (Your qualities range from 'B' to 'i', which is the older encoding scheme, where 'B' is the worst quality.) Use sed to change all the '.' to 'N'.

written 5.6 years ago by swbarnes2 (4.0k)
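A blanket `sed 's/\./N/g'` would also rewrite dots in header lines and, in Phred+33 files, in quality lines, where '.' is a legitimate character (ASCII 46, Q13). With GNU sed, the `first~step` address limits the substitution to sequence lines of unwrapped four-line records: `sed '2~4 s/\./N/g'`. The same idea as a Python sketch, under the same four-line assumption (`dots_to_n` is a hypothetical name):

```python
from typing import Iterable, Iterator


def dots_to_n(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with '.' replaced by 'N' on sequence lines only.

    Assumes unwrapped four-line records, so sequence lines are the 2nd,
    6th, 10th, ... lines (0-based index i with i % 4 == 1).
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:
            line = line.replace(".", "N")
        yield line
```

To use it as a filter, one could call `sys.stdout.writelines(dots_to_n(sys.stdin))`.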
JC (6.8k), Mexico, wrote 5.6 years ago:

My guess (as Istvan suggested) is a failed base call where, instead of N, the pipeline wrote a '.', as in the SOLiD pipeline. Also, Q = 'B' is the lowest value in the Illumina 1.5+ base-calling pipeline. Do you know the technology/source of the data?

written 5.6 years ago by JC (6.8k)
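Before reading 'B' as the Illumina 1.5+ floor, it helps to check which offset a file actually uses. A rough heuristic sketch (`guess_quality_offset` is a hypothetical name), assuming the data are either Phred+33 or Illumina 1.3+/1.5+ Phred+64; older Solexa-scaled files, which can contain ';' through '?', are not handled:

```python
from typing import Iterable


def guess_quality_offset(quality_strings: Iterable[str]) -> int:
    """Heuristically guess whether FASTQ qualities use offset 33 or 64.

    Characters below '@' (ASCII 64) cannot occur in Phred+64 data, so any
    such character implies Phred+33. If every character is >= 64, Phred+64
    is assumed, although uniformly high-quality Phred+33 data could also
    pass this test. Raises ValueError if no non-empty strings are given.
    """
    lowest = min(min(q) for q in quality_strings if q)
    return 33 if ord(lowest) < 64 else 64
```

For the quality string in the question, the lowest character is 'B' (ASCII 66), so the sketch returns 64, consistent with the Illumina 1.5+ interpretation.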
Powered by Biostar version 2.3.0