Roche 454: How to know if reads are paired?
7.4 years ago
pastatonio78

Hi,

Imagine a FASTQ file generated on a Roche 454 platform. You have no information whatsoever about the protocol that was used. The read headers give no specific information, just random alphanumeric characters. Each read starts with a 30 bp sequence and ends with a 15 bp sequence that look to me like adapters (?).
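One way to check whether that 30 bp start and 15 bp end are really shared across reads, as an adapter, key or barcode would be, is to tally the most frequent read prefixes and suffixes. This is a minimal sketch that assumes an unwrapped FASTQ with exactly four lines per record; top_prefixes and top_suffixes are hypothetical helper names, not existing tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally the most frequent first-N-bp strings of the
# sequence lines (line 2 of each record). Assumes unwrapped 4-line FASTQ.
top_prefixes() {
  local fastq="$1" len="$2" n="${3:-5}"
  awk -v len="$len" 'NR % 4 == 2 { print substr($0, 1, len) }' "$fastq" \
    | sort | uniq -c | sort -k1,1nr | head -n "$n"
}

# Hypothetical helper: same idea for the last N bp of each read.
top_suffixes() {
  local fastq="$1" len="$2" n="${3:-5}"
  awk -v len="$len" 'NR % 4 == 2 {
      s = length($0) - len + 1
      if (s < 1) s = 1          # read shorter than len: keep the whole read
      print substr($0, s)
    }' "$fastq" \
    | sort | uniq -c | sort -k1,1nr | head -n "$n"
}
```

For example, top_prefixes 454Reads.fastq 30 and top_suffixes 454Reads.fastq 15 would be expected to show one or a few sequences dominating the counts if those ends are technical sequence rather than biological sequence.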

How can I be sure whether the reads are single-end or paired-end? Is there any way to tell just from the sequence information?

Thanks ;)

7.4 years ago
rtliu

For 454 FLX:

grep 'GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC' 454Reads.fastq | wc -l


You should see a large number for 454 'paired-end' data and 0 for single-end data.

1. -linker flx: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC (a palindrome, equal to its own reverse complement).
2. -linker titanium: TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG and its reverse complement CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA.
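Because the Titanium linker is not its own reverse complement, searching for one orientation alone can miss linker reads. The sketch below counts sequence lines containing a given linker in either orientation, and it looks only at line 2 of each record so quality strings are never matched. It assumes unwrapped four-line FASTQ records and the rev utility; count_linker_reads and revcomp are hypothetical names. Exact matching also misses linker copies that carry sequencing errors, so the result is best read as a lower bound.

```shell
#!/usr/bin/env bash
# Linker sequences as listed above (the FLX one is its own reverse complement).
FLX_LINKER='GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC'
TI_LINKER='TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG'

# Reverse complement of an uppercase A/C/G/T string.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Count sequence lines (line 2 of each record) that contain the linker in
# either orientation. Assumes unwrapped 4-line FASTQ; exact matches only.
count_linker_reads() {
  local fastq="$1" linker="$2" rc
  rc=$(revcomp "$linker")
  awk -v a="$linker" -v b="$rc" \
    'NR % 4 == 2 && (index($0, a) || index($0, b)) { n++ } END { print n + 0 }' \
    "$fastq"
}
```

For example, count_linker_reads 454Reads.fastq "$TI_LINKER" for Titanium data, or the same call with "$FLX_LINKER" for FLX data.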

Thanks. Sorry for the late answer.... 16 months ago LOL

7.4 years ago
kmcarr00

In 454 technology there is no such thing as paired reads, at least not in the sense we all understand paired-end sequencing. Given the design of their bead-based sequencing, it is impossible to generate reads from both ends of a template fragment. All 454 reads are single reads, possibly with a barcode at the start of the read.

Roche had a protocol which they called "paired end", but that was a misappropriation of the term. It was a protocol used for amplicon sequencing which mixed capture beads with the A and B oligos to randomize which end of an amplicon molecule would get sequenced. You still got only one read from each fragment.


What Roche/454 calls 'paired end' is sequencing both ends of longer fragments by circularisation, linker ligation, fragmentation and sequencing the fragments containing the linker. We would now call that a variant of mate pair sequencing.
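To make the circularisation idea concrete: in such a library, a read that contains the linker carries sequence from both ends of the original long fragment, one piece on each side of the linker. The sketch below splits linker-containing reads at the first exact match of the FLX linker and prints the read name plus the two flanking pieces. It is an illustration under stated assumptions (unwrapped four-line FASTQ, exact match, forward orientation only, which suffices for the palindromic FLX linker but not for the Titanium one); split_at_linker is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each record whose sequence contains the linker,
# print "name<TAB>left_flank<TAB>right_flank". Defaults to the FLX linker.
# Only the first exact, forward-orientation occurrence is used.
split_at_linker() {
  local fastq="$1" linker="${2:-GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC}"
  awk -v L="$linker" '
    NR % 4 == 1 { name = substr($1, 2) }   # header: first field without the "@"
    NR % 4 == 2 {
      p = index($0, L)
      if (p) {
        left  = substr($0, 1, p - 1)         # sequence before the linker
        right = substr($0, p + length(L))    # sequence after the linker
        print name "\t" left "\t" right
      }
    }' "$fastq"
}
```

Either flank can be empty when the linker sits at the very start or end of the read, in which case only one end of the fragment was captured.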


You are right Lex. It has been so long since I've dealt with 454 data I forgot about that format.

7.4 years ago
Prakki Rama

I would check like this:

LC_ALL=C grep -F 'ID' 454Reads.fastq | cut -d " " -f 1 | sort | uniq -c

If the count is 2 for many reads, then it is most likely a paired-read file.

*ID in the command above should be the string common to all the read headers.
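The check above searches the whole file, so it can also hit sequence or quality lines (quality strings can contain the letters I and D), and it only collapses mates whose first header field is identical. A slightly stricter sketch looks only at header lines (line 1 of each unwrapped four-line record) and strips a trailing /1 or /2 before counting; that suffix convention is an assumption about how the file was named, and count_duplicated_ids is a hypothetical helper. It can only find mates written as separate records: a 454 'paired-end' read that still contains its linker is a single record and will not show up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many read IDs occur exactly twice among the
# header lines of an unwrapped 4-line FASTQ. The ID is the first
# whitespace-separated field, with a trailing "/1" or "/2" removed.
count_duplicated_ids() {
  awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); c[id]++ }
       END { n = 0; for (k in c) if (c[k] == 2) n++; print n }' "$1"
}
```

For example, count_duplicated_ids 454Reads.fastq returning a number close to half the record count would point to an interleaved file of mates.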