What does the name of a read mean? How to find its location in genome viewer?
9 days ago
sacryt • 0

I have a list of read names and I want to find where in the genome these reads lie, but I don't know how to do this. I am using UK Biobank sequencing data and viewing it in the Integrative Genomics Viewer (IGV).

The read names look like this:


But I have no idea what these names mean, so I can't tell where in the genome the reads are.

In IGV I have tried searching for parts of a name, but I don't think it finds the right place. E.g. for the first read I searched for just "4:2573", but this took me to a location on chr4 with no base calls.

Sorry for what is probably a very naive question, but I have no idea where to find out more about this. I don't know whether this is a UK Biobank-specific naming convention or a more universal standard.

9 days ago

See the "Illumina sequence identifiers" section of the Wikipedia article on the FASTQ format: https://en.wikipedia.org/wiki/FASTQ_format

A00715      the unique instrument name
256         the run id
HWYM5DSXY   the flowcell id
4           flowcell lane
2573        tile number within the flowcell lane
10131       'x'-coordinate of the cluster within the tile
31751       'y'-coordinate of the cluster within the tile
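
If it helps, here is a minimal Python sketch of splitting such a name into the fields listed above. It assumes the seven fields are joined with colons, as in the Illumina convention described on the Wikipedia page. The example name is put together from the values in the list above, and `parse_read_name` and `IlluminaReadName` are names made up for this illustration.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """Fields of an Illumina read name (hypothetical helper type)."""
    instrument: str
    run_id: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> IlluminaReadName:
    """Split a colon-separated Illumina read name into its seven fields."""
    # Drop a leading '@' (FASTQ header marker) and anything after the first space.
    core = name.lstrip("@").split(" ", 1)[0]
    fields = core.split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}")
    instrument, run_id, flowcell_id, lane, tile, x, y = fields
    return IlluminaReadName(
        instrument, int(run_id), flowcell_id, int(lane), int(tile), int(x), int(y)
    )


# Example name assembled from the field values listed above (an assumption).
parsed = parse_read_name("A00715:256:HWYM5DSXY:4:2573:10131:31751")
print(parsed.lane, parsed.tile, parsed.x, parsed.y)
```

Note that the "4" here is the flowcell lane, not chromosome 4, which is why searching IGV for "4:2573" lands on an unrelated position.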
9 days ago
dsull ★ 6.2k

Those names are produced by the sequencing machine. They describe where the read came from on the flowcell, not where it lies in the genome.

What matters is the read sequence itself: the A's, C's, G's and T's in your sequencing files. To figure out where a sequence lies along a chromosome, you have to map those nucleotides to a reference genome using a genome aligner such as bwa.

In short, you run a tool on your sequencing files to produce an alignment file (such as a BAM file), and that is what you load into IGV.
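
Once you have alignments, each record carries the read name together with its mapped position, so a list of names can be turned into coordinates. A rough Python sketch, assuming the alignments are available as SAM text lines (what `samtools view` prints for a BAM file): in SAM, column 1 is the read name, column 2 the flag, column 3 the reference name and column 4 the 1-based leftmost position. The function name `locate_reads` and the demo records are made up for illustration and are not real data.

```python
from typing import Dict, Iterable, List, Set, Tuple


def locate_reads(
    sam_lines: Iterable[str], wanted: Set[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Return {read name: [(chromosome, 1-based position), ...]} for mapped reads."""
    hits: Dict[str, List[Tuple[str, int]]] = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, pos = cols[0], int(cols[1]), cols[2], int(cols[3])
        if qname not in wanted:
            continue
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped, so it has no genomic location
        hits.setdefault(qname, []).append((rname, pos))
    return hits


# Made-up SAM records for demonstration only.
demo_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t99\tchr4\t12345\t60\t5M\t=\t12500\t160\tACGTA\tFFFFF",
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tFFFFF",
    "readA\t147\tchr4\t12500\t60\t5M\t=\t12345\t-160\tTTGCA\tFFFFF",
]

for name, places in locate_reads(demo_sam, {"readA", "readB"}).items():
    for chrom, pos in places:
        # A "chrom:position" string like this can be typed into IGV's locus box.
        print(name, f"{chrom}:{pos}")
```

A read name can appear more than once (for example, both mates of a pair), which is why each name maps to a list of positions.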

