Question: reference genome hs37d5 with PhiX
10 weeks ago by
SOHAIL220
Beijing Institute of Genomics, CAS.
SOHAIL220 wrote:

Hi everybody,

Can anyone tell me where to download the hs37d5.fa genome merged with the phiX genome?

I have downloaded a few BAM files whose header section contains the tag @SQ SN:PhiX LN:6386 together with the other hs37d5.fa contigs. I am trying to find out which version of the reference sequence was used for these files.

Any help would be appreciated!

Regards

(Note: I am aware that the phiX genome is used, perhaps as a control, in Illumina sequencing.)

reference genome ngs • 286 views
ADD COMMENTlink modified 4 days ago by kelso20 • written 10 weeks ago by SOHAIL220

Perhaps I am wrong, but there is no single "the reference with PhiX". When I was at the Max Planck Inst. in Leipzig, we made our own by concatenating the human reference and some decoy sequences. I suggest you contact the center that produced the BAM files.

ADD REPLYlink written 10 weeks ago by Gabriel R.2.5k

@ Gabriel R,

Yes! You are right. The BAM files are from the Max Planck Inst. in Leipzig. Can you share your ref, please?

ADD REPLYlink written 10 weeks ago by SOHAIL220

I am no longer there. I will ask my former supervisor; let's see if they can put it up somewhere. In the meantime, here are the accessions for the reference: http://cdna.eva.mpg.de/neandertal/Chagyrskaya/bam/README

ADD REPLYlink written 10 weeks ago by Gabriel R.2.5k

@ Gabriel R, thank you for the special favor... and yes, of course, I had already gone through the URL you gave: http://cdna.eva.mpg.de/neandertal/Chagyrskaya/bam/README

All the contig names and lengths given in the README agree with the BAM file, except the phiX length. For example, the README gives the accession for phiX as NC_001422.1, length 5386,

but the header section of the downloaded BAM files has:

@SQ SN:phiX LN:6386

with a different length. That suggests a different phiX genome may have been used. Do you have any idea about that?
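
A comparison like this can be scripted by pulling the contig names and lengths out of the @SQ lines. The following is only a minimal sketch, assuming the header has been dumped to text (for example with `samtools view -H`) and that the fields are tab-separated as in the SAM format. The function name `parse_sq_lines` and the `@HD` line in the example are illustrative, not taken from the actual files.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type are TAG:VALUE pairs separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


# Illustrative header fragment; only the phiX line comes from the thread.
header = "@HD\tVN:1.4\n@SQ\tSN:phiX\tLN:6386\n"
observed = parse_sq_lines(header)
expected = {"phiX": 5386}  # length listed for NC_001422.1 in the README

for name, length in observed.items():
    if name in expected and expected[name] != length:
        print(name, "differs by", length - expected[name], "bp")
```

With the two figures quoted above, the reported difference is 1000 bp.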

ADD REPLYlink written 10 weeks ago by SOHAIL220

You can find the sequence of the phiX genome that Illumina uses at this link.

ADD REPLYlink written 10 weeks ago by genomax59k

@genomax

Hi, the Illumina genome that you shared has the following info:

From the "genome.dict" file of the Illumina phiX genome:

@SQ SN:phix LN:5386 UR:file:/illumina/scratch/iGenomes/PhiX/Illumina/RTA/Sequence/WholeGenomeFasta/genome.fa    M5:bb9dae7b38a25a45dae8e3179d7c4241

That is different from what I mentioned earlier...

Edit: if we believe the figures are correct, that's a difference of exactly 1000 nt (6386 - 5386).

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by SOHAIL220

That is the official Illumina phiX sequence. NCBI's version is the same length (Illumina's has a few SNPs compared to the NCBI reference). Unless you get clarification from the source of your BAM files, it would be difficult to explain the difference you see.
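
Because the two versions have the same length, a checksum is a more useful comparison than LN. The SAM specification defines the M5 tag as the MD5 of the sequence in upper case with whitespace removed, so a candidate FASTA can be checked against a header or .dict entry. This is a rough sketch only; the function name `sequence_md5` and the toy record are made up for illustration.

```python
import hashlib


def sequence_md5(fasta_text):
    """MD5 of the first FASTA record's sequence, as defined for the M5 tag."""
    seq_parts = []
    seen_header = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of the second record
            seen_header = True
            continue
        # Drop any whitespace and upper-case the bases before hashing.
        seq_parts.append("".join(line.split()).upper())
    return hashlib.md5("".join(seq_parts).encode("ascii")).hexdigest()


# Toy record (not a real genome): line wrapping and case must not matter.
toy = ">toy\nacgt\nACGT\n"
print(sequence_md5(toy))
```

The resulting hex string would be compared with the M5 value in the @SQ line, when the header carries one.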

ADD REPLYlink written 10 weeks ago by genomax59k
4 days ago by
kelso20
kelso20 wrote:

If you want exactly the MPI-EVA version (which includes a circularised phiX and an extended reference mtDNA), the construction is similar to the reference used by the 1000 Genomes Project (compare ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/), with minor changes. It's made as follows:

1 Download individual chrs from ensembl ftp (just like 1000g) ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/

2a Download the newer version of the mitochondrion (NC_012920, just like 1000g) http://www.ncbi.nlm.nih.gov/nuccore/251831106

2b Copy the first 1000bp of the mitochondrion onto its end. The resulting sequence is named "MT".

3 Download the concatenated decoy sequences from 1000 Genomes: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz

Also compare their READMEs: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/README_human_reference_20110707 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.slides.pdf

4 Download the Human herpes virus (NC_007605, aka EBV) from NCBI, just like 1000g. The sequence is then named "NC_007605". http://www.ncbi.nlm.nih.gov/nuccore/NC_007605

5a Download phiX-174 reference (NC_001422). http://www.ncbi.nlm.nih.gov/nuccore/NC_001422

5b Copy the first 1000bp of phiX onto its end, name the result "phiX".

6 Create a reference (whole_genome.fa) with chrs 1-22, X, Y, extended NC_012920 MT, the non-chromosomal supercontigs, the NC_007605 EBV, the decoy sequences (hs37d5), extended phiX. The order is chosen to match 1000 Genomes (plus phiX), see their fai file: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz.fai
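
Steps 2b and 5b above (copying the first 1000 bp onto the end) and the ordered concatenation in step 6 can be sketched roughly as follows. This is not the script MPI-EVA used; the helper names `extend_circular` and `write_fasta` are hypothetical, and placeholder sequences of the right length stand in for the real downloads.

```python
import io


def extend_circular(seq, overlap=1000):
    """Append the first `overlap` bases of a circular sequence to its end."""
    return seq + seq[:overlap]


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs in the given order as wrapped FASTA."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Placeholder sequences with the true lengths; real data would be read
# from the downloaded NC_012920 and NC_001422 FASTA files.
mt = extend_circular("A" * 16569)
phix = extend_circular("C" * 5386)

# Only the two circular sequences are shown; the full reference would also
# list the chromosomes, supercontigs, EBV and decoys in the step 6 order.
out = io.StringIO()
write_fasta([("MT", mt), ("phiX", phix)], out)
print(len(mt), len(phix))  # 17569 6386
```

The extended phiX length of 6386 matches the LN value reported in the BAM header earlier in the thread.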

Note that two sequences (MT, phiX) are circular and have been extended to facilitate alignment. The correct incantation to wrap these alignments to their correct length is

  bam-rewrap MT:16569 phiX:5386

or

  bam-rmdup -z MT:16569 -z phiX:5386
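
To illustrate what wrapping to the correct length means in terms of coordinates only: a 1-based position that falls in the appended 1000 bp maps back to the start of the circle. This is a sketch of the arithmetic, not of how bam-rewrap or bam-rmdup are implemented (a real tool also has to handle reads that span the junction). The function name `wrap_position` is hypothetical.

```python
# True (unextended) lengths, as given in the commands above.
TRUE_LENGTHS = {"MT": 16569, "phiX": 5386}


def wrap_position(pos, true_length):
    """Map a 1-based position on the extended sequence back onto the circle."""
    return (pos - 1) % true_length + 1


print(wrap_position(5386, TRUE_LENGTHS["phiX"]))  # 5386 (last real base)
print(wrap_position(5387, TRUE_LENGTHS["phiX"]))  # 1 (first appended base)
print(wrap_position(6386, TRUE_LENGTHS["phiX"]))  # 1000 (last appended base)
```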
ADD COMMENTlink modified 4 days ago by genomax59k • written 4 days ago by kelso20

This answer is referring to programs found in biohazard-tools at this link.

ADD REPLYlink written 4 days ago by genomax59k

Thanks, @kelso, for the kind response.

ADD REPLYlink written 4 days ago by SOHAIL220