CRAM file to BAM file without reference
3.6 years ago

Hi! How can I convert a CRAM file to a BAM file without using any FASTA file? I found the following command, but it doesn't work:

samtools view -bh file.cram -o file.bam

I have downloaded a CRAM file and need to convert it to BAM, but I can't. If there is another command or another way to do this, please let me know. Thanks!

alignment • 4.2k views

After running the command, it is supposed to convert the file, but I watched the BAM file for 30 minutes and it is just 112.5 KB, although the CRAM file is 3.9 GB! The command also doesn't print any output.

Actually, I'm a beginner and am learning from the book Computational Exome and Genome Analysis, where it is written:
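
One way to tell whether such a small output is a truncated BAM rather than a finished conversion is `samtools quickcheck`, which exits non-zero when the header or the end-of-file block is missing. A minimal sketch, assuming samtools is on the PATH; the function name `bam_is_complete` is made up for illustration:

```shell
# Hypothetical helper: succeeds only if the BAM exists, is non-empty,
# and passes samtools' basic integrity check.
bam_is_complete() {
    bam="$1"
    if [ ! -s "$bam" ]; then
        echo "missing or empty BAM: $bam" >&2
        return 1
    fi
    # quickcheck prints nothing on success and returns non-zero on failure
    samtools quickcheck "$bam"
}

# Usage (file name from the question):
#   bam_is_complete file.bam && echo "looks complete" || echo "truncated or invalid"
```

A passing check only means the file is structurally intact, not that every read was converted.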

4.4 1000 GENOMES EXOME DATA

For the investigation of de novo variants in Chapter 18 we downloaded WES data for a trio (mother, father, and daughter). The sequences were generated as a part of the 1000 Genomes project [111]. The data can be downloaded at the 1000G FTP site using wget:

$ wget --progress=bar \
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CEU/NA12891/exome_alignment/NA12891.alt_bwamem_GRCh38DH.20150826.CEU.exome.cram

The sequence data is compressed in the CRAM format that is described in Chapter 9. It can be converted to BAM format using the following command.

$ samtools view -bh \
    NA12891.alt_bwamem_GRCh38DH.20150826.CEU.exome.cram \
    -o NA12891.bam

The NA12891.fasta file is 100 GB, so I can't download it.

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

Thanks for pointing that out.

The NA12891.fasta file is 100 GB, so I can't download it.

Can you link the file? A reference FASTA file is unlikely to be 100 GB.

That is the SRA data file for the sample. It seems to be aligned to NCBI GRCh37, so that should be the reference you need for converting these files.
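
Rather than guessing the assembly, it can usually be read from the CRAM header, which samtools can print without a reference. A minimal sketch, assuming samtools is installed; `show_cram_reference` is a made-up helper name, and which tags appear depends on how the file was written:

```shell
# Hypothetical helper: print the first few @SQ header lines of a CRAM file.
show_cram_reference() {
    cram="$1"
    if [ ! -r "$cram" ]; then
        echo "cannot read CRAM file: $cram" >&2
        return 1
    fi
    # @SQ lines may carry AS (assembly), UR (reference location) and M5 (sequence MD5) tags
    samtools view -H "$cram" | grep '^@SQ' | head -n 5
}

# Usage (file name from the book excerpt above):
#   show_cram_reference NA12891.alt_bwamem_GRCh38DH.20150826.CEU.exome.cram
```

If the header carries AS or UR tags, they indicate which genome FASTA the conversion needs.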

3.6 years ago
ATpoint 82k

Sorry, accidentally deleted my previous answer.


The problem here is that you need the reference FASTA file that was used to build the CRAM file in the first place. CRAM uses reference-based compression, so it essentially stores only those nucleotides that (per alignment) differ from the reference. Hence, without the reference, you cannot recreate the original sequences (or the BAM, in this case). The reference is the genome FASTA file that the reads were aligned against. For human this is a few GB in size, not the 100 GB you mention. Don't confuse it with FASTQ files, which hold the sequencing data itself.
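
Once the matching reference has been downloaded, the conversion can be sketched as follows. This assumes samtools is installed and uses its `-T` option to point at the reference FASTA; the function name `cram_to_bam` and the path `reference.fa` are placeholders, not names from this thread:

```shell
# Hypothetical wrapper: convert CRAM to BAM using an explicit reference FASTA.
cram_to_bam() {
    cram="$1"   # input CRAM
    ref="$2"    # the genome FASTA the CRAM was built against
    bam="$3"    # output BAM
    if [ ! -s "$ref" ]; then
        echo "reference FASTA not found: $ref" >&2
        return 1
    fi
    # -b writes BAM, -T names the reference used to decode the CRAM
    samtools view -b -T "$ref" -o "$bam" "$cram"
}

# Usage (reference.fa is a placeholder path):
#   cram_to_bam file.cram reference.fa file.bam
```

The reference has to be the same assembly the CRAM was created with; decoding against a different one should not be expected to work.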
