correct fastq files for the expression analysis
13 months ago
Sara ▴ 220

I am planning to perform expression analysis on RNA-seq data, but instead of fastq files I received CRAM files. I used samtools to convert the CRAM files to fastq. If you have experience with this, could you please let me know whether the resulting fastq files would differ from the original fastq files? (I do not have access to the original fastq files.) Here is the command I used:

# reference.fa is a placeholder for the reference FASTA the CRAM was created with
samtools view -b -T reference.fa -o test.bam test.cram
samtools fastq -1 test_R1.fastq.gz -2 test_R2.fastq.gz  test.bam


You should name-sort the BAM file (samtools sort -n or samtools collate) before converting to fastq, so that the two mates of each pair are next to each other. If your CRAM file did not include unmapped reads, you will not be able to recover those. Otherwise, the fastq data you end up with should be identical to the information stored in your CRAM file.
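
A minimal sketch of the full conversion with the name-grouping step included. It assumes samtools is on PATH, that your samtools release supports the -o option of collate, and that the function name cram_to_fastq and all file names are hypothetical placeholders.

```shell
# Hypothetical helper: CRAM -> BAM -> name-grouped BAM -> paired fastq.
# Arguments: input CRAM, reference FASTA used to create it, output prefix.
cram_to_fastq() {
  local cram="$1" ref="$2" prefix="$3"

  # Decode the CRAM into BAM using the matching reference.
  samtools view -b -T "$ref" -o "${prefix}.bam" "$cram"

  # Group records by read name so that R1 and R2 stay in sync.
  samtools collate -o "${prefix}.collated.bam" "${prefix}.bam"

  # Write properly paired reads to R1/R2 and keep the remaining reads
  # (singletons, reads flagged as neither/both mates) in separate files.
  samtools fastq \
    -1 "${prefix}_R1.fastq.gz" \
    -2 "${prefix}_R2.fastq.gz" \
    -0 "${prefix}_other.fastq.gz" \
    -s "${prefix}_singletons.fastq.gz" \
    "${prefix}.collated.bam"
}
```

Usage would look like cram_to_fastq test.cram reference.fa test. Note that samtools fastq skips secondary and supplementary alignments by default (its -F default is 0x900), so each read should be written only once.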


Since CRAM is a reference-based format, my understanding is that the user must know which reference file was used to generate the CRAM, and the same reference must be supplied when converting it to BAM.
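
One way to check that a candidate reference is the one the CRAM was made with is to compare the M5 (MD5 checksum) tags on the @SQ header lines. This is a sketch that assumes the CRAM header carries M5 tags, which CRAM writers normally add, and that the candidate FASTA is summarised with samtools dict; the function names sq_md5 and compare_m5 are hypothetical.

```shell
# Print "SN M5" for every @SQ line of a SAM-style header read on stdin.
sq_md5() {
  awk -F'\t' '$1 == "@SQ" {
    sn = ""; m5 = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) sn = substr($i, 4)
      if ($i ~ /^M5:/) m5 = substr($i, 4)
    }
    if (sn != "") print sn, m5
  }'
}

# compare_m5 CRAM_HEADER_FILE REF_DICT_FILE
# Prints "mismatch: SN" for each CRAM sequence whose M5 is missing or differs
# from the reference dictionary; returns non-zero if any mismatch is found.
compare_m5() {
  local tmp status
  tmp=$(mktemp) || return 2
  sq_md5 < "$2" > "$tmp"
  sq_md5 < "$1" | awk '
    BEGIN { bad = 0 }
    FILENAME == ARGV[1] { ref[$1] = $2; next }
    !($1 in ref) || $2 == "" || ref[$1] != $2 { print "mismatch:", $1; bad = 1 }
    END { exit bad }
  ' "$tmp" -
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The two input files could be produced with samtools view -H test.cram > cram_header.txt and samtools dict reference.fa > ref_dict.txt (file names are placeholders). Comparing checksums rather than file names avoids being misled by a reference that has the right name but different contents.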


Since OP has -T in their command, they must have provided the reference when doing the actual conversion.

The original CRAM files could have contained unaligned fastq data, in which case no reference is needed (see: Is it possible to directly convert fastq to CRAM?).
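
A rough way to tell whether a CRAM holds aligned or unaligned reads is to count the @SQ lines in its header: a header with no reference sequences suggests unaligned data. This is only a heuristic sketch; the function names count_sq and needs_reference_hint are hypothetical, and it assumes the header is piped in from samtools view -H.

```shell
# Count @SQ lines in a SAM-style header read on stdin.
# grep -c prints 0 but exits 1 when nothing matches, so ignore that status.
count_sq() {
  grep -c '^@SQ' || true
}

# Print a hint about whether decoding is likely to need a reference.
needs_reference_hint() {
  local n
  n=$(count_sq)
  if [ "$n" -eq 0 ]; then
    echo "no @SQ lines: reads look unaligned, so decoding should not need a reference"
  else
    echo "$n @SQ lines: aligned CRAM, supply the matching reference (e.g. with -T)"
  fi
}
```

For example, samtools view -H test.cram | needs_reference_hint. A header with @SQ lines does not prove a reference is required (a CRAM can embed its reference), so treat the output as a hint only.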

samtools can also locate the reference from the UR field in the header, if that information exists (see: Converting CRAM to BAM without reference fasta).


I would not create CRAM files without a reference (the first caveat), nor would I recommend it. If the service/CRAM provider does that, I have no comment.

The second caveat works only if the reference file is actually available at the location given in the UR field, whether on OP's machine, a network path, or a public URL. Since the CRAM comes from a third party, I suspect samtools may not be able to find the reference. In that case, the user has to provide it with -T.

If OP could post the UR field from the CRAM header, it would help us understand where the reference file used for the CRAM is located.
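
A small sketch for pulling out the UR values to post, assuming the header is read with samtools view -H; the function name ref_urs is hypothetical. It prints each distinct UR value once, since all @SQ lines usually point at the same reference file.

```shell
# Print the distinct UR values found on @SQ lines of a header read on stdin.
ref_urs() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^UR:/) print substr($i, 4)
  }' | sort -u
}
```

For example, samtools view -H test.cram | ref_urs. Empty output means the header carries no UR tags, so samtools cannot locate the reference that way and it has to be supplied with -T.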