Problem generating paired end reads when converting Cell Ranger's BAM result to FASTQ
4 weeks ago
ntuzov • 0

My final goal is to make kallisto|bustools ("kb count") take a BAM file as input. Since kb requires FASTQ input, I have to convert BAM to FASTQ first.

The original FASTQ files for this sample look like:

SRR6470906_S1_L001_R1_001.fastq.gz
SRR6470906_S1_L001_R2_001.fastq.gz 
SRR6470906_S1_L002_R1_001.fastq.gz 
SRR6470906_S1_L002_R2_001.fastq.gz

Cell Ranger produces a single BAM file from them, and its stats are:

244560805 + 0 in total (QC-passed reads + QC-failed reads)
244560805 + 0 primary
0 + 0 secondary
0 + 0 supplementary
109475755 + 0 duplicates
109475755 + 0 primary duplicates
238794765 + 0 mapped (97.64% : N/A)
238794765 + 0 primary mapped (97.64% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The BAM appears to be single-end: when I try to convert it to two FASTQ files (samtools fastq -1 .. -2...), both output files are empty. Therefore, my conversion command is:

samtools sort -n --threads 64 -O bam possorted_genome_bam.bam | samtools fastq --threads 64 > converted_906.fastq.gz 

which generates that single FASTQ file. "kb count" requires at least two FASTQ files and it throws an error otherwise.
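
The zero counts for "paired in sequencing", "read1" and "read2" in the stats above would explain the empty -1/-2 outputs: samtools fastq routes records to those files by their READ1/READ2 flags, and no record in this BAM carries them. Cell Ranger stores the barcode and UMI as tags (CR/CY, UR/UY) on the cDNA record rather than as a separate mate. A minimal sketch for checking this from flagstat output; `paired_count` is a hypothetical helper name, not part of samtools:

```shell
# Hypothetical helper: read `samtools flagstat` text on stdin and print the
# QC-passed count from the "paired in sequencing" row.
paired_count() {
  awk '/paired in sequencing/ { print $1; exit }'
}

# Demo on two rows copied from the stats above.
printf '%s\n' \
  '244560805 + 0 in total (QC-passed reads + QC-failed reads)' \
  '0 + 0 paired in sequencing' | paired_count

# On a real file (requires samtools on PATH):
#   samtools flagstat possorted_genome_bam.bam | paired_count
```

A result of 0 means there are no mate records for -1/-2 to receive, so the barcode/UMI read has to be rebuilt from the tags instead.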

Is there a way to generate the proper FASTQ files (ideally, all four) from Cell Ranger's BAM, or is it a dead end?

Update

I followed ATpoint's advice and used 10x's bamtofastq utility:

/usr/bin/time bamtofastq_linux --nthreads=64 ./SRR6470906_S1/possorted_genome_bam.bam /SRR6470906_S1_FASTQ_converted

It creates 12 FASTQ files:

$ ls -l SRR6470906_S1_FASTQ_converted/*
SRR6470906_S1_FASTQ_converted/SRR6470906_S1_0_1_HL73JBCXY:
total 15199836
-rw-r--r-- 1 flow flowuser 2065966716 Mar 29 13:54 bamtofastq_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 flow flowuser 2063237359 Mar 29 14:03 bamtofastq_S1_L002_R1_002.fastq.gz
-rw-r--r-- 1 flow flowuser 1571415091 Mar 29 14:09 bamtofastq_S1_L002_R1_003.fastq.gz
-rw-r--r-- 1 flow flowuser 3517361843 Mar 29 13:54 bamtofastq_S1_L002_R2_001.fastq.gz
-rw-r--r-- 1 flow flowuser 3553693269 Mar 29 14:03 bamtofastq_S1_L002_R2_002.fastq.gz
-rw-r--r-- 1 flow flowuser 2792919205 Mar 29 14:09 bamtofastq_S1_L002_R2_003.fastq.gz

SRR6470906_S1_FASTQ_converted/SRR6470906_S1_0_1_HLFGJBCXY:
total 11609908
-rw-r--r-- 1 flow flowuser 2026939869 Mar 29 13:57 bamtofastq_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 flow flowuser 2020845366 Mar 29 14:07 bamtofastq_S1_L002_R1_002.fastq.gz
-rw-r--r-- 1 flow flowuser  291803225 Mar 29 14:09 bamtofastq_S1_L002_R1_003.fastq.gz
-rw-r--r-- 1 flow flowuser 3468170873 Mar 29 13:57 bamtofastq_S1_L002_R2_001.fastq.gz
-rw-r--r-- 1 flow flowuser 3500377653 Mar 29 14:07 bamtofastq_S1_L002_R2_002.fastq.gz
-rw-r--r-- 1 flow flowuser  580369333 Mar 29 14:09 bamtofastq_S1_L002_R2_003.fastq.gz
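
To pass these files to "kb count", they would need to be listed as R1/R2 pairs (R1 followed by its matching R2, then the next pair, and so on). A minimal sketch of one way to build that list; `list_fastq_pairs` is a hypothetical helper name, and the index, transcript-to-gene map, technology string and output directory in the usage comment are assumptions that must be replaced with real values:

```shell
# Hypothetical helper: print every R1 file under a bamtofastq output directory,
# each followed by its matching R2 file, one path per line.
list_fastq_pairs() {
  find "$1" -type f -name '*_R1_*.fastq.gz' | sort | while IFS= read -r r1; do
    dir=$(dirname "$r1")
    base=$(basename "$r1")
    # Swap the read label in the file name only, never in the directory part.
    r2="$dir/$(printf '%s' "$base" | sed 's/_R1_/_R2_/')"
    if [ -f "$r2" ]; then
      printf '%s\n%s\n' "$r1" "$r2"
    fi
  done
}

# Usage sketch (ASSUMPTIONS: index.idx, t2g.txt, 10xv2 and kb_out are
# placeholders; the unquoted expansion breaks on paths containing spaces):
#   kb count -i index.idx -g t2g.txt -x 10xv2 -o kb_out \
#     $(list_fastq_pairs SRR6470906_S1_FASTQ_converted)
```

R1 files without a matching R2 are skipped, so a partially written output directory does not produce an unbalanced file list.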

My new question is: why do the original files include both L001 and L002, while the converted files have only L002?
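
One way to look into this would be to inspect the read groups in the BAM header, since the per-flowcell directory names above suggest that the grouping is taken from there. A minimal sketch, assuming the @RG IDs follow a colon-separated sample:library:gem_group:flowcell:lane pattern (an assumption that should be checked against the actual header); `rg_lanes` is a hypothetical helper name:

```shell
# Hypothetical helper: read SAM header text on stdin and print
# "flowcell<TAB>lane" for each @RG line.
# ASSUMPTION: the RG ID looks like sample:library:gem_group:flowcell:lane,
# so the last two colon-separated fields are flowcell and lane.
rg_lanes() {
  awk -F '\t' '/^@RG/ {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) {
        n = split(substr($i, 4), parts, ":")
        if (n >= 2) print parts[n - 1] "\t" parts[n]
      }
    }
  }'
}

# On a real file (requires samtools on PATH):
#   samtools view -H possorted_genome_bam.bam | rg_lanes
```

If every read group reports the same lane, the L002 in the converted file names would simply reflect what is recorded in the BAM rather than the lane labels of the downloaded files.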

fastq bam • 417 views

There's no point in separating lanes 1 and 2. It's fine if they are combined.
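
If a single pair of files is preferred, gzip-compressed FASTQ files can be concatenated directly, because a sequence of gzip members is itself a valid gzip stream. A minimal sketch with a hypothetical helper `combine_reads`; the R1 and R2 chunks must be concatenated in the same order so that mates stay in sync, and the output file should be written outside the input directory:

```shell
# Hypothetical helper: concatenate all files for one read label found under a
# directory, in sorted order, into a single gzip file.
#   $1 = input directory, $2 = read label (R1 or R2), $3 = output file
combine_reads() {
  find "$1" -type f -name "*_$2_*.fastq.gz" | sort | while IFS= read -r f; do
    cat "$f"
  done > "$3"
}

# Usage sketch (output names are placeholders):
#   combine_reads SRR6470906_S1_FASTQ_converted R1 combined_R1.fastq.gz
#   combine_reads SRR6470906_S1_FASTQ_converted R2 combined_R2.fastq.gz
```

Because matching R1 and R2 paths differ only in the read label, sorting both lists gives the same chunk order for each.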


Thanks for replying. I ran it, but now I have one more question about the bamtofastq output (see above).


SRA uses its own nomenclature, while Cell Ranger reads the original file names. That's my guess. bamtofastq is safe to use; continue with the FASTQ files it returned.
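
A quick sanity check on the conversion would be to count the records in the converted R2 files and compare the total against the 244560805 reads reported for the BAM. A minimal sketch; `count_fastq_reads` is a hypothetical helper name, and it assumes plain four-line FASTQ records:

```shell
# Hypothetical helper: print the total number of FASTQ records in the given
# gzip-compressed files.
# ASSUMPTION: every record is exactly four lines (no wrapped sequences).
count_fastq_reads() {
  lines=$(gzip -dc "$@" | wc -l)
  echo $((lines / 4))
}

# Usage sketch on the converted files:
#   count_fastq_reads SRR6470906_S1_FASTQ_converted/*/*_R2_*.fastq.gz
```

The same function applied to the R1 files should give an identical number, since every cDNA read needs its barcode/UMI mate.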
