How to convert BAM to paired FASTQ from Nanopore data?
6 months ago
ej6474 • 0

I have a set of coordinate-sorted, mapped BAM files of SARS-CoV-2 sequencing data from a Nanopore MinION, and I need to generate paired FASTQ files from them.

When I name-sort the BAM files in the usual way, run samtools fixmate and then bedtools bamtofastq, my paired FASTQ files are empty. When I run samtools flagstat on my BAM files, I see that read1 and read2 are both 0:

52244 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
737 + 0 supplementary
0 + 0 duplicates
52244 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I am not fully familiar with Nanopore data, so what is the best way to extract paired reads from my BAM files? Are there any intermediate steps I am missing?

6 months ago
GenoMax 108k

AFAIK there is no paired-end nanopore data. Can you show us the header of the BAM file so we can see how the data was aligned (samtools view -H your bam | grep "@PG")?
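
To see why flagstat reports 0 for paired, read1 and read2, it can help to look at the SAM FLAG bits behind those counts. Below is a minimal sketch, assuming the standard SAM bit values (0x1 = paired, 0x40 = first in pair, 0x80 = second in pair). classify_flag is a hypothetical helper name used for illustration, not part of samtools:

```shell
# Hypothetical helper: classify a SAM FLAG value by its pairing bits.
# Bit values from the SAM specification: 1 (0x1) = read is paired,
# 64 (0x40) = first in pair, 128 (0x80) = second in pair.
classify_flag() {
  flag=$1
  if [ $(( flag & 1 )) -eq 0 ]; then
    echo "unpaired"          # paired bit not set: typical for nanopore reads
  elif [ $(( flag & 64 )) -ne 0 ]; then
    echo "read1"
  elif [ $(( flag & 128 )) -ne 0 ]; then
    echo "read2"
  else
    echo "paired-unknown"    # paired bit set, but neither read1 nor read2
  fi
}

classify_flag 16    # reverse-strand single-end read
classify_flag 99    # typical Illumina read1 flag
```

If every record in the BAM classifies as unpaired, tools that split reads into read1/read2 files will have nothing to write, which would match the empty paired FASTQ files described in the question.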


I see. Here is the header:

$ samtools view -H sample1-barcode13.sorted.bam | grep "@PG"
@PG ID:minimap2 PN:minimap2 VN:2.17-r941    CL:minimap2 -a -x map-ont -t 4 /hpf/largeprojects/pray/llau/coronavirus/20200929/artic-ncov2019/primer_schemes/nCoV-2019/V3/nCoV-2019.reference.fasta /hpf/largeprojects/pray/lljau/coronavirus/2021-03-11_BATCH02/artic-ncov/sample1-barcode13/20210311_1957_MN32429_FAP53991_45f7e022-PHLON21-SARS05068_barcode13.fastq
@PG ID:samtools PN:samtools PP:minimap2 VN:1.10 CL:samtools view -bS -F 4 -
@PG ID:samtools.1   PN:samtools PP:samtools VN:1.10 CL:samtools sort -o sample1-barcode13.sorted.bam

If that's the case, it probably makes sense to just convert the BAM to a single FASTQ file?


If you look at the minimap2 line you can see that only one FASTQ file was given as input.

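So try something like samtools collate -u -O in_pos.bam | samtools fastq - > all_reads.fq. (Do not add -0 /dev/null here: unpaired reads have neither READ1 nor READ2 set, so that option would send all of them to /dev/null.)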

6 months ago
colindaven ★ 3.3k

Exactly as GenoMax said, there is no PE nanopore data at present, so you won't be able to do this. Possibly you are following a tutorial meant for Illumina reads? Anyway, extracting FASTQ from BAM is not too hard. If the command GenoMax gave you is not sufficient, you can try:

 samtools fastq in.bam > all_reads.fq
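
A quick sanity check after the conversion is to count the records in the resulting FASTQ. A minimal sketch, assuming the 4-lines-per-read layout that samtools fastq writes; count_fastq_reads is a hypothetical helper name, not a samtools command:

```shell
# Hypothetical helper: count reads in a FASTQ file that uses exactly
# four lines per record (header, sequence, "+", qualities).
count_fastq_reads() {
  awk 'END { print NR / 4 }' "$1"
}
```

Running count_fastq_reads all_reads.fq should give a non-zero number. By default samtools fastq skips secondary and supplementary alignments, so the count should be comparable to samtools view -c -F 0x900 in.bam, which for the flagstat output above would be expected to be 52244 - 737 = 51507.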

I see! I was indeed following a process meant for Illumina data but I am completely new to working with Nanopore. Thank you so much!
