Question

Subset reads from bam file

0

Entering edit mode

5.9 years ago

fabio.demartino2 • 0

Hello! I am pretty new to the bioinformatic field and I am dealing with some RNAseq data.

I am trying to run the protocol described in Callari et al. BMC Genomics 2018 in order to discriminate mouse and human reads into my PDX-derived samples.

In order to do so, as suggested in the paper and following some of their commands, I merged the human and mouse genome and run Tophat on that. I finally got a .bam file that is now containing reads aligned to both the human and the mouse genome.

In order to proceed with the remaining RNAseq analysis, I want to subselect only those reads that matched only to either the human's or to the mouse's genome.

The problem is that when I run:

samtools view -h path/to/ICRG_aligned.bam
| grep -v $'\t''m.chr'
| grep -v 'SN:m.chr'
| samtools view -b - > path/to/genome_human_only.bam

it returns me an error saying: "[bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). [main_samview] fail to read the header from "-"."

This code was taken from the original code reported in the paper.

Also, I checked my bam file and it contains normal @-starting headers.

Can anybody help me to figure out what is the problem in my code? Or maybe if there is an easier approach to tackle this easy task?

Note: the chromosome's names of my fastq and gtf files were modified so that the human's ones are reported as "chr" whereas the mouse's as "m.chr".

Thank you in advance!

RNA-Seq alignment sequence • 1.4k views

ADD COMMENT • link updated 5.9 years ago by h.mon 35k • written 5.9 years ago by fabio.demartino2 • 0

score 0 · Answer 1 · 2018-06-13

Use bbsplit.sh from the BBTools package to split the reads into only mouse, only human, ambiguous and none. It is faster than Tophat, and probably more sensitive as well.

And please, do not use ALL CAPS, it is considered as yelling, thus rude.

edit: I edited your title to remove CAPS, and also added code-block formatting for the samtools command-line. You can add formatting by using the buttons right above the post editor area.