Hello! I am pretty new to the bioinformatics field and I am working with some RNA-seq data.
I am trying to run the protocol described in Callari et al., BMC Genomics 2018, to discriminate mouse and human reads in my PDX-derived samples.
To do so, as suggested in the paper and following some of their commands, I merged the human and mouse genomes and ran TopHat on the combined reference. I now have a .bam file that contains reads aligned to both the human and the mouse genome.
To proceed with the rest of the RNA-seq analysis, I want to keep only the reads that map exclusively to either the human or the mouse genome.
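Dropping mouse records line by line does not by itself make a read human-only: with a merged reference, TopHat can report several alignments for the same read, some on chr* and some on m.chr*. The sketch below classifies each read name instead. It assumes mouse contigs start with "m.chr", treats every other contig as human, reads headerless SAM text (as printed by `samtools view`), and only looks at mapped records (FLAG bit 0x4 unset). `classify_reads` is a hypothetical helper, not part of the Callari et al. code.

```sh
#!/bin/sh
# Hypothetical helper: label each read name (QNAME) by the genome(s) its
# mapped alignments hit. Assumes mouse contigs are named "m.chr*" and that
# every other contig is human.
# Input: SAM text on stdin. Output: QNAME<TAB>human|mouse|both
classify_reads() {
  awk -F '\t' '
    /^@/ { next }                          # skip header lines if present
    int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4 set: unmapped, skip
    {
      g = ($3 ~ /^m\.chr/) ? "mouse" : "human"
      if (!($1 in seen)) { order[++n] = $1; seen[$1] = g }  # first alignment
      else if (seen[$1] != g) { seen[$1] = "both" }         # hits both genomes
    }
    END { for (i = 1; i <= n; i++) print order[i] "\t" seen[order[i]] }
  '
}
```

For example, `samtools view path/to/ICRG_aligned.bam | classify_reads > read_classes.tsv` gives one label per read name; for paired-end data both mates share a QNAME, so the label applies to the pair. The names labelled `human` can then be written to a file and used to subset the BAM (recent samtools releases provide `samtools view -N` for keeping reads whose names are listed in a file).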
I hit two problems when I run the following command (taken from the original code reported in the paper):
samtools view -h path/to/ICRG_aligned.bam | grep -v $'\t''m.chr' | grep -v 'SN:m.chr' | samtools view -b > path/to/genome_human_only.bam
1) The .bam output ends up in standard-output.txt, even though I specify the path and name of my desired output file.
2) This output file is even slightly larger than my original .bam file, although I would expect it to be smaller, since all lines containing the "m.chr" string should have been removed.
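File size is not a reliable way to check whether the filter worked: BAM is block-compressed, so its size depends on the compression level used when writing it and not only on the number of records, and `samtools view -b` may not compress at the same level as the tool that wrote the original file. Counting records per genome before and after filtering is more direct. A minimal sketch, with `count_by_genome` as a hypothetical helper and the same "m.chr" naming assumption:

```sh
#!/bin/sh
# Hypothetical helper: count SAM alignment records per genome.
# Assumes mouse contigs are named "m.chr*"; every other mapped record
# is counted as human. Input: SAM text on stdin.
count_by_genome() {
  awk -F '\t' '
    /^@/ { next }                                  # ignore header lines
    int($2 / 4) % 2 == 1 { unmapped++; next }      # FLAG bit 0x4: unmapped
    $3 ~ /^m\.chr/ { mouse++; next }               # RNAME is a mouse contig
    { human++ }
    END { printf "human\t%d\nmouse\t%d\nunmapped\t%d\n", human, mouse, unmapped }
  '
}
```

Running `samtools view path/to/ICRG_aligned.bam | count_by_genome` and the same on the filtered file should show the mouse count dropping to zero while the human count stays the same; `samtools view -c` gives the total record count of a file if only a quick comparison is needed.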
Can anybody help me figure out whether there is a problem with this code?
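One way to make the filter more targeted is to test specific SAM columns instead of grepping whole lines: in grep the "." in `m.chr` is a regex wildcard, and a tab-anchored match can hit any field. The sketch below assumes the same naming scheme; `keep_human` is a hypothetical helper name. It drops mouse `@SQ` header lines and any record whose RNAME (column 3) or RNEXT (column 7) is a mouse contig. The RNEXT check mirrors what the tab-anchored grep already does, and keeps records from pointing at references that were removed from the header.

```sh
#!/bin/sh
# Hypothetical helper: keep header lines except mouse @SQ entries, and keep
# only records whose RNAME (col 3) and RNEXT (col 7) are not "m.chr*".
# Unmapped records with RNAME "*" are kept. Input/output: SAM text.
keep_human() {
  awk -F '\t' '
    /^@SQ/ { if ($0 ~ /\tSN:m\.chr/) next; print; next }  # drop mouse @SQ
    /^@/   { print; next }                                # other header lines
    $3 !~ /^m\.chr/ && $7 !~ /^m\.chr/ { print }          # non-mouse records
  '
}
```

A possible invocation is `samtools view -h path/to/ICRG_aligned.bam | keep_human | samtools view -b -o path/to/genome_human_only.bam -`. Passing `-` makes stdin the explicit input, and `-o` names the output file inside samtools itself, which may help if the pipeline is launched by a wrapper that captures standard output (very old 0.1.x samtools releases also need `-S` to read SAM input). Like the original grep, this keeps the human alignments of multi-mapped reads, so it removes mouse records rather than selecting human-exclusive reads.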
Note: the chromosome names in my FASTA and GTF files were modified so that human chromosomes are named "chr" and mouse chromosomes "m.chr".
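The renaming described in the note can be done with standard text tools before merging the references and building the index. This is a sketch assuming the mouse FASTA headers and GTF seqnames start with "chr" (Ensembl-style names such as "1" would need a different pattern); `rename_mouse_fasta` and `rename_mouse_gtf` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: prefix mouse FASTA sequence names, ">chr*" -> ">m.chr*".
rename_mouse_fasta() {
  sed 's/^>chr/>m.chr/'
}

# Hypothetical helper: prefix GTF column 1 (seqname) when it starts with "chr".
# Comment lines are passed through; tabs are preserved via OFS.
rename_mouse_gtf() {
  awk -F '\t' -v OFS='\t' '
    /^#/ { print; next }
    { if ($1 ~ /^chr/) $1 = "m." $1; print }
  '
}
```

With hypothetical file names, `rename_mouse_fasta < mouse.fa > mouse.renamed.fa` followed by `cat human.fa mouse.renamed.fa > combined.fa` gives a merged reference with non-colliding names; the GTF files can be combined the same way after `rename_mouse_gtf`.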
Thank you in advance!