Filtration of uniquely mapped reads

Hello

I have a full BAM file with reads mapped to a combined human and mouse reference. Now I would like to extract the reads mapped only to mouse (i.e. not mapped to any human chromosome). This is the protocol I am using:

  1. From the full BAM, extract reads mapped to mouse chromosomes and convert them to fastq1
  2. From the full BAM, extract reads mapped to human chromosomes and convert them to fastq2
  3. Extract read IDs from both fastq1 and fastq2 and identify the read IDs in fastq1 that do not appear in fastq2

I would like to know if there is a samtools command or other software to apply these filtering steps.

Thanks a lot in advance

I would recommend doing the alignment directly with bbsplit from BBTools, using ambiguous2=toss, and you are done.
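
For example, something along these lines (a minimal sketch; the read and reference file names are placeholders for your own data):

    # Map against both references in one run; % in basename is replaced by
    # the reference name. Reads that map ambiguously between the two genomes
    # are thrown away (ambiguous2=toss), so out_mouse.fastq.gz contains only
    # reads assigned to mouse.
    bbsplit.sh in=reads.fastq.gz ref=mouse.fa,human.fa \
        basename=out_%.fastq.gz ambiguous=best ambiguous2=toss

ambiguous= controls reads with multiple best hits within a single genome, while ambiguous2= controls reads that hit both genomes equally well.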

GenoMax

While using bbsplit is a painless way of doing this in one go, if you do want to follow your original line of thought you can do the following.

  1. Split human and mouse reads into separate BAMs using ideas from:
    How To Split A Bam File By Chromosome
    Extract ONLY chromosomes 1-22 from bam file - removing extraneous chr annotations
    Hopefully you have unique chromosome names for mouse and human, otherwise nothing is going to work.

  2. Once you have the BAMs, you can samtools collate | samtools fastq to get fastq-format reads. See the options for those commands to deal with secondary mappings etc. Steps 1 and 2 can possibly be combined into a single pipe.

  3. Extract fastq headers with something like grep "@M01923:976:00000000" file.fq (use zgrep if your files are compressed). Remove the @ at the beginning of the header at the same time. Example below:

    zgrep @first_part_of_headers file_R1_001.fastq.gz | sed 's/^@//g' > fastq_headers_R1

  4. Use filterbyname.sh from the BBMap suite to keep or drop reads whose names are present in one or the other file.
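
Putting steps 1-4 together as a minimal sketch (the mouse chromosome names, the BAM/fastq file names, and the header prefix below are placeholders; adjust them to your own reference and data):

    # 1+2. Extract reads mapped to mouse chromosomes (assumed here to be named
    #      mouse_chr1, mouse_chr2, ... in the combined reference) and convert
    #      them to fastq in one pipe; repeat with the human chromosome names.
    samtools index full.bam
    samtools view -b full.bam mouse_chr1 mouse_chr2 mouse_chrX \
        | samtools collate -u -O - \
        | samtools fastq - > mouse.fastq
    samtools view -b full.bam chr1 chr2 chrX \
        | samtools collate -u -O - \
        | samtools fastq - > human.fastq

    # 3. Pull the read names (minus the leading @ and anything after the
    #    first space) out of the human fastq.
    grep "^@M01923" human.fastq | sed 's/^@//' | cut -d' ' -f1 > human_names.txt

    # 4. Keep only the mouse reads whose names are NOT in the human list.
    filterbyname.sh in=mouse.fastq out=mouse_only.fastq \
        names=human_names.txt include=f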
