Question

Extract CDS reads from a BAM file

0

Entering edit mode

2.5 years ago

Peanut • 0

Hi everyone,

I am looking for a way to extract the sequences of the reads that map to the CDS of my reference.

I used samtools -b -L CDS.bed BAM.bam > BAM_CDS.bam. That allowed me to get all the reads that overlap with any of the CDS listed in my .bed.

But, I do not want to keep the part of the reads that are not inside the CDS coordinates.

For example, if my CDS coordinates are 100-200, and I have a 90bp read starting at 50, I only want to keep the last 40bp of that read (from the start of the CDS to then end of the read).

My first thought was to convert that .bam to a fasta with samtools fasta to then use bedtools getfasta, but when I do that, the fasta output by samtools fasta does not have the chromosome or coordinates information of each read.

The final goal would be to do a variant call of only the CDS (I already have the all the references CDS sequences as a fasta for that)

I am fairly new to this kind of analyses so any help would be great!

Thanks
Alex

samtools BAM BED • 1.1k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 2.5 years ago by Peanut • 0

score 1 · Answer 1 · 2023-04-24

1

Entering edit mode

2.5 years ago

GenoMax 154k

Have you looked at smatools ampliconclip: http://www.htslib.org/doc/samtools-ampliconclip.html