Extract CDS reads from a BAM file
1
0
Entering edit mode
2.5 years ago
Peanut • 0

Hi everyone,

I am looking for a way to extract the sequences of the reads that map to the CDS of my reference.

I used samtools -b -L CDS.bed BAM.bam > BAM_CDS.bam. That allowed me to get all the reads that overlap with any of the CDS listed in my .bed.

But, I do not want to keep the part of the reads that are not inside the CDS coordinates.

For example, if my CDS coordinates are 100-200, and I have a 90bp read starting at 50, I only want to keep the last 40bp of that read (from the start of the CDS to then end of the read).

My first thought was to convert that .bam to a fasta with samtools fasta to then use bedtools getfasta, but when I do that, the fasta output by samtools fasta does not have the chromosome or coordinates information of each read.

The final goal would be to do a variant call of only the CDS (I already have the all the references CDS sequences as a fasta for that)

I am fairly new to this kind of analyses so any help would be great!

Thanks
Alex

samtools BAM BED • 1.1k views
ADD COMMENT
1
Entering edit mode
2.5 years ago
GenoMax 154k

Have you looked at smatools ampliconclip: http://www.htslib.org/doc/samtools-ampliconclip.html

ADD COMMENT
0
Entering edit mode

Oh wow perfect thanks! I cannot believe I missed that.

ADD REPLY

Login before adding your answer.

Traffic: 8979 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6