Question: length of combined transcripts
2
gravatar for bsp017
4.9 years ago by
bsp01730
Denmark, Copenhagen, UCPH
bsp01730 wrote:

Hi all,

Is there a tool to calculate the combined length of transcripts which map to a given interval. For example as a function of bedtools or htseq? I can calculate the number of reads mapping to a particular gene using these tools but not the length of the gene segment which has been covered. What I want to know is, which genes in a gff or bed file have alignments from my sorted bam file covering more than 100 bp of their entire length. 

Thanks,

 

James

rna-seq • 1.3k views
ADD COMMENTlink modified 4.9 years ago by Alex Reynolds31k • written 4.9 years ago by bsp01730
2
gravatar for Alex Reynolds
4.9 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

BEDOPS bedmap --bases and bedmap --bases-uniq offer total and distinct counts of bases of map elements (e.g., transcripts or reads) which overlap ("map" to) a reference interval (e.g., a set of genes):

$ bedmap --echo --bases genes.bed reads.bed \
    > total_read_base_count_over_genes.bed
$ bedmap --echo --bases-uniq genes.bed reads.bed \
    > distinct_read_base_count_over_genes.bed

It's not clear which option you want, but the --bases-uniq option would effectively merge the overlapping reads into a contiguous region, and then give the length of the overlapping part of the contiguous region. The --bases option simply gives the total sum of the length of the overlapping part of each overlapping read.

Regardless of which option you pick, if you add --delim '\t', you can easily pipe to awk to quickly filter mapped results, where genes have alignments covering 100 bp or more:

$ bedmap --echo --bases --delim '\t' genes.bed reads.bed \
    | awk '$NF >= 100' - \
    > genes_with_all_reads_a_total_of_100_bases_or_more_overlap.bed

If you need to convert BAM-formatted reads to BED, you could use BEDOPS bam2bed and pipe BED-formatted reads to bedmap:

$ bam2bed < reads.bam \
    | bedmap --echo --bases --delim '\t' genes.bed - \
    | awk '$NF >= 100' - \
    > genes_with_all_reads_a_total_of_100_bases_or_more_overlap.bed
ADD COMMENTlink modified 10 months ago by RamRS30k • written 4.9 years ago by Alex Reynolds31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1374 users visited in the last hour