Question: length of combined transcripts
2
gravatar for bsp017
3.4 years ago by
bsp01720
Wales, Bangor, Bangor Uni
bsp01720 wrote:

Hi all,

Is there a tool to calculate the combined length of transcripts which map to a given interval. For example as a function of bedtools or htseq? I can calculate the number of reads mapping to a particular gene using these tools but not the length of the gene segment which has been covered. What I want to know is, which genes in a gff or bed file have alignments from my sorted bam file covering more than 100 bp of their entire length. 

Thanks,

 

James

rna-seq • 1.0k views
ADD COMMENTlink modified 3.4 years ago by Alex Reynolds28k • written 3.4 years ago by bsp01720
2
gravatar for Alex Reynolds
3.4 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

BEDOPS bedmap --bases and bedmap --bases-uniq offer total and distinct counts of bases of map elements (e.g., transcripts or reads) which overlap ("map" to) a reference interval (e.g., a set of genes):

$ bedmap --echo --bases genes.bed reads.bed \
    > total_read_base_count_over_genes.bed
$ bedmap --echo --bases-uniq genes.bed reads.bed \
    > distinct_read_base_count_over_genes.bed

It's not clear which option you want, but the --bases-uniq option would effectively merge the overlapping reads into a contiguous region, and then give the length of the overlapping part of the contiguous region. The --bases option simply gives the total sum of the length of the overlapping part of each overlapping read.

Regardless of which option you pick, if you add --delim '\t', you can easily pipe to awk to quickly filter mapped results, where genes have alignments covering 100 bp or more:

$ bedmap --echo --bases --delim '\t' genes.bed reads.bed \
    | awk '$NF >= 100' - \
    > genes_with_all_reads_a_total_of_100_bases_or_more_overlap.bed

If you need to convert BAM-formatted reads to BED, you could use BEDOPS bam2bed and pipe BED-formatted reads to bedmap:

$ bam2bed < reads.bam \
    | bedmap --echo --bases --delim '\t' genes.bed - \
    | awk '$NF >= 100' - \
    > genes_with_all_reads_a_total_of_100_bases_or_more_overlap.bed
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Alex Reynolds28k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 925 users visited in the last hour