Question: Summarizing miRNA-Seq Data Based on UCSC hg19 Alignment Results
3
5.7 years ago by
gundalav290
La La Land
gundalav290 wrote:

I have single-end miRNA-seq data that I mapped to the whole UCSC hg19 genome. Given the SAM output of this mapping, is there a way to summarize the alignments over several classes of genomic features:

  1. Unaligned
  2. Mature miRNA
  3. precursor miRNA
  4. piRNA
  5. lincRNA
  6. human Ribosomal RNA
  7. snoRNA
  8. human5S rDNA
  9. snRNA

Namely, for each of the above features, how many of my reads (or what percentage) are aligned?

I know CLC bio or Illumina's in-box software possibly already does this, but I'm looking for a noncommercial and tweakable way to do it.

genome alignment mirna expression • 2.8k views
modified 5.6 years ago by Biostar ♦♦ 20 • written 5.7 years ago by gundalav290
4
5.6 years ago by
Martombo2.5k
Seville, ES
Martombo2.5k wrote:

You can use BioMart (http://www.ensembl.org/biomart/martview) or the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to get the annotations you need for the different classes of transcripts you want to study (specify the feature type with the filter option). Then you can use htseq-count (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html) to count the number of reads in your SAM files that map to those annotations. You may need to convert the table you downloaded to GFF format (the UCSC Table Browser can output GTF directly). All the different genomic features can be merged into the same file; in that case, you can deal with overlapping features as described on the htseq-count page.

modified 5.6 years ago • written 5.6 years ago by Martombo2.5k
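The conversion and merging step described in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's own script: the function name `bed_to_gtf_line`, the `feature_class` attribute, and the `custom` source label are made up for the example. It assumes the annotations were downloaded as tab-separated BED and that every region is written as an `exon` feature, which is the feature type htseq-count looks for by default.

```python
def bed_to_gtf_line(bed_line, feature_class, source="custom"):
    """Convert one BED line into a GTF 'exon' line tagged with its class.

    feature_class is a hypothetical label (e.g. "snoRNA") stored in an
    extra attribute so that lines from several classes can share one file.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Fall back to a coordinate-based name when the BED file has no name column.
    name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
    strand = fields[5] if len(fields) > 5 else "."
    # BED is 0-based and half-open; GTF is 1-based and inclusive,
    # so only the start coordinate shifts by one.
    gtf_start = start + 1
    attributes = f'gene_id "{name}"; feature_class "{feature_class}";'
    return "\t".join(
        [chrom, source, "exon", str(gtf_start), str(end), ".", strand, ".", attributes]
    )


if __name__ == "__main__":
    print(bed_to_gtf_line("chr1\t99\t200\tSNORD1\t0\t+", "snoRNA"))
```

Lines produced this way for each class can be concatenated into one file. Pointing htseq-count's `-i` option at the `feature_class` attribute should then give one count per class rather than per feature, but that is an assumption to verify against the htseq-count documentation.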

A correction on this: if you're interested in transcripts that have several identical copies in the genome (as I realized snRNAs do, for example), you cannot use the default options of HTSeq, which discard multi-mapped reads. You should lower the value of the -a option, and also be aware that HTSeq would still not count reads whose NH field indicates multiple mapping. Even better and easier, you could use RSEM.

written 3.3 years ago by Martombo2.5k
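To get a feel for how much the multi-mapping issue raised in this comment matters for a given SAM file, one could tally alignment records by category before counting. The sketch below is only illustrative: the function names are invented, the default cutoff of 10 is an assumption meant to mirror htseq-count's default for `-a`, and it relies on the aligner writing the optional `NH:i:` tag (not all aligners do).

```python
from collections import Counter


def classify_alignment(sam_line, min_mapq=10):
    """Label one SAM alignment record.

    Returns "unaligned", "multi_mapped", "low_mapq" or "unique".
    min_mapq=10 is an assumed stand-in for htseq-count's default -a value.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return "unaligned"
    nh = 1  # treat a missing NH tag as a single reported hit
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NH:i:"):
            nh = int(tag[5:])
    if nh > 1:
        return "multi_mapped"
    if mapq < min_mapq:
        return "low_mapq"
    return "unique"


def tally(sam_lines, min_mapq=10):
    """Count alignment records per category, skipping '@' header lines.

    Note: this counts records, not reads; a multi-mapped read that is
    reported at several loci contributes one record per locus.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts[classify_alignment(line, min_mapq)] += 1
    return counts
```

If a large share of records falls into `multi_mapped` or `low_mapq`, the default htseq-count settings would likely undercount classes with repeated copies, which is the situation the comment warns about.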
3
5.6 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

I would use BEDTools. If you have a BED file for each of the items 2-9, you can use, e.g.

bedtools coverage -abam your.bam -b snoRNA.bed

and it would be pretty simple to write a script that does this for each feature type and writes a summary output.

written 5.6 years ago by brentp23k
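The kind of summary script this answer has in mind might look like the sketch below. To stay self-contained it uses a naive pure-Python overlap test as a stand-in for the bedtools step, so it is not the answerer's method and would be slow on real data. All function names are hypothetical, and assigning each read to the first class it overlaps is just one assumed way of handling features that overlap each other.

```python
def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def overlaps(intervals, chrom, start, end):
    """True if the half-open span [start, end) overlaps any interval on chrom."""
    return any(s < end and start < e for s, e in intervals.get(chrom, []))


def summarize(reads, classes):
    """Return {class_name: (count, percent)} over all reads.

    reads: iterable of (chrom, start, end) tuples, or None for an unaligned read.
    classes: dict of class name -> intervals from load_bed; classes are tried
    in insertion order and the first overlapping class wins (an assumption).
    """
    counts = {"Unaligned": 0, **{name: 0 for name in classes}, "Other": 0}
    total = 0
    for read in reads:
        total += 1
        if read is None:
            counts["Unaligned"] += 1
            continue
        for name, intervals in classes.items():
            if overlaps(intervals, *read):
                counts[name] += 1
                break
        else:  # aligned, but not inside any annotated class
            counts["Other"] += 1
    return {
        name: (n, 100.0 * n / total if total else 0.0)
        for name, n in counts.items()
    }
```

In practice the per-class counts would more likely come from running the bedtools command above once per BED file and parsing its output. Newer bedtools releases changed which of the two input files `coverage` reports on, so it is worth checking `bedtools coverage -h` for the installed version before relying on the column layout.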