Question: small RNA-seq reads grouping by adjacency
2.0 years ago, apbiomol wrote:

Dear BioStars,

I have gotten a lot of help from BioStars, and this is my first time posting a question here, so I am a little nervous.

Starting from a SAM file, I want to group small RNA-seq reads that map within a certain distance (e.g. 100 nt) of each other into clusters, and then rank the clusters by their number of reads.

Are there any tools that can do this? Thanks for your help!

rna-seq alignment • 592 views
modified 2.0 years ago by A. Domingues • written 2.0 years ago by apbiomol

I am guessing:

  • you want to bin your reads into bins of constant width over the whole genome?
  • summarize each bin by number of reads

You need to search for something like "generate equally sized genomic bins", "read binning", or "generate equally sized genomic intervals". There are several ways to do this, in either Bedtools or R.
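
To make the fixed-width binning idea concrete, here is a minimal Python sketch. It assumes reads are already available as (chromosome, 1-based start) pairs and that a read is assigned to a bin by its start position only; the function name `count_reads_per_bin` and the toy coordinates are made up for illustration and do not come from Bedtools or R.

```python
from collections import Counter

def count_reads_per_bin(reads, bin_width=100):
    """Count reads per fixed-width genomic bin.

    reads: iterable of (chrom, start) pairs, start being a 1-based position.
    Returns a Counter keyed by (chrom, bin_start), bin_start being 0-based.
    """
    counts = Counter()
    for chrom, start in reads:
        # Convert to 0-based, then floor to the start of the enclosing bin.
        bin_start = ((start - 1) // bin_width) * bin_width
        counts[(chrom, bin_start)] += 1
    return counts

# Toy input (hypothetical coordinates, not real data).
toy_reads = [("chr1", 5), ("chr1", 60), ("chr1", 150), ("chr2", 10)]
bin_counts = count_reads_per_bin(toy_reads)

# List bins from most to fewest reads.
for (chrom, bin_start), n in bin_counts.most_common():
    print(chrom, bin_start, bin_start + 100, n)
```

Note that bins of constant width cover the whole genome, so a group of nearby reads can be split across a bin boundary.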

See:

modified 2.0 years ago • written 2.0 years ago by Michael Dondrup

If I read the question correctly, I think the OP is looking for something like piRNA clusters: regions enriched for certain types of smRNA. That calls for a slightly different approach, because the regions/clusters would have variable lengths and most of the genome would be free of them. It is more like an aggregation operation, I think.

modified 2.0 years ago • written 2.0 years ago by A. Domingues

I think it is very hard to tell what is really wanted here.

written 2.0 years ago by Michael Dondrup

You are right. I am looking for small RNA clusters enriched in certain genomic regions, just like piRNA clusters. As Michael said, I need to bin small RNA-seq reads that lie within, for example, 100 nt of each other. However, I want to keep the alignment information of the reads rather than converting to BED format, because I need to map the reads in each cluster again to see where they come from (e.g. intergenic or coding regions). Thanks.
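
The grouping described here could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions, not an existing tool: the names `reference_span` and `cluster_sam_reads` are hypothetical, the toy SAM records are invented, "within 100 nt" is taken to mean that a read's start minus the current cluster's end is at most `max_gap`, and strand is ignored. The original SAM lines are kept inside each cluster so the alignment information is not lost.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + max(length, 1) - 1

def cluster_sam_reads(sam_lines, max_gap=100):
    """Group mapped reads lying within max_gap of the growing cluster.

    Returns clusters sorted by read count (largest first); each cluster is a
    dict with chrom, start, end and the original SAM lines of its reads.
    """
    by_chrom = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):                      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or chrom == "*" or cigar == "*":  # skip unmapped reads
            continue
        start, end = reference_span(pos, cigar)
        by_chrom[chrom].append((start, end, line))

    clusters = []
    for chrom, reads in by_chrom.items():
        reads.sort(key=lambda r: (r[0], r[1]))
        current = None
        for start, end, line in reads:
            if current is not None and start - current["end"] <= max_gap:
                current["end"] = max(current["end"], end)
                current["reads"].append(line)
            else:
                current = {"chrom": chrom, "start": start, "end": end, "reads": [line]}
                clusters.append(current)
    clusters.sort(key=lambda c: len(c["reads"]), reverse=True)
    return clusters

# Toy SAM records (invented for illustration; SEQ and QUAL are left as "*").
toy_sam = [
    "@SQ\tSN:chr1\tLN:10000",
    "r1\t0\tchr1\t100\t255\t22M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t150\t255\t21M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t260\t255\t22M\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t1000\t255\t22M\t*\t0\t0\t*\t*",
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
clusters = cluster_sam_reads(toy_sam)
for c in clusters:
    print(c["chrom"], c["start"], c["end"], len(c["reads"]))
```

Because each cluster still holds its SAM lines, the reads of any cluster can be written back out as SAM and annotated afterwards.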

written 2.0 years ago by apbiomol

Use samtools (http://www.htslib.org/doc/samtools.html): first sort the SAM file, then use samtools view to extract the regions of interest. Note that region queries need a coordinate-sorted, indexed BAM file.

written 2.0 years ago by syrttgump30

You can try the MACS tool and then extract the result you need from its output file. The command is below:

/tool/MACS/MACS-1.4.2/bin/sam2bed input.sam output.bed

Hope this solves your problem.

written 2.0 years ago by mks002160
2.0 years ago, A. Domingues (Mainz, Germany) wrote:

I suggest using bedtools merge or cluster, depending on the final goal. For instance, using merge:

## code untested
# bamToBed converts BAM to BED and keeps the read ID (col 4), which is used for counting.
# mergeBed needs input sorted by chromosome and start, hence the sort.
# -d 100 merges reads within 100 base pairs; -c 4 -o count counts the read IDs per merged interval.
bamToBed -i my.bam |
   sort -k1,1 -k2,2n |
   mergeBed -i stdin -d 100 -c 4 -o count |
   head   # peek at the results before saving

Keep in mind that this will not account for the strandedness of the reads. Use the options -s or -S for that. Also, read the tool documentation for fine-tuning.

Using cluster should also work, but it would require a little more work and a merge anyway. The only advantage I see over merge is that it would allow you to keep the read IDs for each cluster.
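
To illustrate the difference, here is a rough Python sketch of what labelling each read with a cluster ID looks like, as opposed to collapsing reads into merged intervals. It is written in the spirit of bedtools cluster but is not its implementation: `label_clusters`, `read_ids_per_cluster`, the `stranded` switch, and the toy BED rows are all hypothetical, and the adjacency rule (start minus previous end at most `max_gap`, in 0-based half-open BED coordinates) is an assumption.

```python
from collections import defaultdict

def label_clusters(bed_rows, max_gap=100, stranded=False):
    """Attach a cluster ID to every BED row instead of merging rows.

    bed_rows: list of (chrom, start, end, name, score, strand) tuples in
    0-based, half-open BED coordinates.
    Returns a list of (row, cluster_id) pairs in sorted order.
    """
    def group_key(row):
        # With stranded=True, reads on opposite strands never share a cluster.
        return (row[0], row[5]) if stranded else (row[0],)

    ordered = sorted(bed_rows, key=lambda r: (group_key(r), r[1], r[2]))
    labelled = []
    cluster_id = 0
    prev_key, prev_end = None, None
    for row in ordered:
        key = group_key(row)
        if key != prev_key or row[1] - prev_end > max_gap:
            cluster_id += 1               # start a new cluster
            prev_end = row[2]
        else:
            prev_end = max(prev_end, row[2])
        prev_key = key
        labelled.append((row, cluster_id))
    return labelled

def read_ids_per_cluster(labelled):
    """Collect the read IDs (BED column 4) belonging to each cluster ID."""
    ids = defaultdict(list)
    for row, cluster_id in labelled:
        ids[cluster_id].append(row[3])
    return dict(ids)

# Toy BED rows (invented for illustration).
toy_bed = [
    ("chr1", 99, 121, "r1", 0, "+"),
    ("chr1", 149, 170, "r2", 0, "-"),
    ("chr1", 259, 281, "r3", 0, "+"),
    ("chr1", 999, 1021, "r4", 0, "+"),
]
unstranded = read_ids_per_cluster(label_clusters(toy_bed))
by_strand = read_ids_per_cluster(label_clusters(toy_bed, stranded=True))
print(unstranded)
print(by_strand)
```

Since every read keeps its own row, the read IDs of a cluster stay available for later steps, at the cost of an extra aggregation to get per-cluster counts.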

written 2.0 years ago by A. Domingues