Which Aligner Is Best Suited For CLIP-seq Data?
9.0 years ago
Mathew Bunj ▴ 40

I have RNA-seq data (CLIP-seq) and want to identify the RNA binding partners of my proteins of interest. The data are paired-end, 2x50 bp. Which aligner is most suitable for detecting the RNAs in my sequencing data?

Thanks.

alignment rna sequencing • 5.8k views

Hi @Mathew Bunj, is your CLIP-seq dataset of the kind that generates mutations (e.g. UV light protocol) with respect to the reference genome?


The procedure includes UV cross-linking, and yes, it can generate some mutations, particularly T to C. Do you have any suggestions?

9.0 years ago
Ryan Dale 5.0k

I don't see why a spliced aligner (e.g. TopHat, but see the recent question "Is tophat the only mapper to consider for RNA-seq data?") wouldn't work for CLIP-seq/HITS-CLIP. A quick search for papers using the technique (1, 2, 3) shows they are not consistent in their choice of aligner (BLAT, a custom aligner, or MosaikAligner). It can't hurt to try different aligners, I suppose.

There may be other things to be careful about besides aligner choice. For example, from this protocol, it looks like there's a digestion step involved. I'm not sure if this means the fragments you end up sequencing are necessarily small . . . but depending on the experimental protocol used you may need to be careful about insert sizes for the PE reads and/or trimming adapter sequence.
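To illustrate the adapter point: when the sequenced fragment is shorter than the read length, the read runs into the 3' adapter, and that tail should be removed before alignment. Below is a minimal sketch of exact-match 3' trimming. The function name `trim_3prime_adapter`, the `min_overlap` default and the example adapter string are assumptions made for illustration; check which adapter your library prep actually used. A dedicated trimmer such as cutadapt also handles mismatches and quality, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an exact adapter match (full, or partial at the 3' end) from a read.

    Hypothetical helper for illustration only: no mismatches or indels are tolerated.
    """
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the start of the adapter fits before the read ends.
    # Look for the longest adapter prefix that equals the read's suffix.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found; return the read unchanged.
    return read


# Example adapter sequence (an assumption; use the one from your protocol).
ADAPTER = "AGATCGGAAGAGC"

full = trim_3prime_adapter("ACGTACGTACAGATCGGAAGAGCTT", ADAPTER)
partial = trim_3prime_adapter("ACGTACGTACAGATC", ADAPTER)
untouched = trim_3prime_adapter("ACGTACGTAC", ADAPTER)
```

After trimming, it is worth looking at the distribution of remaining read lengths, since very short inserts will map ambiguously whichever aligner you pick.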


I wonder, will TopHat identify RNA?

9.0 years ago

If your protocol includes UV cross-linking and can generate T to C mutations in some of the reads, but not all of them, one possibility is to assemble the read clusters first, then align the region under the peak of reads to the reference genome. Pinball does just that:

Pinball is an alignment-free ChIP-seq and HITS-CLIP analysis tool:
https://github.com/avilella/pinball/blob/master/INSTALL

If you want to skip installation and set up, you can try the virtual machine here:
ftp://ftp.ebi.ac.uk/pub/databases/ensembl/avilella/pinball/PinballVM.1.0.4.ova
The installation procedure of the virtual machine is the same as described here:
http://www.ensembl.org/info/data/virtual_machine.html

Depending on your read length, you may want to tweak the --error-rate parameter so that reads with T/C or other mutations still align with mismatches. For example, if you have 36 bp reads, require an overlap of 2/3 of the read length (24 bp), and want to allow 1 mismatch every 24 bp, you can set --error-rate=0.042 (just above 1/24 ≈ 0.0417).
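The arithmetic above can be sketched for other read lengths as follows. This is only an illustration: the helper name `min_error_rate` and the choice to round up to three decimals are assumptions, not part of Pinball.

```python
def min_error_rate(overlap_bp: int, mismatches: int = 1, decimals: int = 3) -> float:
    """Smallest rate with `decimals` places that is >= mismatches / overlap_bp.

    Hypothetical helper; integer arithmetic avoids floating-point surprises
    when rounding up.
    """
    scale = 10 ** decimals
    # Ceiling division: number of 1/scale units needed to cover the ratio.
    units = -(-mismatches * scale // overlap_bp)
    return units / scale


read_length = 36
overlap = read_length * 2 // 3   # 2/3 of the read length
rate = min_error_rate(overlap)   # value to pass as --error-rate
print(overlap, rate)
```

For 2x50 bp reads the same calculation applies with `read_length = 50`. Rounding up matters here, because a value rounded down would fall below 1 mismatch per overlap.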

Hope it helps.


Could you check the permissions on your VM download link? I can't download it.


Thanks for the heads-up, I have chmod'ed the files now.


I installed the VM, but it gives me two errors: "Missing the checkout Variation" and "Missing the checkout Funcgen".

9.0 years ago
UnivStudent ▴ 430

I would take a look at this paper, which explains the data analysis and some considerations for finding binding sites at single-bp resolution: "Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data" by Zhang & Darnell.
