How to analyze CAGE-Seq data?
7.0 years ago

Hi all,

I'm now 6 months into the field of NGS and the analysis of sequencing data. I have been working on RNA-Seq data and have recently started to venture into CAGE-Seq data.

I wanted to ask how CAGE-Seq data should actually be mapped. We did paired-end sequencing for the CAGE data and got the fastq files. After cleaning, I got the clean read files for read1 and read2, but the two files are of different sizes. When I ran them through STAR, it reported that mapping could not be done because it reached the end of one read file before the other.

Is this normal for CAGE-Seq data? Or should we map read1 only, since we are only interested in the TSS, i.e. reads sequenced from the 5' end?

I am a bit confused about how to process CAGE data here.

Please give some guidance & advice. Thank you very much.

CAGE • 3.8k views

"After cleaning, I got the clean read files for read1 and read2, but the two files are of different sizes."

Can you elaborate on the "cleaning" part?
And do you mean different read lengths or a different number of reads in R1 vs R2?


Cleaning is where I trimmed 4 base pairs off the reads; these correspond to the index of the sample each read represents.

Yes, I get a different number of reads for R1 & R2.
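A quick way to confirm such a mismatch is to count the records in each file and list the read names that appear in only one of them. The following is a minimal sketch, not part of the original workflow: it assumes well-formed 4-line FASTQ records and read names that differ only by an optional `/1` or `/2` suffix, and the function names `read_ids` and `compare_pairs` are made up for illustration.

```python
import io

def read_ids(handle):
    """Yield the read name of each 4-line FASTQ record.

    Assumes well-formed FASTQ: the header is every 4th line, starting with '@'.
    An optional '/1' or '/2' mate suffix is removed so mates can be compared.
    """
    for i, line in enumerate(handle):
        if i % 4 == 0:
            name = line.rstrip("\n")[1:].split()[0]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name

def compare_pairs(r1_handle, r2_handle):
    """Count the reads in both files and report names found in only one of them."""
    ids1 = list(read_ids(r1_handle))
    ids2 = list(read_ids(r2_handle))
    set1, set2 = set(ids1), set(ids2)
    return {
        "n_r1": len(ids1),
        "n_r2": len(ids2),
        "only_r1": sorted(set1 - set2),
        "only_r2": sorted(set2 - set1),
    }

# Toy data: read2 lost its mate during cleaning.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTTT\n+\nIIII\n")
report = compare_pairs(r1, r2)
print(report)
```

With real files, the same functions can be called on handles returned by `open("R1_clean.fastq")` and `open("R2_clean.fastq")`. Holding all read names in memory is fine for a quick check but may be heavy for very large runs.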


Please post the names and versions of the programs you used, and also the exact commands. You should clean and map R1+R2 as paired files, i.e., simultaneously and keeping proper pair information.


Here is the read processing I did before mapping:

  1. Index identification of samples, using a custom Perl script

read_skipper.pl R1_step1.fq CAC

  2. Trim away the index

fastx_trimmer -f 4 -i R1_step1.fq -o R1_trimmed.fq -Q33

  3. Remove reads with Q<20, using a Perl script

perl ../IndexQuality_CAGE_20.pl R1_trimmed.fq R1_trimmed.fq I.fq R1_20.fq R1_20.2.fq I_20.fq

  4. Read cleaning using QCleaner (I still have to check what exactly this cleans, as it is in Japanese)

qcleaner_renew_v3.1.pl --i ./R1_step1_skip.fq --o R1_clean.fastq --log qclog.txt

qcleaner_renew_v3.1.pl --i ./Undetermined_S0_L001_R2_001.fastq --o R2_clean.fastq --log qclog.txt


fastx does not preserve pairing; use Trimmomatic or BBDuk to trim adapters and low-quality bases.
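To illustrate what "preserving pairing" means, here is a minimal sketch of a pair-aware quality filter: both files are read in lockstep, and a pair is kept only if both mates pass, so R1 and R2 always end up with the same number of reads in the same order. It is not what Trimmomatic or BBDuk do internally. The mean-quality threshold of 20, the Phred+33 encoding, and the function names are assumptions for the example, and the input files are assumed to be still in sync.

```python
import io
from itertools import islice

def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        rec = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(rec) < 4:
            return
        yield tuple(rec)

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def filter_pairs(r1_handle, r2_handle, min_mean_q=20):
    """Yield only the pairs in which BOTH mates reach the mean quality threshold."""
    for rec1, rec2 in zip(fastq_records(r1_handle), fastq_records(r2_handle)):
        if mean_quality(rec1[3]) >= min_mean_q and mean_quality(rec2[3]) >= min_mean_q:
            yield rec1, rec2

# Toy data: 'I' is Phred 40, '#' is Phred 2.
r1 = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read2/1\nACGT\n+\nIIII\n"
    "@read3/1\nACGT\n+\n####\n"
)
r2 = io.StringIO(
    "@read1/2\nTTTT\n+\nIIII\n"
    "@read2/2\nTTTT\n+\n####\n"
    "@read3/2\nTTTT\n+\nIIII\n"
)
kept = list(filter_pairs(r1, r2))
for rec1, rec2 in kept:
    print(rec1[0], rec2[0])
```

Filtering each file on its own would have dropped read3 from R1 and read2 from R2, leaving two files whose records no longer line up, which is the kind of input a paired-end aligner rejects.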


Thank you for your suggestion. I will try it out and see if it works.

7.0 years ago
Charles Plessy ★ 2.9k

If your CAGE data is paired-end, then I recommend aligning it as paired-end data and converting it to TSS positions only at the end.

Here is a toy example on how to process CAGE data (the nanoCAGE variant, which can be sequenced paired-end).

And here is a preprint showing more or less the same on a different dataset with a different workflow system.

Recent versions of CAGEr can load paired-end CAGE data in BAM or BED format.


How do you transform aligned reads to TSS positions?

Thank you very much for your references.


For paired-end data, my favourite approach is to convert paired alignments from BAM format, where each mate is on a separate line, to BED12 format, where each pair is on one line, using the pairedBamToBed12 tool. The 5′ end of each BED entry is the CAGE TSS. CAGEr supports loading data in BAM, BED, and other formats. I recommend reading its vignette.
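As a rough illustration of "the 5′ end of each BED entry", the sketch below takes BED12 lines (one per pair) and reduces each to a single-base TSS position, then counts how many pairs start at each position. It assumes standard BED coordinates (0-based start, exclusive end) and that the strand column gives the orientation of the transcript. The function name `bed12_to_tss` and the toy lines are made up for the example, and CAGEr does its own processing when loading such files.

```python
from collections import Counter

def bed12_to_tss(line):
    """Return (chrom, start, end, strand) of the single base at the 5' end of a BED entry.

    BED intervals are 0-based and half-open, so the 5' base is
    [chromStart, chromStart + 1) on '+' and [chromEnd - 1, chromEnd) on '-'.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    if strand == "+":
        return chrom, start, start + 1, strand
    if strand == "-":
        return chrom, end - 1, end, strand
    raise ValueError(f"unexpected strand: {strand!r}")

# Toy BED12 lines (hypothetical pairs, two blocks each).
bed_lines = [
    "chr1\t100\t250\tpairA\t255\t+\t100\t250\t0\t2\t50,50\t0,100",
    "chr1\t100\t300\tpairB\t255\t+\t100\t300\t0\t2\t50,50\t0,150",
    "chr1\t400\t600\tpairC\t255\t-\t400\t600\t0\t2\t50,50\t0,150",
]

# Number of pairs whose 5' end falls on each position.
tss_counts = Counter(bed12_to_tss(line) for line in bed_lines)
for (chrom, start, end, strand), n in sorted(tss_counts.items()):
    print(chrom, start, end, strand, n, sep="\t")
```

Note that the two plus-strand pairs have different 3′ ends but the same 5′ end, so they are counted together at one TSS position.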


Thank you very much! I will try it out and see if it works out for my data.
