Question: Hi-C data alignment with STAR
9 months ago, Dataminer (2.6k, Netherlands) wrote:

Dear community,

Does anyone have experience aligning Hi-C data with the STAR aligner? It would be very kind of you to share the syntax you used, or to have a look at mine and point out corrections (chimeric read alignments etc.).

As the reads of each pair (R1 and R2) need to be mapped separately, I am using the following command:

STAR --genomeDir /data/genomes/ --readFilesIn /data/raw/HiREAD/HiC/T-Rep1_R2_001.fastq.gz --readFilesCommand gunzip -c --alignIntronMax 1 --alignIntronMin 2 --outFilterMultimapNmax 1 --runThreadN 8 --outFileNamePrefix T_Rep1_L2
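A minimal sketch of how the same STAR call could be generated once per mate file, so that R1 and R2 are mapped separately with identical settings. All flags and values are copied from the command above; the R1 path and the R1 prefix are assumptions made by analogy with the R2 file, and `star_command` is a hypothetical helper name. Passing `gunzip` and `-c` as two list items corresponds to the unquoted form on the shell command line.

```python
import shlex

def star_command(fastq_gz, prefix, genome_dir="/data/genomes/", threads=8):
    """Build the STAR argument list for one mate file (mapped as single-end)."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_gz,
        # Two separate tokens, like the unquoted form in the shell command.
        "--readFilesCommand", "gunzip", "-c",
        "--alignIntronMax", "1",
        "--alignIntronMin", "2",
        "--outFilterMultimapNmax", "1",
        "--runThreadN", str(threads),
        "--outFileNamePrefix", prefix,
    ]

# The R2 path and prefix come from the post; the R1 entries are hypothetical.
mates = {
    "T_Rep1_L1": "/data/raw/HiREAD/HiC/T-Rep1_R1_001.fastq.gz",
    "T_Rep1_L2": "/data/raw/HiREAD/HiC/T-Rep1_R2_001.fastq.gz",
}

for prefix, fastq in mates.items():
    # Dry run: print the command instead of executing it.
    print(shlex.join(star_command(fastq, prefix)))
```

To actually run a command, the list could be handed to `subprocess.run(cmd, check=True)`.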

Kindly let me know if I am missing something here :)

BWA takes a lot of time... a lot.

Thank you

Tags: star, hi-c

What error did you get? gunzip -c needs to be inside quotes (alternatively, you can use zcat). Also, did you check the section on chimeric alignments (Section 5) in the STAR manual? http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf

written 9 months ago by Santosh Anand (5.0k)

Hi Santosh,

This is working fine, with no errors. I will have a look at the link, thanks :)

written 9 months ago by Dataminer (2.6k)

BWA takes a long time because Hi-C datasets are large. What is your definition of "long"? How many CPUs did you use (please post the full command line), and what is your hardware? There are probably things one can optimize if you share your code.

modified 9 months ago • written 9 months ago by ATpoint (25k)

Definition: 5 days of processing on an HPC cluster, with 12 threads and 132 GB of RAM. I understand that Hi-C data is huge and will take time, but 5 days is a lot :)

written 9 months ago by Dataminer (2.6k)

How many reads in the dataset?
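One way to answer this is to count the records in the FASTQ file directly. The sketch below assumes the standard four lines per read and handles both plain and gzip-compressed files; `count_fastq_reads` is a hypothetical helper name.

```python
import gzip

def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4

# Example call, using the file path from the original post:
# print(count_fastq_reads("/data/raw/HiREAD/HiC/T-Rep1_R2_001.fastq.gz"))
```

Dividing the read count by the wall-clock time then gives a throughput figure that can be compared between aligners or parameter sets.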

written 9 months ago by ATpoint (25k)
9 months ago, Gautier Richard (280; MPI IE, Freiburg, Germany) wrote:

For aligning Hi-C reads, you could try this command line with BWA (12 cores):

bwa mem -A1 -B4 -E50 -L0 -t 12 bwa_index.fa sequences_R1.fastq
bwa mem -A1 -B4 -E50 -L0 -t 12 bwa_index.fa sequences_R2.fastq
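Since the discussion is about run time, here is a small sketch that runs each mate through the BWA command above and reports the wall-clock time. The options are exactly those quoted above; the helper names and the output file names are assumptions. bwa mem writes SAM to standard output, so the sketch redirects stdout to a file.

```python
import subprocess
import time

def bwa_mem_command(index, fastq, threads=12):
    """Argument list for one mate, using the options quoted above."""
    return ["bwa", "mem", "-A1", "-B4", "-E50", "-L0",
            "-t", str(threads), index, fastq]

def run_timed(cmd, out_path):
    """Run cmd, write its stdout to out_path, and return the elapsed seconds."""
    start = time.perf_counter()
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return time.perf_counter() - start

# Example (requires bwa on PATH plus the index and FASTQ files named above;
# the .sam output names are hypothetical):
# for mate in ("R1", "R2"):
#     cmd = bwa_mem_command("bwa_index.fa", f"sequences_{mate}.fastq")
#     hours = run_timed(cmd, f"sequences_{mate}.sam") / 3600
#     print(f"{mate}: {hours:.2f} h")
```

Timing a small subsample first is a cheap way to see whether a full run is likely to take hours or days.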

This is what is used in the Snakepipes pipelines: https://github.com/maxplanck-ie/snakepipes/

Maybe your BWA alignment took longer than it should because of different parameters? Snakepipes uses 15 cores for that command and usually takes less than a day to run. Even for deeply sequenced samples it took far less than 5 days.

With that command-line I don't see the point of using STAR for mapping Hi-C reads.


Hi, I was using exactly the same command: bwa mem -A1 -B4 -E50 -L0 -t 12 ref.fa file1.fastq. By the way, I am working with a polyploid plant genome.

written 9 months ago by Dataminer (2.6k)

For STAR I found this from the HIPPIE package for Hi-C data, perhaps it can help:

https://github.com/yihchii/hippie/blob/master/cmd/starMappingToBam.sh

So maybe you could grab some additional parameters from those? --outFilterMultimapNmax 1 and --alignIntronMax 1 you already had.

 --outFilterMultimapNmax 1 \
 --outFilterMismatchNovermax 0.04 \
 --scoreGapNoncan 0  --scoreGapGCAG 0  --scoreGapATAC 0 \
 --alignIntronMax 1 \
 --chimSegmentMin ${ChimSegMin} \
 --chimScoreJunctionNonGTAG 0 

modified 9 months ago • written 9 months ago by Gautier Richard (280)

After further research, I saw that STAR, when run with several threads, does not keep the reads in their input order. Hi-C tools usually expect identically ordered files when building matrices, I think so that the separately mapped R1 and R2 files can be paired correctly.

To restore the order you could, for example, use ReorderSam from Picard. Of course this needs testing, and it may depend on the Hi-C suite you use afterwards to build matrices (HiCExplorer, for example, takes R1.bam and R2.bam as input to hicBuildMatrix; I believe other suites may take paired BAM files as input).
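Before and after any reordering step, it can help to check whether the two separately mapped files really list the reads in the same order. The sketch below does this for uncompressed SAM text files. It assumes that both mates carry the same read name and that every read has a record in both files (for example, that unmapped reads are kept); the function names are hypothetical.

```python
from itertools import zip_longest

def read_names(sam_path):
    """Yield the read name (QNAME) of each primary record in a SAM file."""
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])
            # Skip secondary (0x100) and supplementary (0x800) alignments.
            if flag & 0x900:
                continue
            yield fields[0]

def same_read_order(sam_r1, sam_r2):
    """Return True if both files list the same read names in the same order."""
    pairs = zip_longest(read_names(sam_r1), read_names(sam_r2))
    return all(a == b for a, b in pairs)

# Example call with hypothetical file names:
# print(same_read_order("R1.sam", "R2.sam"))
```

For BAM input, the files could first be converted to SAM text, for example with samtools view -h.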

modified 9 months ago • written 9 months ago by Gautier Richard (280)