Question: STAR cannot detect this chimeric read
3.0 years ago, Nicolas Rosewick (Belgium, Brussels) wrote:

Hi,

I am trying to detect chimeric fusions between a virus (integrated into the host genome) and the host genome. I aligned the reads (paired-end, 2x76, stranded) to a hybrid genome (host + virus) in which the virus genome is treated as an additional chromosome. Here's my command:

$STAR --genomeDir $stargenomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 -seedSearchStartLmax 10 --outFilterMultimapNmax 10 --outFilterMismatchNmax 10 --chimSegmentMin 10 --outFilterMatchNmin 10 --chimJunctionOverhangMin 10 --readFilesIn $r1 $r2 --runThreadN $threads --outStd SAM --readFilesCommand zcat

version : STAR_2.3.1u_r375

So I expect STAR to report fusion reads with at least 10 bases aligning to either the host or the virus, and the rest aligning to the virus or the host, respectively. For example:

 # : host genome
 @ : virus genome
 = : read
 - : splicing

#######################################@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
          ==========-----------------------------------=========================
            min 10bp
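
One way to check whether STAR reported any host/virus junction at all is to filter its chimeric junction output. The sketch below assumes that the first six tab-separated columns of Chimeric.out.junction are donor chromosome, donor breakpoint, donor strand, acceptor chromosome, acceptor breakpoint and acceptor strand, as described in the STAR manual; this layout should be verified against the STAR version in use. The function name `host_virus_junctions` and the demo rows are hypothetical.

```python
import csv
import io

# Assumed order of the first six columns of Chimeric.out.junction
# (from the STAR manual; verify against your STAR version).
FIELDS = ("donor_chr", "donor_pos", "donor_strand",
          "acceptor_chr", "acceptor_pos", "acceptor_strand")

def host_virus_junctions(handle, virus_chr="chrVirus"):
    """Yield junctions where exactly one side lies on the virus contig."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        rec = dict(zip(FIELDS, row))
        # XOR: one side on the virus, the other side on the host
        if (rec["donor_chr"] == virus_chr) != (rec["acceptor_chr"] == virus_chr):
            yield rec

# Made-up rows, for illustration only (not real STAR output).
demo = ("chr1\t1000\t+\tchrVirus\t1\t+\t0\t0\t0\tread1\n"
        "chr1\t5\t+\tchr2\t9\t-\t0\t0\t0\tread2\n")
hits = list(host_virus_junctions(io.StringIO(demo)))
```

With a real run, the open Chimeric.out.junction file would be passed in place of the `io.StringIO` demo.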

But when I look in IGV at the ends of the virus genome, I see some reads with soft-clipping longer than 10 bp (in these cases, 13 bp long). When I align these soft-clipped bases to the host genome using BLAST, I find a position on the host genome that does show traces of a fusion transcript (I can clearly see reads aligned past the fusion breakpoint, representing the continuation of the fusion transcript). But STAR does not report this fusion. Am I doing something wrong?

Here are the SAM lines for a pair of reads containing the soft-clipping of interest:

NS500186:93:HYFJVBGXX:3:13402:19874:18099   163 chrVirus    1   255 16S59M  =   65  128 CACATGTTTAGGTTTGTGACAATGACCATGAGCCCCAAATATCCCCCGGGGGCTTAGAGCCTCCCAGTGAAAAAC AAAAAEEEEEEEEEEEEEEEEEEEAEEEEEAEEEEEEEEEEEEEEE/EEEEEEEEEEEEAEEE<EEEEEEEEEEE NH:i:1  HI:i:1  AS:i:117    nM:i:2
NS500186:93:HYFJVBGXX:3:13402:19874:18099   83  chrVirus    65  255 64M12S  =   1   -128    CGCGAAACAGAAGTCTGAAAAGGTCAGGGCCCAGACCAAGGCTCTGACGTCTCCCCCCGGAGGGACAGCTCAGCAC    AAEEEEEEEEEEEEEAEEEEEEE/EEEEEEEAEEEAEEEEE/EEEEEEEAEEEEEEEEEEEEEEEEEAEAEAAAAA    NH:i:1  HI:i:1  AS:i:117    nM:i:2
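
The soft-clipped bases can be pulled out of records like these programmatically before sending them to BLAST. This is a minimal sketch, assuming only the standard SAM CIGAR syntax; the helper name `soft_clips` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar, seq):
    """Return (left_clip, right_clip) sequences for one SAM record."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not present in SEQ, so ignore them.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]
```

For the two records above, the CIGAR strings 16S59M and 64M12S mean this returns the first 16 bases of the first read's SEQ and the last 12 bases of the second read's SEQ.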

I have attached two figures illustrating my case.

Alignment on the virus :

[image: alignment on the virus]

Alignment on the host :

[image: alignment on the host]

Thanks

rna-seq star fusion • 1.7k views
modified 2.8 years ago by Biostar • written 3.0 years ago by Nicolas Rosewick

I think I found the issue. The soft-clipped sequence appears to be present at multiple positions in the host genome. I suppose STAR does not report multi-mapping fusions. I'll dig a little more to be sure.
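
A quick way to test the multi-mapping hypothesis is to count how often the clipped sequence occurs in the host sequence on either strand. The sketch below uses exact string matching only, which is a simplification: STAR and BLAST both tolerate mismatches, so this gives a lower bound. The function names are hypothetical, and `genome` is assumed to be a chromosome (or contig) already loaded as a plain string.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]

def count_hits(genome, query):
    """Count exact matches of query on both strands, overlaps included."""
    genome, query = genome.upper(), query.upper()
    total = 0
    # A set avoids double counting when query equals its reverse complement.
    for q in {query, revcomp(query)}:
        start = genome.find(q)
        while start != -1:
            total += 1
            start = genome.find(q, start + 1)
    return total
```

Since a 13-mer has only 4^13 (about 67 million) possible sequences, chance matches are expected in a large host genome, so a count above 1 would be consistent with the multi-mapping explanation.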

written 3.0 years ago by Nicolas Rosewick

@NicoBxl: Both of the links you included for the images are not loading (I tested them). Can you post new versions and update your post?

written 3.0 years ago by genomax

They should work; the links are OK. I also added the SAM lines for a pair of reads.

written 3.0 years ago by Nicolas Rosewick

Perhaps the links are working from your part of the world but they still don't work for me. Referring to this link for example.

written 3.0 years ago by genomax

I re-uploaded the pictures and updated the post. Can you see them now?

written 3.0 years ago by Nicolas Rosewick

I still can't (even if I use the URL), but don't worry. It may be specific to my local firewall. Are others able to see the images?

written 3.0 years ago by genomax

I see images. Even when sober.

written 3.0 years ago by WouterDeCoster

I see them even before coffee :P

@NicoBxl: You might post this to the STAR email list/google group. Alex is usually pretty good about replying (given how many options STAR has, I expect he's the only one that can point to the right one to tweak).

written 3.0 years ago by Devon Ryan

Yes I will do that. I will post his answer here.

written 3.0 years ago by Nicolas Rosewick

If you would like to try another tool, I wrote RILseq, which was built for chimera detection in bacteria, but I don't see why it wouldn't work here. It handles multi-mapping reads and does some statistical analysis to detect over-represented chimeras. See https://pypi.python.org/pypi/RILseq

written 2.8 years ago by Asaf

Did you try STAR-Fusion?

written 2.8 years ago by Ron