Question: Why do these two reads cause segmentation fault in Cufflinks?
1
gravatar for murphy.charlesj
2.2 years ago by
United States
murphy.charlesj90 wrote:

I have 50bp PE RNA-seq, and 4 of my ~200 samples end in error after running Cufflinks (v.2.2.1). Here is my pipeline prior to running Cufflinks:

  1. STAR 2-pass (GRCm38)
  2. Added read group information with Picard
  3. Marked (but did not remove) duplicates with Picard
  4. SplitNCigarReads with GATK
  5. Realigned around indels with GATK
  6. Performed base recalibration with GATK

I've been looking at one particular sample with the error and have narrowed it down to a couple reads (see below):

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461128    60  43M345H =   85461508    338 CCCTCAGATCATCATCCGAGCTTTCCGCACAGCCACCCAATTG C46FE@3CDECDBDD>/C6FE?FD@:EA?5CDBB>33<::;<B MC:Z:380H4I4S   BD:Z:KOOOOMMLMMLMMLLMMLNNJFLLMNNNMOPPQSSQVSSKMMM    PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:JNOOSQOQNOOMNONJKKOMIBKNJNMOOPQLOPQNUSSNNMM    HI:i:1  nM:i:1  MQ:i:70 AS:i:97 XS:A:+

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461508    70  380H4I4S    =   85461128    -338    GGTGGCTG    AC?9=5EB    MC:Z:43M345H    OC:Z:380H4M4S   BD:Z:ONNOPPMM   PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:NORNPPMM   HI:i:1  nM:i:1  MQ:i:60 AS:i:97 XS:A:+

Now if I put those to reads into a separate BAM file (with original header from the sample) and running cufflinks I get the following:

cufflinks -o test. test.bam
You are using Cufflinks v2.2.1, which is the most recent release.
[14:04:40] Inspecting reads and determining fragment length distribution.
Segmentation fault

I've look at those two reads but am not sure why about them causes a segmentation fault. Any ideas?

rna-seq • 806 views
ADD COMMENTlink modified 2.2 years ago by Pierre Lindenbaum121k • written 2.2 years ago by murphy.charlesj90
3
gravatar for Pierre Lindenbaum
2.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

this is a bad cigar string:

380H4I4S

there is not "Match/M" in this cigar string: 380 hard clip, 4 insertions, 4 soft clip. Just 3 things that are not associated to a reference;

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Pierre Lindenbaum121k

Ah, that seems to be it. Replacing the cigar string with the following fixes the issue:

380H4I4M

Now to figure out why there are bad cigar strings in some of my samples...

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by murphy.charlesj90

i Think it's still wrong anyway because an alignment cannot start with an insertion.

ADD REPLYlink written 2.2 years ago by Pierre Lindenbaum121k

You're probably right. I was just testing it out to see if what you said would fix the issue. I' am going to see how systematic an issue these bad cigar strings are in my files, and if it very few reads then I'll just remove them.

ADD REPLYlink written 2.2 years ago by murphy.charlesj90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1577 users visited in the last hour