Question: RSeQC (infer_experiment) problem
0
gravatar for biolab
5.8 years ago by
biolab1.2k
biolab1.2k wrote:

HI everyone

I am using RSeQC to determine strandness.  When running the command infer_experiment.py  -r TAIR10.bed  -i WT_accepted_hits.bam.sort.bam,  I faced the following failure.  Could anyone help?  THANKS a lot.

Failure message:

This is PairEnd Data
Fraction of reads failed to determine: 1.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0000
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0000

 

head of TAIR10.bed

chr01    0    30427671    Chr1    .    .    TAIR10    chromosome    .ID=Chr1;Name=Chr1
chr01    3630    3759    AT1G01010.1-Protein    .    +    TAIR10    five_prime_UTR    .    Parent=AT1G01010.1
chr01    3630    3913    AT1G01010.1-Protein    .    +    TAIR10    exon    .Parent=AT1G01010.1

 

head of bam file

FC42T26AAXX:7:104:1142:897#0    73    chr01    3636    50    75M    *    00    ATTAGATATACCAAACCAGAGAAAACAAATACATAATCGGAGAAATACAGATTACAGAGAGCGAGAGAGATCGAC    abbbabbbbb`^abaaaababaaaaabababaabbaaabaaaaaaaaaaaaaaaaa`aaaaaaaaaaaaaa^a__    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:75    YT:Z:UU    NH:i:1

 

 

rseqc • 2.5k views
ADD COMMENTlink modified 5.8 years ago by Devon Ryan97k • written 5.8 years ago by biolab1.2k
0
gravatar for Devon Ryan
5.8 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

Note the chromosome names, they don't match.

ADD COMMENTlink written 5.8 years ago by Devon Ryan97k

HI Devon Thanks for your reply.  I have changed the bed file (all "Chr"s are converted to "chr0"s, please see below).   Unfortunately, the problem still exists. 

chr01    0    30427671    chr01    .    .    TAIR10    chromosome    .    ID=chr01;Name=chr01
chr01    3630    3759    AT1G01010.1-Protein    .    +    TAIR10    five_prime_UTR    .    Parent=AT1G01010.1
chr01    3630    3913    AT1G01010.1-Protein    .    +    TAIR10    exon    .    Parent=AT1G01010.1
chr01    3630    5899    AT1G01010    .    +    TAIR10    gene    .    ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
chr01    3630    5899    AT1G01010.1    .    +    TAIR10    mRNA    .    ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1

 

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by biolab1.2k

The alignment you posted is of a singleton (i.e., one read from a pair aligned alone). Are they all like that? If so, then that's your problem. BTW, a simple say to check this is to just run samtools view foo.bam | cut -f 2 | sort | uniq -c.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

HI Devon, thanks a lot!  However, I still have the problem. samtools view bam file, then sort and uniq produce following output.   I think I probably need to check other sequencing datasets. 

8801117 chr01
4474771 chr02
5994228 chr03
4496609 chr04
6503270 chr05
ADD REPLYlink written 5.8 years ago by biolab1.2k

You must have used cut -f 3 instead of cut -f 2. We need to know the count of alignments for each flag value.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

The FLAGs are as follows.  Anything strange? THANKS.

5993 113
   6962 129
 133244 137
 141985 145
6450498 147
 139543 153
 142099 161
6434219 163
   5993 177
   6962 65
1789282 73
 142099 81
6434219 83
1844414 89
 141985 97
6450498 99

 

ADD REPLYlink written 5.8 years ago by biolab1.2k
1

So most of the alignments are informative. I haven't a clue then why the script is having problems. Just look at a few genes in IGV and you should be able to discern the strandedness.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

Hi Devon, Thanks a lot anyway!  I will manually check some genes after separating XS strands.

ADD REPLYlink written 5.8 years ago by biolab1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1332 users visited in the last hour