Picardtools Output of CollectRNAMetrices
1
0
Entering edit mode
6.4 years ago
Biocode_user ▴ 30

Hello,

I am using picard-tools CollectRNAMetrices to get the read statistics of mapping to exons, and UTRs etc. I am using

java -jar /software/shared/apps/x86_64/picard-tools/1.129/picard.jar  CollectRnaSeqMetrics REF_FLAT=formatted_chrALL.refflat2 STRAND_SPECIFICITY=NONE INPUT=out.prefix.bam OUTPUT=RNAMetrices.out

and the reflat file looks like

Ec-00_000010    Ec-00_000010    chr_00  -       149     6731    149     6731    10      149,897,1535,2091,2535,3474,4006,4702,6245,6709,        428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
Ec-00_000020    Ec-00_000020    chr_00  -       28572   29122   28572   29122   2       28572,28937,    28582,29122,
Ec-00_000030    Ec-00_000030    chr_00  +       29412   32214   29412   32214   1       29412,  32214,
Ec-00_000040    Ec-00_000040    chr_00  +       34287   34360   34287   34360   1       34287,  34360,
Ec-00_000050    Ec-00_000050    chr_00  -       36705   39329   36705   37902   3       36705,37422,39143,      36870,37944,39329,
Ec-00_000060    Ec-00_000060    chr_00  +       43007   44099   43007   44099   3       43007,43404,43829,      43046,43455,44099,
Ec-00_000070    Ec-00_000070    chr_00  +       54394   60308   54394   60255   6       54394,54969,55586,56305,57343,59928,    54448,55047,55672,56473,57542,60308,
Ec-00_000080    Ec-00_000080    chr_00  -       109869  113579  110071  113077  8       109869,110666,110999,111347,111759,111995,112509,112990,        110443,110759,111151,111504,111818,112233,112557,113579,
Ec-00_000090    Ec-00_000090    chr_00  -       129160  133715  129160  133650  8       129160,129561,130334,131503,131773,132036,132595,132961,        129285,129703,130418,131581,131848,132102,132638,133715,
Ec-00_000100    Ec-00_000100    chr_00  +       144813  144981  144813  144981  1       144813, 144981,

But I keep getting an output like

## htsjdk.samtools.metrics.StringHeader
# picard.analysis.CollectRnaSeqMetrics REF_FLAT=formatted_chrALL.refflat2 STRAND_SPECIFICITY=NONE INPUT=out.prefix.bam OUTPUT=RNAMetrices.out    MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
# Started on: Wed Feb 03 16:30:29 CET 2016

## METRICS CLASS        picard.analysis.RnaSeqMetrics
5159550200      4343544551              0       0       0       4343544551      0       0       0               0       0       0       1       0       0       0       0       0       0       0            

I am sure my reads are mapped to exons as well. Could anyone have any idea what could be going wrong?

RNA-Seq • 2.6k views
0
Entering edit mode

Please post the output of samtools view -H [your BAM file]

0
Entering edit mode

for that the output looks like

@HD VN:1.0 SO:coordinate
@SQ SN:Ec-00_000010 LN:1971
@SQ SN:Ec-00_000020 LN:195
@SQ SN:Ec-00_000030 LN:2802
@SQ SN:Ec-00_000050 LN:871
@SQ SN:Ec-00_000060 LN:360
@SQ SN:Ec-00_000070 LN:964
@SQ SN:Ec-00_000080 LN:1908
@SQ SN:Ec-00_000090 LN:1366
@SQ SN:Ec-00_000100 LN:168


And this is the same chromosome id in my refflat file too

0
Entering edit mode
6.4 years ago
Dan D 7.3k

The refFlat format specifies that the first two columns are for two types of gene names, while the third is for the chromosome/contig name. In your refFlat file, the contig names (chr_00, for example) don't match the contig names shown in the header of your BAM file. The entries in the first two columns of your refFlat file match them exactly, though.

You might have another problem in that the lengths of your contigs as shown in the header of the BAM file are shorter than the transcription start and stop locations in your refFlat file seem to indicate.

0
Entering edit mode

Hello Dan,

I just formatted the reflat file and the third column matches exactly as the contig name in bam file. But stil I get the same results. How will the length of the contigs screwing up the calculation? I did not understand your comment on that?

0
Entering edit mode

If you look at the output of the samtools view command I requested you to execute, you'll see several lines that start with @SQ. The LN: value at the end of each of those lines specifies the length of the contig in the reference file that was used for the creation of the BAM. Based on this, the longest contig in your reference is 2,802 bases.

Now check out your refFlat file. Look at columns 5 through 8. These are the coding start/end locations for the genes. Notice that these locations are well outside the length of you reference contigs. It doesn't seem like the refFlat file matches the reference you used.