Thursday, October 23, 2008

Don't believe rrdtool

rrdtool is the industry standard for plotting time-dependent data (e.g., for monitoring). However, Debian's rrdtool 1.3.1-4 can create misleading plots. Here is how to reproduce the problem.

First, create a round-robin database that will hold our test data.

rrdtool create testdata.rrd --start 1224453000 --step 1800 \
DS:testdata:GAUGE:28000:0:U RRA:LAST:0.5:1:1800
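
For reference, here is a rough sketch of what these parameters imply, assuming the usual rrdcreate syntax (DS:name:type:heartbeat:min:max and RRA:CF:xff:steps:rows). The variable names are mine, not rrdtool's.

```shell
# Values copied from the create command above; variable names are illustrative.
step=1800        # --step: seconds per primary data point
heartbeat=28000  # DS heartbeat: longest allowed gap between updates, in seconds
rows=1800        # RRA rows, each row covering 1 step

# Total time span the archive can hold before old rows are overwritten.
retention=$(( step * rows ))
echo "retention: $retention s (about $(( retention / 86400 )) days)"

# Largest gap between two consecutive updates in the test data
# (between timestamps 1224514800 and 1224539700).
max_gap=$(( 1224539700 - 1224514800 ))
echo "largest update gap: $max_gap s, heartbeat: $heartbeat s"
```

The largest gap (24900 s) stays below the heartbeat, so none of the samples should end up recorded as unknown.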

Then, populate it with numbers:

rrdtool update testdata.rrd 1224453300:1350535
rrdtool update testdata.rrd 1224467700:1350545
rrdtool update testdata.rrd 1224482100:1350554
rrdtool update testdata.rrd 1224496500:1350560
rrdtool update testdata.rrd 1224514800:1350562
rrdtool update testdata.rrd 1224539700:1350562
rrdtool update testdata.rrd 1224557700:1350562
rrdtool update testdata.rrd 1224576000:1350562
rrdtool update testdata.rrd 1224590100:1350562
rrdtool update testdata.rrd 1224604800:1350562
rrdtool update testdata.rrd 1224622500:1350562
rrdtool update testdata.rrd 1224636900:1350562
rrdtool update testdata.rrd 1224651300:1350562
rrdtool update testdata.rrd 1224669300:1350562
rrdtool update testdata.rrd 1224683700:1350562
rrdtool update testdata.rrd 1224698100:1350562
rrdtool update testdata.rrd 1224712500:1350562
rrdtool update testdata.rrd 1224730500:1350562
rrdtool update testdata.rrd 1224744900:1350562

The number before the colon is a UNIX timestamp, and the number after the colon is the corresponding value. As you can see from the numbers, the value is slightly above 1.35 million and grows slowly at the beginning of the period.
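
To put a number on "slowly", here is a small sketch that splits the first and the last sample into timestamp and value and computes the difference. It uses only plain shell arithmetic; the variable names are just for illustration.

```shell
# First and last samples from the update commands above, as timestamp:value.
first="1224453300:1350535"
last="1224744900:1350562"

# Split each sample at the separator with POSIX parameter expansion.
t0=${first%%:*}; v0=${first#*:}
t1=${last%%:*};  v1=${last#*:}

growth=$(( v1 - v0 ))
hours=$(( (t1 - t0) / 3600 ))
echo "value grew by $growth over $hours hours"
```

That is a change of 27 on a value of about 1.35 million, which is why it is invisible with default autoscaling.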

Let's plot it:

rrdtool graph testdata.png -t testdata --start 1224453000 --end 1224757634 \
DEF:testdata=testdata.rrd:testdata:LAST 'LINE2:testdata#ff0000'

Good result

Indeed, there is not much change. But let's suppose that we are really interested in the small change that happened over the week.

Let's use the alternative autoscaling option that, according to the manual page, is designed specifically for such cases:

rrdtool graph testdata-bug.png -t testdata --start 1224453000 --end 1224757634 \
-A -Y DEF:testdata=testdata.rrd:testdata:LAST 'LINE2:testdata#ff0000'

bug?!
What? The plot says that the value is 35000M, which is waaaay wrong.

I don't know yet whether this bug is Debian-specific, but that's enough to discourage me from using the -A and -Y options.

Update: not a bug. The label is just cut off on the left. Here is the correct plot, with -L 10 reserving more room for the y-axis labels:

rrdtool graph testdata-bug.png -t testdata --start 1224453000 --end 1224757634 \
-A -Y -L 10 DEF:testdata=testdata.rrd:testdata:LAST 'LINE2:testdata#ff0000'

non-bug

Although I must say that MS Excel handles this case better: it replaces numbers that don't fit with "###". Such a placeholder (no number) is better than a wrong (chopped) number.
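
A minimal sketch of that idea in shell, just to show the behaviour I would prefer. The fit_label helper is hypothetical and has nothing to do with how rrdtool actually renders its labels.

```shell
# Hypothetical helper: print a label right-aligned in a fixed width,
# or a row of '#' when it does not fit (instead of chopping digits).
fit_label() {
    label=$1
    width=$2
    if [ "${#label}" -le "$width" ]; then
        printf "%${width}s\n" "$label"
    else
        # Print $width spaces, then turn each of them into '#'.
        printf "%${width}s\n" '' | tr ' ' '#'
    fi
}

fit_label 1350562 10   # fits: printed right-aligned
fit_label 1350562 5    # too wide: prints #####
```

Either way the reader sees that something did not fit, rather than a plausible-looking but wrong number.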
