1
00:00:10,670 --> 00:00:10,670
Well, I will have an interesting

2
00:00:10,670 --> 00:00:12,470
So, let's begin with the agenda

3
00:00:12,470 --> 00:00:12,670
I will talk a little bit about

4
00:00:12,670 --> 00:00:14,970
I will talk a little bit about

5
00:00:16,970 --> 00:00:18,970
the history,

6
00:00:18,970 --> 00:00:20,970
what makes Munin

7
00:00:20,970 --> 00:00:22,970
very unique,

8
00:00:22,970 --> 00:00:24,220
we'll see

9
00:00:24,220 --> 00:00:24,970
we'll see

10
00:00:24,970 --> 00:00:26,970
what makes it also really

11
00:00:26,970 --> 00:00:29,960
unique but not in a very good way

12
00:00:29,960 --> 00:00:31,960
what makes

13
00:00:31,960 --> 00:00:33,960
unique in

14
00:00:33,960 --> 00:00:35,960
2.0 version, what

15
00:00:35,960 --> 00:00:37,960
we got in Wheezy,

16
00:00:37,960 --> 00:00:39,960
it's very interesting.

17
00:00:42,930 --> 00:00:45,960
we will also see

18
00:00:47,960 --> 00:00:49,960
according to the new features

19
00:00:49,960 --> 00:00:51,960
of 2.0 you can scale

20
00:00:51,960 --> 00:00:53,960
much more a Munin

21
00:00:53,960 --> 00:00:55,960
install

22
00:00:55,960 --> 00:00:57,960
install from

23
00:00:57,960 --> 00:00:59,960
1.4 package

24
00:01:01,960 --> 00:01:03,960
and we will see also a limitation

25
00:01:03,960 --> 00:01:05,960
of 2.0 since

26
00:01:07,960 --> 00:01:09,960
now you can scale quite well

27
00:01:09,960 --> 00:01:11,960
I mean,

28
00:01:11,960 --> 00:01:13,960
theoretically, you can scale really well,

29
00:01:13,960 --> 00:01:15,960
practically, well,

30
00:01:15,960 --> 00:01:17,960
we will see. We still have some

31
00:01:17,960 --> 00:01:19,960
big big issues

32
00:01:19,960 --> 00:01:21,960
very different from

33
00:01:21,960 --> 00:01:23,960
the ones in 1.4, but

34
00:01:23,960 --> 00:01:25,960
still, and I will

35
00:01:25,960 --> 00:01:27,960
briefly present

36
00:01:27,960 --> 00:01:29,960
the roadmap for

37
00:01:29,960 --> 00:01:31,960
2.2.

38
00:01:31,960 --> 00:01:33,960
Hopefully, it's released

39
00:01:33,960 --> 00:01:35,960
this year. Well,

40
00:01:35,960 --> 00:01:37,960
that's a challenge.

41
00:01:39,960 --> 00:01:41,960
Thing is, if you have

42
00:01:41,960 --> 00:01:43,960
any questions, I will

43
00:01:43,960 --> 00:01:45,960
stop at 10:00, and

44
00:01:45,960 --> 00:01:47,960
we will have 15 minutes for

45
00:01:47,960 --> 00:01:49,960
questions just after.

46
00:01:51,960 --> 00:01:53,960
So, brief history,

47
00:01:55,220 --> 00:02:00,490
Munin was born in 2002, and was named LRRD

48
00:02:01,930 --> 00:02:04,820
I didn't know that fact before

49
00:02:04,820 --> 00:02:06,820
and I only know it because I researched it

50
00:02:06,820 --> 00:02:08,820
for the presentation

51
00:02:10,820 --> 00:02:12,820
and it's not a well known fact but

52
00:02:12,820 --> 00:02:14,820
some code

53
00:02:14,820 --> 00:02:16,820
I mean, most code

54
00:02:16,820 --> 00:02:18,820
still dates from

55
00:02:18,820 --> 00:02:20,820
that day. So it's quite

56
00:02:20,820 --> 00:02:22,820
important to see that

57
00:02:22,820 --> 00:02:24,820
issues

58
00:02:24,820 --> 00:02:26,820
when changing code

59
00:02:26,820 --> 00:02:28,820
it's more like geology.

60
00:02:28,820 --> 00:02:30,820
You have all

61
00:02:30,820 --> 00:02:32,820
the layers.

62
00:02:32,820 --> 00:02:34,820
If you want to add a functionality

63
00:02:34,820 --> 00:02:36,820
one layer newer functionality

64
00:02:36,820 --> 00:02:38,820
one layer... Well,

65
00:02:38,820 --> 00:02:40,820
you all know this. [laughter]

66
00:02:40,820 --> 00:02:42,820
So, I hacked zooming

67
00:02:42,820 --> 00:02:44,820
for 1.2 in 2007

68
00:02:44,820 --> 00:02:46,820
I mean

69
00:02:46,820 --> 00:02:48,820
to 1.2

70
00:02:48,820 --> 00:02:50,820
was very static

71
00:02:50,820 --> 00:02:52,820
And well,

72
00:02:52,820 --> 00:02:54,820
I maintained it in my

73
00:02:54,820 --> 00:02:56,820
own private

74
00:02:56,820 --> 00:02:58,820
place

75
00:02:58,820 --> 00:03:00,820
And in 2009

76
00:03:00,820 --> 00:03:02,820
1.4

77
00:03:02,820 --> 00:03:04,820
came out, and I asked

78
00:03:04,820 --> 00:03:06,820
if I could send my

79
00:03:06,820 --> 00:03:08,820
patch to Munin

80
00:03:08,820 --> 00:03:10,820
and, well, they got

81
00:03:10,820 --> 00:03:12,820
accepted, and from

82
00:03:14,820 --> 00:03:16,820
2009 until

83
00:03:16,820 --> 00:03:18,820
2011, so, I was

84
00:03:18,820 --> 00:03:20,820
slowly

85
00:03:20,820 --> 00:03:22,820
gaining ground in the

86
00:03:22,820 --> 00:03:24,820
Munin community until now

87
00:03:24,820 --> 00:03:26,820
where, well,

88
00:03:26,820 --> 00:03:28,820
I've just

89
00:03:28,820 --> 00:03:30,820
taken over the leadership

90
00:03:30,820 --> 00:03:32,820
from the previous team

91
00:03:32,820 --> 00:03:34,820
and it didn't

92
00:03:34,820 --> 00:03:36,820
happen officially, but just

93
00:03:36,820 --> 00:03:38,820
it's just the way it is.

94
00:03:40,820 --> 00:03:42,820
so in 2012 I released

95
00:03:42,820 --> 00:03:44,820
2.0

96
00:03:44,820 --> 00:03:46,820
thanks to Holger

97
00:03:46,820 --> 00:03:48,820
who said, "hey,

98
00:03:48,820 --> 00:03:50,820
you have to release now, otherwise

99
00:03:50,820 --> 00:03:52,820
you will release in ten years".

100
00:03:54,820 --> 00:03:56,820
So, thanks to him, things were

101
00:03:56,820 --> 00:03:58,820
very very

102
00:03:58,820 --> 00:04:00,820
hectic in the early days

103
00:04:00,820 --> 00:04:02,820
of 2.0

104
00:04:02,820 --> 00:04:04,820
because I realized that

105
00:04:04,820 --> 00:04:06,820
the biggest

106
00:04:06,820 --> 00:04:08,820
point was

107
00:04:08,820 --> 00:04:10,820
since it wasn't released

108
00:04:10,820 --> 00:04:12,820
we didn't have many testers

109
00:04:12,820 --> 00:04:14,820
and since we

110
00:04:14,820 --> 00:04:16,820
didn't have many testers, I didn't want to

111
00:04:16,820 --> 00:04:18,820
release it, since I still have some

112
00:04:18,820 --> 00:04:20,820
bugs that

113
00:04:20,820 --> 00:04:22,820
came out and so on.

114
00:04:22,820 --> 00:04:24,820
So, thanks to Holger we broke

115
00:04:24,820 --> 00:04:26,820
this cycle.

116
00:04:26,820 --> 00:04:28,820
And we released

117
00:04:28,820 --> 00:04:30,820
in 2012

118
00:04:30,820 --> 00:04:32,820
it's interesting since it's

119
00:04:32,820 --> 00:04:34,820
ten years since

120
00:04:34,820 --> 00:04:36,820
it was born

121
00:04:36,820 --> 00:04:38,820
and someone said that

122
00:04:38,820 --> 00:04:40,820
all software gets good

123
00:04:40,820 --> 00:04:42,820
after ten years, maybe,

124
00:04:42,820 --> 00:04:44,820
and it's in

125
00:04:44,820 --> 00:04:46,820
Wheezy since September

126
00:04:46,820 --> 00:04:48,820
2012

127
00:04:48,820 --> 00:04:50,820
and

128
00:04:50,820 --> 00:04:52,820
it's in Stable

129
00:04:52,820 --> 00:04:54,820
since Wheezy got out.

130
00:04:54,820 --> 00:04:56,820
So, in

131
00:04:56,820 --> 00:04:58,820
2013,

132
00:04:58,820 --> 00:05:00,820
I released

133
00:05:00,820 --> 00:05:02,820
2.1

134
00:05:02,820 --> 00:05:04,820
it's an unstable

135
00:05:04,820 --> 00:05:06,820
branch, because

136
00:05:08,820 --> 00:05:10,820
I didn't want to have the

137
00:05:10,820 --> 00:05:12,820
same problems as with

138
00:05:12,820 --> 00:05:14,820
2.0, its

139
00:05:14,820 --> 00:05:16,820
lack of testers, so I just packaged something,

140
00:05:16,820 --> 00:05:18,820
I packaged the development branch

141
00:05:18,820 --> 00:05:20,820
and released it. It's unstable.

142
00:05:20,820 --> 00:05:22,820
Normally it works

143
00:05:22,820 --> 00:05:24,820
but well, it's unstable.

144
00:05:24,820 --> 00:05:26,820
You know what

145
00:05:26,820 --> 00:05:28,820
unstable means.

146
00:05:28,820 --> 00:05:30,820
And the biggest thing

147
00:05:30,820 --> 00:05:32,820
is that the internals will

148
00:05:32,820 --> 00:05:34,820
change in

149
00:05:34,820 --> 00:05:36,820
...in the

150
00:05:36,820 --> 00:05:38,820
2.1 lifeline

151
00:05:40,820 --> 00:05:42,820
And I said, October

152
00:05:42,820 --> 00:05:44,820
2013 is the target for

153
00:05:44,820 --> 00:05:46,820
release 2.2

154
00:05:46,820 --> 00:05:48,820
But time will tell.

155
00:05:50,820 --> 00:05:52,820
I mean, if you don't fix timelines and deadlines

156
00:05:52,820 --> 00:05:54,820
you will never release things.

157
00:05:54,820 --> 00:05:56,820
So, better be late than never.

158
00:05:58,820 --> 00:06:00,820
Ok, so

159
00:06:00,820 --> 00:06:02,820
a very simple

160
00:06:02,820 --> 00:06:04,820
design principle of Munin is

161
00:06:06,820 --> 00:06:08,820
I really love this quote from

162
00:06:08,820 --> 00:06:10,820
Alan Kay, it's

163
00:06:10,820 --> 00:06:12,820
"Simple things should be simple,

164
00:06:12,820 --> 00:06:14,820
complex things should be possible."

165
00:06:14,820 --> 00:06:16,820
That's exactly the motto of Munin.

166
00:06:16,820 --> 00:06:18,820
Munin makes

167
00:06:18,820 --> 00:06:20,620
simple things simple, and

168
00:06:20,620 --> 00:06:20,820
complex things

169
00:06:20,820 --> 00:06:22,820
complex things

170
00:06:22,820 --> 00:06:24,820
possible

171
00:06:26,820 --> 00:06:28,820
It's very easy to use, it has

172
00:06:28,820 --> 00:06:30,820
the same out-of-the-box

173
00:06:30,820 --> 00:06:32,820
behavior, because when you

174
00:06:32,820 --> 00:06:34,820
install it on a server, it

175
00:06:34,820 --> 00:06:36,820
automatically starts monitoring.

176
00:06:36,820 --> 00:06:38,820
And if not, it's a bug.

177
00:06:38,820 --> 00:06:40,820
Please report it.

178
00:06:40,820 --> 00:06:42,820
And, it

179
00:06:42,820 --> 00:06:44,820
has a complete plug-and-play infrastructure

180
00:06:44,820 --> 00:06:46,820
compared to

181
00:06:46,820 --> 00:06:48,820
others, you...

182
00:06:48,820 --> 00:06:50,820
The only thing you need

183
00:06:50,820 --> 00:06:52,820
is to declare

184
00:06:52,820 --> 00:06:54,820
the node, because, well,

185
00:06:54,820 --> 00:06:56,820
broadcasting on

186
00:06:56,820 --> 00:06:58,820
the local network

187
00:06:58,820 --> 00:07:00,820
is not very practical

188
00:07:00,820 --> 00:07:02,820
in my point of view, so it's the only thing

189
00:07:02,820 --> 00:07:04,820
you need to say.

190
00:07:04,820 --> 00:07:06,820
You have to poll this node, and

191
00:07:06,820 --> 00:07:08,820
the node will just

192
00:07:08,820 --> 00:07:10,820
hand all the config

193
00:07:10,820 --> 00:07:12,820
to the master

194
00:07:12,820 --> 00:07:14,820
so the graphs

195
00:07:14,820 --> 00:07:16,820
are drawn.

196
00:07:16,820 --> 00:07:18,820
Thing is,

197
00:07:18,820 --> 00:07:20,820
our user

198
00:07:20,820 --> 00:07:22,820
The vast majority of

199
00:07:22,820 --> 00:07:24,820
our users just have one server

200
00:07:24,820 --> 00:07:26,820
to monitor, and

201
00:07:26,820 --> 00:07:28,820
it's the same one that

202
00:07:28,820 --> 00:07:30,820
the Munin install is on.

203
00:07:30,820 --> 00:07:32,820
I mean, that's

204
00:07:32,820 --> 00:07:34,820
that's why

205
00:07:34,820 --> 00:07:36,820
the default Munin

206
00:07:36,820 --> 00:07:38,820
is always targeted at this user.

207
00:07:38,820 --> 00:07:40,820
But if you have a bigger install

208
00:07:40,820 --> 00:07:42,820
Well, you already know

209
00:07:42,820 --> 00:07:44,820
how to change

210
00:07:44,820 --> 00:07:46,820
config files, usually.

211
00:07:48,820 --> 00:07:50,820
And, as I said,

212
00:07:50,820 --> 00:07:52,820
some are running bigger

213
00:07:52,820 --> 00:07:54,820
installs, and

214
00:07:54,820 --> 00:07:56,820
these are

215
00:07:56,820 --> 00:07:58,820
the ones that interest me

216
00:07:58,820 --> 00:08:00,820
very much in 2.2, since

217
00:08:00,820 --> 00:08:02,820
well, we do

218
00:08:02,820 --> 00:08:04,820
address really well the

219
00:08:04,820 --> 00:08:06,820
one-node install type

220
00:08:06,820 --> 00:08:08,820
but

221
00:08:08,820 --> 00:08:10,820
for bigger installs

222
00:08:10,820 --> 00:08:12,820
we

223
00:08:12,820 --> 00:08:14,820
we have a lot of problems.

224
00:08:14,820 --> 00:08:16,820
We improved

225
00:08:16,820 --> 00:08:18,820
very much from

226
00:08:18,820 --> 00:08:20,820
1.4 to

227
00:08:20,820 --> 00:08:22,820
2.0, but now we

228
00:08:22,820 --> 00:08:24,820
hit other limits that we will discuss

229
00:08:24,820 --> 00:08:26,820
just after.

230
00:08:28,820 --> 00:08:30,820
Ok, so, new features:

231
00:08:32,820 --> 00:08:34,820
We really have now full

232
00:08:34,820 --> 00:08:36,820
CGI implementation!

233
00:08:36,820 --> 00:08:38,820
I mean, the one in

234
00:08:38,820 --> 00:08:40,820
1.5 you should not

235
00:08:40,820 --> 00:08:42,820
use it, I mean, it works,

236
00:08:42,820 --> 00:08:44,820
sometimes, and it's

237
00:08:44,820 --> 00:08:46,820
bugged every time.

238
00:08:46,820 --> 00:08:48,820
And, so

239
00:08:48,820 --> 00:08:50,820
it also has a full FastCGI

240
00:08:50,820 --> 00:08:52,820
implementation

241
00:08:52,820 --> 00:08:54,820
This is very important to

242
00:08:54,820 --> 00:08:56,820
have adequate performance

243
00:08:56,820 --> 00:08:58,820
so you don't reload everything

244
00:09:00,820 --> 00:09:02,820
The biggest selling point

245
00:09:02,820 --> 00:09:04,820
is, it has complete integration

246
00:09:04,820 --> 00:09:06,820
with RRDcacheD

247
00:09:06,820 --> 00:09:08,820
We will talk more about this

248
00:09:08,820 --> 00:09:10,820
later, but this

249
00:09:10,820 --> 00:09:12,820
is *the*

250
00:09:12,820 --> 00:09:14,820
main issue

251
00:09:14,820 --> 00:09:16,820
when scaling.

252
00:09:16,820 --> 00:09:18,820
Because our RRD is

253
00:09:18,820 --> 00:09:20,820
very nice, but doesn't scale

254
00:09:20,820 --> 00:09:22,820
very well natively,

255
00:09:22,820 --> 00:09:24,820
without RRDcacheD.

256
00:09:24,820 --> 00:09:26,820
Thing is, when you use

257
00:09:26,820 --> 00:09:28,820
RRDcacheD with some guidelines

258
00:09:28,820 --> 00:09:30,820
that I will describe

259
00:09:30,820 --> 00:09:32,820
later, and

260
00:09:32,820 --> 00:09:34,820
you should not do

261
00:09:34,820 --> 00:09:36,820
what you...

262
00:09:36,820 --> 00:09:38,820
It has native ssh transport

263
00:09:40,820 --> 00:09:42,820
Well, it's

264
00:09:42,820 --> 00:09:44,820
before you used

265
00:09:44,820 --> 00:09:46,820
plain TCP for

266
00:09:46,820 --> 00:09:48,820
the connection

267
00:09:48,820 --> 00:09:50,820
port 4949

268
00:09:50,820 --> 00:09:52,820
You could use TLS

269
00:09:52,820 --> 00:09:54,820
But most people didn't.

270
00:09:56,820 --> 00:09:58,820
And, with native ssh,

271
00:09:58,820 --> 00:10:00,820
usually people also

272
00:10:00,820 --> 00:10:02,820
already use ssh on their

273
00:10:02,820 --> 00:10:04,820
installs, so setting

274
00:10:06,820 --> 00:10:08,820
ssh transport for them is quite easy

275
00:10:08,820 --> 00:10:10,820
whereas having a TLS

276
00:10:10,820 --> 00:10:12,820
thing is, you have to have a

277
00:10:12,820 --> 00:10:14,820
certificate and so on and so on, it's...

278
00:10:14,820 --> 00:10:16,820
Much more complicated.

279
00:10:18,820 --> 00:10:20,820
And it's another open new port, as I said,

280
00:10:20,820 --> 00:10:22,820
and it's secure

281
00:10:22,820 --> 00:10:24,820
it's usually more integrated

282
00:10:24,820 --> 00:10:26,820
than

283
00:10:28,820 --> 00:10:30,820
in setups.

284
00:10:30,820 --> 00:10:32,820
The other big feature is

285
00:10:32,820 --> 00:10:34,820
async proxy. It's something

286
00:10:34,820 --> 00:10:36,820
that sits on

287
00:10:36,820 --> 00:10:38,820
the node,

288
00:10:38,820 --> 00:10:40,820
that polls the node autonomously,

289
00:10:40,820 --> 00:10:42,820
and stores

290
00:10:42,820 --> 00:10:44,820
locally on the node,

291
00:10:44,820 --> 00:10:46,820
the

292
00:10:48,820 --> 00:10:50,820
Munin update

293
00:10:50,820 --> 00:10:52,820
will then connect to the

294
00:10:52,820 --> 00:10:54,820
async client part

295
00:10:54,820 --> 00:10:56,820
and just replay the spool that it

296
00:10:56,820 --> 00:10:58,820
spooled just

297
00:10:58,820 --> 00:11:00,820
before. So it has very

298
00:11:00,820 --> 00:11:02,820
interesting

299
00:11:02,820 --> 00:11:04,820
features. If you have

300
00:11:04,820 --> 00:11:06,820
some nodes that have

301
00:11:06,820 --> 00:11:08,820
an unreliable connection, for example, you

302
00:11:08,820 --> 00:11:10,820
want to monitor a remote

303
00:11:10,820 --> 00:11:12,820
location that

304
00:11:12,820 --> 00:11:14,820
has sometimes

305
00:11:14,820 --> 00:11:16,820
no network or whatever

306
00:11:16,820 --> 00:11:18,820
since it's locally spooling

307
00:11:18,820 --> 00:11:20,820
when you connect you will

308
00:11:20,820 --> 00:11:22,820
recover everything

309
00:11:22,820 --> 00:11:24,820
that was

310
00:11:24,820 --> 00:11:26,820
collected while you weren't

311
00:11:26,820 --> 00:11:28,820
connected.

312
00:11:30,820 --> 00:11:32,820
So, those little white bars

313
00:11:32,820 --> 00:11:34,820
you were accustomed to are

314
00:11:34,820 --> 00:11:36,820
gone.

315
00:11:36,820 --> 00:11:38,820
It also speeds up polling.

316
00:11:38,820 --> 00:11:40,820
Even for a local network

317
00:11:40,820 --> 00:11:42,820
since

318
00:11:42,820 --> 00:11:44,820
it delegates

319
00:11:44,820 --> 00:11:46,820
all the polling and the waiting

320
00:11:46,820 --> 00:11:48,820
for plugins

321
00:11:48,820 --> 00:11:50,820
to the async proxy

322
00:11:50,820 --> 00:11:52,820
the data collection

323
00:11:52,820 --> 00:11:54,820
the Munin update

324
00:11:54,820 --> 00:11:56,820
goes *much* faster

325
00:11:56,820 --> 00:11:58,820
It only replays

326
00:11:58,820 --> 00:12:00,820
a logged text file

327
00:12:00,820 --> 00:12:02,820
So, the thing is that,

328
00:12:02,820 --> 00:12:04,820
when you have a big cluster

329
00:12:04,820 --> 00:12:06,820
it sometimes makes sense to

330
00:12:06,820 --> 00:12:08,820
use async, since

331
00:12:08,820 --> 00:12:10,820
well

332
00:12:12,820 --> 00:12:14,820
the fixed five minutes

333
00:12:14,820 --> 00:12:16,820
for Munin updates is

334
00:12:16,820 --> 00:12:18,820
still a hard one, and

335
00:12:18,820 --> 00:12:20,820
you cannot go further.

336
00:12:22,820 --> 00:12:24,820
And one

337
00:12:24,820 --> 00:12:26,820
last known thing about async proxy

338
00:12:26,820 --> 00:12:28,820
is, it can

339
00:12:28,820 --> 00:12:30,820
poll at various

340
00:12:30,820 --> 00:12:32,820
update rates.

341
00:12:32,820 --> 00:12:34,820
If you have one plugin

342
00:12:34,820 --> 00:12:36,820
that says, "I want to be polled

343
00:12:36,820 --> 00:12:38,820
every hour",

344
00:12:38,820 --> 00:12:40,820
async will only poll it

345
00:12:40,820 --> 00:12:42,820
once an hour,

346
00:12:42,820 --> 00:12:44,820
and the most interesting part,

347
00:12:44,820 --> 00:12:46,620
if you have a plugin that says

348
00:12:46,620 --> 00:12:46,820
"I want to be polled every ten seconds"

349
00:12:46,820 --> 00:12:49,120
"I want to be polled every ten seconds"

350
00:12:49,120 --> 00:12:51,120
it will poll it every ten seconds, and still

351
00:12:51,120 --> 00:12:53,120
send every

352
00:12:53,120 --> 00:12:55,120
five minutes all of the data back

353
00:12:55,120 --> 00:12:57,120
to the Munin update.

354
00:12:57,120 --> 00:12:59,120
So, you won't have real-time

355
00:12:59,120 --> 00:13:01,120
information

356
00:13:01,120 --> 00:13:03,120
but you have very precise

357
00:13:03,120 --> 00:13:05,120
information.

358
00:13:09,120 --> 00:13:11,120
So now, we go

359
00:13:11,120 --> 00:13:13,120
to scalability

360
00:13:13,120 --> 00:13:15,120
That's the

361
00:13:15,120 --> 00:13:17,120
biggest focus

362
00:13:17,120 --> 00:13:19,120
on 2.0

363
00:13:19,120 --> 00:13:21,120
because the first one was to

364
00:13:21,120 --> 00:13:23,120
the zooming part

365
00:13:23,120 --> 00:13:25,120
and zooming just

366
00:13:25,120 --> 00:13:27,120
showed

367
00:13:27,120 --> 00:13:29,120
that, well,

368
00:13:29,120 --> 00:13:31,120
you can have huge data files

369
00:13:31,120 --> 00:13:33,120
since

370
00:13:33,120 --> 00:13:35,120
it doesn't... It's not very useful

371
00:13:35,120 --> 00:13:37,120
to zoom on one year

372
00:13:37,120 --> 00:13:39,120
history, if you

373
00:13:39,120 --> 00:13:41,120
don't keep the

374
00:13:41,120 --> 00:13:43,120
finer granularity in RRD

375
00:13:43,120 --> 00:13:45,120
one year back

376
00:13:45,120 --> 00:13:47,120
So

377
00:13:47,120 --> 00:13:49,120
that will be scaling

378
00:13:49,120 --> 00:13:51,120
data at the end. Really, what

379
00:13:51,120 --> 00:13:53,120
people want is adding more nodes

380
00:13:53,120 --> 00:13:55,120
That's the most common

381
00:13:55,120 --> 00:13:57,120
scaling issue

382
00:13:57,120 --> 00:13:59,120
that we get.

383
00:13:59,120 --> 00:14:01,120
Inside the node, you can

384
00:14:01,120 --> 00:14:03,120
also have a

385
00:14:03,120 --> 00:14:05,120
huge number of plugins

386
00:14:05,120 --> 00:14:07,120
some have very very large installations,

387
00:14:07,120 --> 00:14:09,120
especially when you start to

388
00:14:09,120 --> 00:14:11,120
use SNMP

389
00:14:11,120 --> 00:14:13,120
because SNMP is done

390
00:14:13,120 --> 00:14:15,120
by one host

391
00:14:15,120 --> 00:14:17,120
to monitor many many

392
00:14:17,120 --> 00:14:19,120
remote

393
00:14:19,120 --> 00:14:21,120
routers, or

394
00:14:21,120 --> 00:14:23,120
SNMP agents

395
00:14:23,120 --> 00:14:25,120
and the thing is, some

396
00:14:25,120 --> 00:14:27,120
also have slow plugins.

397
00:14:29,120 --> 00:14:31,120
We already

398
00:14:31,120 --> 00:14:33,120
discussed that

399
00:14:33,120 --> 00:14:35,120
Munin update should take less than

400
00:14:35,120 --> 00:14:37,120
five minutes, otherwise, well,

401
00:14:37,120 --> 00:14:39,120
bad things happen

402
00:14:39,120 --> 00:14:41,120
that's one point

403
00:14:41,120 --> 00:14:43,120
and that's a hard rule

404
00:14:43,120 --> 00:14:45,120
If your Munin update

405
00:14:45,120 --> 00:14:47,120
takes more than five minutes

406
00:14:47,120 --> 00:14:49,120
really bad things happen

407
00:14:49,120 --> 00:14:51,120
mostly white bars.

408
00:14:51,120 --> 00:14:53,120
And so, if you have

409
00:14:53,120 --> 00:14:55,120
many many plugins, and

410
00:14:55,120 --> 00:14:57,120
many many plugins take

411
00:14:57,120 --> 00:14:59,120
quite a long time to

412
00:14:59,120 --> 00:15:01,120
poll, since it's all synchronous,

413
00:15:01,120 --> 00:15:03,120
the fact is,

414
00:15:03,120 --> 00:15:05,120
well,

415
00:15:05,120 --> 00:15:07,120
it can go, even if you

416
00:15:07,120 --> 00:15:09,120
parallelize

417
00:15:09,120 --> 00:15:11,120
very much, it sometimes

418
00:15:11,120 --> 00:15:13,120
still goes

419
00:15:13,120 --> 00:15:15,120
quite slowly, and

420
00:15:15,120 --> 00:15:17,120
if you

421
00:15:17,120 --> 00:15:19,120
multiply the number of plugins

422
00:15:19,120 --> 00:15:21,120
with long response times,

423
00:15:21,120 --> 00:15:23,120
many nodes,

424
00:15:23,120 --> 00:15:25,120
you have, you usually

425
00:15:25,120 --> 00:15:27,120
pass the five-minute bar.

426
00:15:27,120 --> 00:15:29,120
and it's scaling

427
00:15:29,120 --> 00:15:31,120
data usually what's

428
00:15:31,120 --> 00:15:33,120
with the zooming part

429
00:15:33,120 --> 00:15:35,120
usually many many people

430
00:15:35,120 --> 00:15:37,120
ask for, "well,

431
00:15:37,120 --> 00:15:39,120
I can zoom one year ago

432
00:15:39,120 --> 00:15:41,120
but now all I have

433
00:15:41,120 --> 00:15:43,120
is one bar per day", I mean,

434
00:15:43,120 --> 00:15:45,120
I don't care about the average

435
00:15:45,120 --> 00:15:47,120
for one day

436
00:15:47,120 --> 00:15:49,120
So here is

437
00:15:49,120 --> 00:15:51,120
you can natively have

438
00:15:51,120 --> 00:15:53,120
much

439
00:15:53,120 --> 00:15:55,120
more data inside

440
00:15:55,120 --> 00:15:57,120
we will see more about it later.

441
00:15:57,120 --> 00:15:59,120
So, scaling the master

442
00:15:59,120 --> 00:16:01,120
To have

443
00:16:01,120 --> 00:16:03,120
a big install

444
00:16:03,120 --> 00:16:05,120
the first thing is

445
00:16:05,120 --> 00:16:07,120
use FastCGI.

446
00:16:07,120 --> 00:16:09,120
Default is cron-based, remember

447
00:16:09,120 --> 00:16:11,120
default is for the typical

448
00:16:11,120 --> 00:16:13,120
user that has only one node and

449
00:16:13,120 --> 00:16:15,120
one server,

450
00:16:15,120 --> 00:16:17,120
any one that has

451
00:16:17,120 --> 00:16:19,120
more than, let's say, five

452
00:16:19,120 --> 00:16:21,120
nodes should really go

453
00:16:21,120 --> 00:16:23,120
the CGI road

454
00:16:23,120 --> 00:16:25,120
and not really CGI, but FastCGI

455
00:16:25,120 --> 00:16:27,120
Because

456
00:16:29,120 --> 00:16:31,120
The cron road

457
00:16:31,120 --> 00:16:33,120
is, you generate every

458
00:16:33,120 --> 00:16:35,120
kind of graph, and it's

459
00:16:35,120 --> 00:16:37,120
just pointless. I mean,

460
00:16:37,120 --> 00:16:39,120
it's very simple, but it's pointless.

461
00:16:41,120 --> 00:16:43,120
As I said,

462
00:16:43,120 --> 00:16:45,120
You have to use

463
00:16:45,120 --> 00:16:47,120
RRDcacheD, because, the thing is

464
00:16:47,120 --> 00:16:49,120
RRD

465
00:16:49,120 --> 00:16:51,120
is very, very nice, it's a

466
00:16:51,120 --> 00:16:53,120
very nice piece of software, but

467
00:16:53,120 --> 00:16:55,120
it has only one main problem

468
00:16:55,120 --> 00:16:57,120
it is, it's so efficient

469
00:16:57,120 --> 00:16:59,120
that it

470
00:16:59,120 --> 00:17:01,120
writes only

471
00:17:01,120 --> 00:17:03,120
the very little

472
00:17:03,120 --> 00:17:05,119
part of the file

473
00:17:05,119 --> 00:17:07,119
and to the underlying

474
00:17:07,119 --> 00:17:09,119
I/O subsystem

475
00:17:11,119 --> 00:17:13,119
RRD updates, when you have a big one,

476
00:17:13,119 --> 00:17:15,119
it feels just like

477
00:17:15,119 --> 00:17:17,119
random I/O. And when I say

478
00:17:17,119 --> 00:17:19,119
random I/O, it's real random I/O.

479
00:17:19,119 --> 00:17:21,119
I mean,

480
00:17:21,119 --> 00:17:23,119
almost cryptographically

481
00:17:23,119 --> 00:17:25,119
secure.

482
00:17:25,119 --> 00:17:27,119
When I ask about

483
00:17:27,119 --> 00:17:29,120
some

484
00:17:29,120 --> 00:17:31,120
storage vendor,

485
00:17:31,120 --> 00:17:33,120
he says, "random I/O, we can do that!"

486
00:17:33,120 --> 00:17:35,120
I plugged Munin

487
00:17:35,120 --> 00:17:37,120
with a big install on it

488
00:17:37,120 --> 00:17:39,120
and he said, "what's that?"

489
00:17:39,120 --> 00:17:41,120
Yeah, random I/O

490
00:17:41,120 --> 00:17:43,120
"Well, not that random usually!"

491
00:17:45,120 --> 00:17:47,120
The people of

492
00:17:47,120 --> 00:17:49,120
RRD are well aware of it

493
00:17:49,120 --> 00:17:51,120
and even designed

494
00:17:51,120 --> 00:17:53,120
RRDcacheD, that is specially

495
00:17:53,120 --> 00:17:55,120
designed to make this

496
00:17:55,120 --> 00:17:57,120
random I/O buffered

497
00:17:57,120 --> 00:17:59,120
and to make it like normal

498
00:17:59,120 --> 00:18:01,120
random I/O

499
00:18:01,120 --> 00:18:03,120
And it's called,

500
00:18:03,120 --> 00:18:05,120
there is a slide

501
00:18:05,120 --> 00:18:07,120
you can Google it,

502
00:18:07,120 --> 00:18:09,120
"RRDcacheD:

503
00:18:09,120 --> 00:18:11,120
to escape the I/O hell"

504
00:18:11,120 --> 00:18:13,120
it's really well described, and

505
00:18:13,120 --> 00:18:15,120
to understand

506
00:18:15,120 --> 00:18:17,120
what's behind

507
00:18:17,120 --> 00:18:19,120
RRDcacheD

508
00:18:19,120 --> 00:18:21,120
And it even works

509
00:18:21,120 --> 00:18:23,120
on SSD, because usually

510
00:18:23,120 --> 00:18:25,120
random I/O

511
00:18:25,120 --> 00:18:27,120
"OK, no problem, just use SSD"

512
00:18:27,120 --> 00:18:29,120
fast storage vendors

513
00:18:29,120 --> 00:18:31,120
said, "well, no problem, we just put an SSD"

514
00:18:31,120 --> 00:18:33,120
The thing is,

515
00:18:33,120 --> 00:18:35,120
after

516
00:18:35,120 --> 00:18:37,120
in my test, after four hours

517
00:18:37,120 --> 00:18:39,120
with, yes, big installs,

518
00:18:39,120 --> 00:18:41,120
all of the SSDs

519
00:18:41,120 --> 00:18:43,120
were just

520
00:18:43,120 --> 00:18:45,120
offline, because

521
00:18:45,120 --> 00:18:47,120
too many I/Os

522
00:18:47,120 --> 00:18:49,120
Because it writes, writes, writes, writes a lot.

523
00:18:53,120 --> 00:18:55,120
So, SSD is

524
00:18:55,120 --> 00:18:57,120
interesting, but not

525
00:18:57,120 --> 00:18:59,120
only for us.

526
00:18:59,120 --> 00:19:01,120
Thing is,

527
00:19:01,120 --> 00:19:03,120
RRDcacheD

528
00:19:03,120 --> 00:19:05,120
has only one

529
00:19:05,120 --> 00:19:07,120
very big drawback, which is, you should never

530
00:19:07,120 --> 00:19:09,120
ever read

531
00:19:09,120 --> 00:19:11,120
from the RRD file

532
00:19:11,120 --> 00:19:13,120
especially in cron, because

533
00:19:13,120 --> 00:19:15,120
if you read on demand, it's perfect

534
00:19:15,120 --> 00:19:17,120
it only flushes

535
00:19:17,120 --> 00:19:19,120
the file you are reading

536
00:19:19,120 --> 00:19:21,120
but if you read it

537
00:19:21,120 --> 00:19:23,120
in cron, by default,

538
00:19:23,120 --> 00:19:25,120
you will read

539
00:19:25,120 --> 00:19:27,120
the whoooole install

540
00:19:27,120 --> 00:19:29,120
and that's exactly the same as

541
00:19:29,120 --> 00:19:31,120
not using RRDcacheD, so...

542
00:19:31,120 --> 00:19:33,120
So use it!

543
00:19:33,120 --> 00:19:35,120
Thing is, for Munin,

544
00:19:35,120 --> 00:19:37,120
you need lots of RAM.

545
00:19:37,120 --> 00:19:39,120
Because, as I said,

546
00:19:39,120 --> 00:19:41,120
we have RRDcacheD

547
00:19:43,120 --> 00:19:45,120
but the more RAM

548
00:19:45,120 --> 00:19:47,120
you give to RRDcacheD,

549
00:19:47,120 --> 00:19:49,120
the longer you can keep

550
00:19:49,120 --> 00:19:51,120
the spool

551
00:19:51,120 --> 00:19:53,120
and so, the thing is,

552
00:19:53,120 --> 00:19:55,120
it can

553
00:19:55,120 --> 00:19:57,120
it writes much less often

554
00:19:57,120 --> 00:19:59,120
and that's

555
00:19:59,120 --> 00:20:01,120
very interesting.

556
00:20:01,120 --> 00:20:03,120
If you have lots of RAM, you can

557
00:20:03,120 --> 00:20:05,120
multiply the number of workers.

558
00:20:05,120 --> 00:20:07,120
It means, you

559
00:20:07,120 --> 00:20:09,120
obviously, if you have...

560
00:20:09,120 --> 00:20:11,120
Since Munin is very very

561
00:20:11,120 --> 00:20:13,120
very much I/O bound

562
00:20:13,120 --> 00:20:15,120
either waiting for the NAS

563
00:20:15,120 --> 00:20:17,120
or waiting for the I/O subsystem

564
00:20:17,120 --> 00:20:19,120
If you have

565
00:20:19,120 --> 00:20:21,120
many workers, usually it

566
00:20:21,120 --> 00:20:23,120
it helps

567
00:20:23,120 --> 00:20:25,120
a lot, because every worker is

568
00:20:25,120 --> 00:20:27,120
single-threaded.

569
00:20:29,120 --> 00:20:31,120
Thing is,

570
00:20:31,120 --> 00:20:33,120
but never,

571
00:20:33,120 --> 00:20:35,120
ever swap.

572
00:20:35,120 --> 00:20:37,120
That's obvious. The thing is,

573
00:20:39,120 --> 00:20:41,120
Munin is designed to

574
00:20:41,120 --> 00:20:43,120
use all the memory

575
00:20:43,120 --> 00:20:45,120
of its workers. So if

576
00:20:45,120 --> 00:20:47,120
you only swap a little thing,

577
00:20:47,120 --> 00:20:49,120
then there is no, how to say,

578
00:20:49,120 --> 00:20:51,120
there is no lost memory.

579
00:20:51,120 --> 00:20:53,120
You cannot swap...

580
00:20:53,120 --> 00:20:55,120
For people that know the "swappiness" setting

581
00:20:55,120 --> 00:20:57,120
it means swapping

582
00:20:57,120 --> 00:20:59,120
before trading

583
00:20:59,120 --> 00:21:01,120
some application memory

584
00:21:01,120 --> 00:21:03,120
to file

585
00:21:03,120 --> 00:21:05,120
cache memory

586
00:21:05,120 --> 00:21:07,120
that's not a good idea

587
00:21:07,120 --> 00:21:09,120
The application memory

588
00:21:09,120 --> 00:21:11,120
is useful

589
00:21:11,120 --> 00:21:13,120
at one point.

590
00:21:15,120 --> 00:21:17,120
[... oops... That's OK ...]

591
00:21:21,120 --> 00:21:23,120
On the master, you have really

592
00:21:23,120 --> 00:21:25,120
to watch out for

593
00:21:25,120 --> 00:21:27,120
shared hardware.

594
00:21:27,120 --> 00:21:29,120
Because Munin is

595
00:21:29,120 --> 00:21:31,120
very nice

596
00:21:31,120 --> 00:21:33,120
and it loves

597
00:21:33,120 --> 00:21:35,120
to annihilate any hardware

598
00:21:35,120 --> 00:21:37,120
you put on it, because

599
00:21:37,120 --> 00:21:39,120
...well...

600
00:21:39,120 --> 00:21:41,120
It has

601
00:21:41,120 --> 00:21:43,120
It is designed to be

602
00:21:43,120 --> 00:21:45,120
very very scalable, and you can

603
00:21:45,120 --> 00:21:47,120
launch as many processes as you

604
00:21:47,120 --> 00:21:49,120
want, we will see

605
00:21:49,120 --> 00:21:51,120
some kinds,

606
00:21:51,120 --> 00:21:53,120
some limitations just after, but

607
00:21:53,120 --> 00:21:55,120
it's designed to be very scalable

608
00:21:55,120 --> 00:21:57,120
but the [?] arrives

609
00:21:57,120 --> 00:21:59,120
not in a very efficient manner

610
00:21:59,120 --> 00:22:01,120
I mean, it's not very clever

611
00:22:01,120 --> 00:22:03,120
it just uses and goes

612
00:22:03,120 --> 00:22:05,120
on your system.

613
00:22:05,120 --> 00:22:07,120
So, for the record,

614
00:22:07,120 --> 00:22:09,120
I have the

615
00:22:09,120 --> 00:22:11,120
storage vendor

616
00:22:11,120 --> 00:22:13,120
but was

617
00:22:13,120 --> 00:22:15,120
[?] with all

618
00:22:15,120 --> 00:22:17,120
of the

619
00:22:17,120 --> 00:22:19,120
application of the

620
00:22:19,120 --> 00:22:21,120
of the thing, and when we

621
00:22:21,120 --> 00:22:23,120
wrote to it, I mean

622
00:22:23,120 --> 00:22:25,120
99%

623
00:22:25,120 --> 00:22:27,120
of the I/O ops were

624
00:22:27,120 --> 00:22:29,120
delivered to the Munin server, so

625
00:22:29,120 --> 00:22:31,120
let's imagine what

626
00:22:31,120 --> 00:22:33,120
is left for the others.

627
00:22:33,120 --> 00:22:35,120
Not much.

628
00:22:37,120 --> 00:22:39,120
So, we put it on dedicated hardware,

629
00:22:39,120 --> 00:22:41,120
it goes slower, but

630
00:22:41,120 --> 00:22:43,120
well, other applications are happier.

631
00:22:47,120 --> 00:22:49,120
The thing I said before,

632
00:22:49,120 --> 00:22:51,120
is, use the async proxy.

633
00:22:51,120 --> 00:22:53,120
Even if you don't have a special

634
00:22:53,120 --> 00:22:55,120
need for it,

635
00:22:55,120 --> 00:22:57,120
thing is, it will

636
00:22:59,120 --> 00:23:01,120
enable a very fast collection

637
00:23:01,120 --> 00:23:03,120
As all the

638
00:23:03,120 --> 00:23:05,120
I/O time or the wait

639
00:23:05,120 --> 00:23:07,120
time is absorbed directly by

640
00:23:07,120 --> 00:23:09,120
the async daemon,

641
00:23:09,120 --> 00:23:11,120
your Munin

642
00:23:11,120 --> 00:23:13,120
update almost

643
00:23:13,120 --> 00:23:15,120
doesn't wait at all.

644
00:23:15,120 --> 00:23:17,120
It only connects, reads a file

645
00:23:17,120 --> 00:23:19,120
on the server

646
00:23:19,120 --> 00:23:21,120
and disconnects. So,

647
00:23:21,120 --> 00:23:23,120
while a typical

648
00:23:23,120 --> 00:23:25,120
polling time is about

649
00:23:25,120 --> 00:23:27,120
ten to

650
00:23:27,120 --> 00:23:29,120
fifteen seconds

651
00:23:29,120 --> 00:23:31,120
with Munin async, typical

652
00:23:31,120 --> 00:23:33,120
time is about one second

653
00:23:33,120 --> 00:23:35,120
to at most two seconds.

654
00:23:35,120 --> 00:23:37,120
Depends on

655
00:23:37,120 --> 00:23:39,120
You have a

656
00:23:39,120 --> 00:23:41,120
factor of 10, that's

657
00:23:41,120 --> 00:23:43,120
very interesting when you want to scale.

658
00:23:47,120 --> 00:23:49,120
Because it lowers the

659
00:23:49,120 --> 00:23:51,120
number of update workers needed,

660
00:23:51,120 --> 00:23:53,120
I said, Munin uses

661
00:23:53,120 --> 00:23:55,120
lots of RAM, OK, but

662
00:23:55,120 --> 00:23:57,120
usually you don't want to use RAM

663
00:23:57,120 --> 00:23:59,120
for Munin update, you

664
00:23:59,120 --> 00:24:01,120
prefer to use RAM for

665
00:24:01,120 --> 00:24:03,120
the rendering part,

666
00:24:03,120 --> 00:24:05,120
for graphs, for

667
00:24:05,120 --> 00:24:07,120
the XML, which we will speak

668
00:24:07,120 --> 00:24:09,120
about later, and

669
00:24:09,120 --> 00:24:11,120
Munin update you just want

670
00:24:11,120 --> 00:24:13,120
it to be very quick and

671
00:24:15,120 --> 00:24:17,120
if you don't have...

672
00:24:17,120 --> 00:24:19,120
if it is

673
00:24:19,120 --> 00:24:21,120
not I/O-bound from the network anymore,

674
00:24:21,120 --> 00:24:23,120
it's only CPU-bound, and you

675
00:24:23,120 --> 00:24:25,120
don't want to have more workers than the

676
00:24:25,120 --> 00:24:27,120
CPUs on your hardware

677
00:24:27,120 --> 00:24:29,120
since more than that is useless anyway.

678
00:24:31,120 --> 00:24:33,120
And the thing that is

679
00:24:33,120 --> 00:24:35,120
a side effect, but a very nice one,

680
00:24:35,120 --> 00:24:37,120
is that, *if* your

681
00:24:37,120 --> 00:24:39,120
Munin update is very slow,

682
00:24:39,120 --> 00:24:41,120
it happens,

683
00:24:41,120 --> 00:24:43,120
and we spoke about the

684
00:24:43,120 --> 00:24:45,120
five-minute hard limit,

685
00:24:45,120 --> 00:24:47,120
all the async-enabled

686
00:24:47,120 --> 00:24:49,120
nodes will

687
00:24:49,120 --> 00:24:51,120
not have any data loss.

688
00:24:51,120 --> 00:24:53,120
You will have delays in integrating

689
00:24:53,120 --> 00:24:55,120
the

690
00:24:55,120 --> 00:24:57,120
the data, but you won't have

691
00:24:57,120 --> 00:24:59,120
these infamous

692
00:24:59,120 --> 00:25:01,120
white bars that most of you

693
00:25:01,120 --> 00:25:03,120
already experienced,

694
00:25:03,120 --> 00:25:05,120
like these ones.

695
00:25:07,120 --> 00:25:09,120
That was for

696
00:25:09,120 --> 00:25:11,120
the server. For the node,

697
00:25:11,120 --> 00:25:13,120
as I said, you

698
00:25:13,120 --> 00:25:15,120
have, some have

699
00:25:15,120 --> 00:25:17,120
a huge number of plugins.

700
00:25:17,120 --> 00:25:19,120
The biggest install I

701
00:25:19,120 --> 00:25:21,120
saw is about

702
00:25:21,120 --> 00:25:23,120
one thousand plugins.

703
00:25:23,120 --> 00:25:25,120
Wow.

704
00:25:25,120 --> 00:25:27,120
It's

705
00:25:27,120 --> 00:25:29,120
very interesting also

706
00:25:29,120 --> 00:25:31,120
as async, because

707
00:25:31,120 --> 00:25:33,120
it has the fork option, and async

708
00:25:33,120 --> 00:25:35,120
knows it will just

709
00:25:35,120 --> 00:25:37,120
prior to

710
00:25:37,120 --> 00:25:39,120
async

711
00:25:39,120 --> 00:25:41,120
Munin update was

712
00:25:41,120 --> 00:25:43,120
doing it very sequentially

713
00:25:43,120 --> 00:25:45,120
and 1000 plugins

714
00:25:45,120 --> 00:25:47,120
well, to have it

715
00:25:47,120 --> 00:25:49,120
in less than

716
00:25:49,120 --> 00:25:51,120
five minutes, it has to be quite fast.

717
00:25:51,120 --> 00:25:53,120
Since it's not the only node

718
00:25:53,120 --> 00:25:55,120
that is polled,

719
00:25:55,120 --> 00:25:57,120
in async, with

720
00:25:57,120 --> 00:25:59,120
the fork option,

721
00:25:59,120 --> 00:26:01,120
each plugin will be

722
00:26:01,120 --> 00:26:03,120
asked in its

723
00:26:03,120 --> 00:26:05,120
own process, so

724
00:26:05,120 --> 00:26:07,120
if you have

725
00:26:07,120 --> 00:26:09,120
long running plugins,

726
00:26:11,120 --> 00:26:13,120
as just after,

727
00:26:13,120 --> 00:26:15,120
you also can use

728
00:26:15,120 --> 00:26:17,120
the fork option,

729
00:26:17,120 --> 00:26:19,120
before the plugin can

730
00:26:19,120 --> 00:26:21,120
usually they poll themselves

731
00:26:21,120 --> 00:26:23,120
either in cron or they just read the status back

732
00:26:25,120 --> 00:26:27,120
that was the official way of doing it

733
00:26:27,120 --> 00:26:29,120
in 1.4

734
00:26:29,120 --> 00:26:31,120
but since async does exactly

735
00:26:31,120 --> 00:26:33,120
that, in 2.0

736
00:26:33,120 --> 00:26:35,120
just use async, I mean, it's

737
00:26:35,120 --> 00:26:37,120
standard, and

738
00:26:37,120 --> 00:26:39,120
it just makes using your

739
00:26:39,120 --> 00:26:41,120
whatever you

740
00:26:41,120 --> 00:26:43,120
use.

741
00:26:43,120 --> 00:26:45,120
That's for the node.

742
00:26:45,120 --> 00:26:47,120
Usually, the

743
00:26:47,120 --> 00:26:49,120
only problem that

744
00:26:49,120 --> 00:26:51,120
the node has when you have many many plugins

745
00:26:51,120 --> 00:26:53,120
is that the starting of the node

746
00:26:53,120 --> 00:26:55,120
is typically serialized.

747
00:26:55,120 --> 00:26:57,120
That's

748
00:26:57,120 --> 00:26:59,120
When you have 1000 plugins, it's

749
00:26:59,120 --> 00:27:01,120
a big problem.

750
00:27:01,120 --> 00:27:03,120
Ok

751
00:27:03,120 --> 00:27:05,120
Now, we are scaling the data.

752
00:27:05,120 --> 00:27:07,120
As I said before,

753
00:27:09,120 --> 00:27:11,120
zooming

754
00:27:11,120 --> 00:27:13,120
brought the need of having precise

755
00:27:13,120 --> 00:27:15,120
data very

756
00:27:15,120 --> 00:27:17,120
far away in time

757
00:27:17,120 --> 00:27:19,120
and

758
00:27:19,120 --> 00:27:21,120
to keep more data in RRD

759
00:27:21,120 --> 00:27:23,120
it's very very easy

760
00:27:23,120 --> 00:27:25,120
in 2.0

761
00:27:25,120 --> 00:27:27,120
you have a new option

762
00:27:27,120 --> 00:27:29,120
it's graph_data_size

763
00:27:29,120 --> 00:27:31,120
you already

764
00:27:31,120 --> 00:27:33,120
had it in

765
00:27:33,120 --> 00:27:35,120
1.4

766
00:27:35,120 --> 00:27:37,120
but it was global

767
00:27:37,120 --> 00:27:39,120
now it is per plugin

768
00:27:39,120 --> 00:27:41,120
it's also global, but you can

769
00:27:41,120 --> 00:27:43,120
specify it per plugin

770
00:27:43,120 --> 00:27:45,120
and

771
00:27:47,120 --> 00:27:49,120
actually, it's designed to be

772
00:27:49,120 --> 00:27:51,120
per field, but it

773
00:27:51,120 --> 00:27:53,120
doesn't work, it...

774
00:27:53,120 --> 00:27:55,120
it's bugged, and

775
00:27:55,120 --> 00:27:57,120
mostly works per plugin, that's where

776
00:27:57,120 --> 00:27:59,120
it works well, but it

777
00:27:59,120 --> 00:28:01,120
only works on RRD create, so

778
00:28:01,120 --> 00:28:03,120
there is an external tool to

779
00:28:03,120 --> 00:28:05,120
move it

780
00:28:09,120 --> 00:28:11,120
a tool that is called

781
00:28:11,120 --> 00:28:13,120
RRD copy, to move from

782
00:28:13,120 --> 00:28:15,120
some data, from

783
00:28:15,120 --> 00:28:17,120
a small RRD, to a bigger

784
00:28:17,120 --> 00:28:19,120
RRD.

785
00:28:19,120 --> 00:28:21,120
But that's not part of

786
00:28:21,120 --> 00:28:23,120
core Munin.

787
00:28:23,120 --> 00:28:25,120
And when you create it

788
00:28:29,120 --> 00:28:31,120
Its graphing is handled automatically by RRD

789
00:28:39,120 --> 00:28:41,120
And RRD, as I said, is very very very

790
00:28:41,120 --> 00:28:43,120
efficient, but

791
00:28:43,120 --> 00:28:45,120
beware. It can use a lot

792
00:28:45,120 --> 00:28:47,120
of space. I mean, I had

793
00:28:47,120 --> 00:28:49,120
one person

794
00:28:49,120 --> 00:28:51,120
who wanted to have

795
00:28:51,120 --> 00:28:53,120
ten-second precision for two years

796
00:28:53,120 --> 00:28:55,120
...Wow.

797
00:28:55,120 --> 00:28:57,120
It's about 500 megabytes

798
00:28:57,120 --> 00:28:59,120
per RRD

799
00:28:59,120 --> 00:29:01,120
so, per line

800
00:29:01,120 --> 00:29:03,120
in Munin

801
00:29:03,120 --> 00:29:05,120
Big data.

802
00:29:07,120 --> 00:29:09,120
You can also increase RRD precision

803
00:29:11,120 --> 00:29:13,120
it's called supersampling,

804
00:29:13,120 --> 00:29:15,120
that works

805
00:29:15,120 --> 00:29:17,120
without munin-async

806
00:29:17,120 --> 00:29:19,120
if you put

807
00:29:19,120 --> 00:29:21,120
munin-async, it will do

808
00:29:21,120 --> 00:29:23,120
the job for you.

809
00:29:23,120 --> 00:29:25,120
I will just go a little faster

810
00:29:25,120 --> 00:29:27,120
since my time is almost up.

811
00:29:35,120 --> 00:29:37,120
Bigger thing is

812
00:29:37,120 --> 00:29:39,120
If you modify the

813
00:29:39,120 --> 00:29:41,120
RRD size

814
00:29:41,120 --> 00:29:43,120
always have

815
00:29:43,120 --> 00:29:45,120
the RRA increased

816
00:29:45,120 --> 00:29:47,120
because when you want

817
00:29:47,120 --> 00:29:49,120
to have the graphs

818
00:29:49,120 --> 00:29:51,120
if you

819
00:29:51,120 --> 00:29:53,120
take huge, for example

820
00:29:53,120 --> 00:29:55,120
there is a setting that is "huge",

821
00:29:55,120 --> 00:29:57,120
it's not very...

822
00:29:57,120 --> 00:29:59,120
These settings aren't very nice, because

823
00:29:59,120 --> 00:30:01,120
it only has the

824
00:30:01,120 --> 00:30:03,120
maximum precision for

825
00:30:03,120 --> 00:30:05,120
two years, but

826
00:30:05,120 --> 00:30:07,120
it doesn't have any RRA, and the RRA

827
00:30:07,120 --> 00:30:09,120
are a part of Munin's

828
00:30:09,120 --> 00:30:11,120
ability to reply

829
00:30:11,120 --> 00:30:13,120
very fast on a yearly graph

830
00:30:13,120 --> 00:30:15,120
for example. So if you

831
00:30:15,120 --> 00:30:17,120
it's pre-consolidation

832
00:30:17,120 --> 00:30:19,120
for yearly values.

833
00:30:19,120 --> 00:30:21,120
So, the ideal way

834
00:30:21,120 --> 00:30:23,120
you know the size of the graph,

835
00:30:23,120 --> 00:30:25,120
in your templates,

836
00:30:25,120 --> 00:30:27,120
and if you have one RRA

837
00:30:27,120 --> 00:30:29,120
per pixel in the

838
00:30:29,120 --> 00:30:31,120
graph outputted, it goes

839
00:30:31,120 --> 00:30:33,120
fastest, since it doesn't

840
00:30:33,120 --> 00:30:35,120
even have to interpolate the data.

841
00:30:39,120 --> 00:30:41,120
So... Now,

842
00:30:41,120 --> 00:30:43,120
the limitations of 2.0

843
00:30:45,120 --> 00:30:47,120
The CGI of HTML

844
00:30:47,120 --> 00:30:49,120
is very, very

845
00:30:49,120 --> 00:30:51,120
very ugly.

846
00:30:51,120 --> 00:30:53,120
I don't know if many of you

847
00:30:53,120 --> 00:30:55,120
tried with big installs

848
00:30:55,120 --> 00:30:57,120
but the practical limit is about

849
00:30:59,120 --> 00:31:01,120
between 150

850
00:31:01,120 --> 00:31:03,120
and 200 nodes.

851
00:31:03,120 --> 00:31:05,120
After that, it's

852
00:31:05,120 --> 00:31:07,120
very, very slow.

853
00:31:07,120 --> 00:31:09,120
And it's slow

854
00:31:09,120 --> 00:31:11,120
on reload, because

855
00:31:11,120 --> 00:31:13,120
the whole configuration

856
00:31:13,120 --> 00:31:15,120
is stored in a big

857
00:31:15,120 --> 00:31:17,120
Storable file

858
00:31:17,120 --> 00:31:19,120
that is

859
00:31:21,120 --> 00:31:23,120
that is reloaded

860
00:31:23,120 --> 00:31:25,120
and most of the time is

861
00:31:25,120 --> 00:31:27,120
taken by

862
00:31:27,120 --> 00:31:29,120
storable.reload

863
00:31:29,120 --> 00:31:31,120
so I cannot do much about it.

864
00:31:31,120 --> 00:31:33,120
We'll see how

865
00:31:33,120 --> 00:31:35,120
I plan to do it.

866
00:31:37,120 --> 00:31:39,120
The UI itself does not

867
00:31:39,120 --> 00:31:41,120
... It is not very scalable.

868
00:31:41,120 --> 00:31:43,120
I mean,

869
00:31:43,120 --> 00:31:45,120
you all know the default

870
00:31:45,120 --> 00:31:47,120
UI, so now

871
00:31:47,120 --> 00:31:49,120
you have your cluster, just imagine

872
00:31:49,120 --> 00:31:51,120
one thousand nodes inside

873
00:31:51,120 --> 00:31:53,120
it's, well,

874
00:31:53,120 --> 00:31:55,120
it's a little bit flat, and

875
00:31:55,120 --> 00:31:57,120
not very...

876
00:31:57,120 --> 00:31:59,120
All the nodes

877
00:31:59,120 --> 00:32:01,120
are essentially

878
00:32:01,120 --> 00:32:03,120
on the overview and

879
00:32:03,120 --> 00:32:05,120
it's very static, and it's not

880
00:32:05,120 --> 00:32:07,120
what one

881
00:32:07,120 --> 00:32:09,120
expects in 2013.

882
00:32:11,120 --> 00:32:13,120
Because we all have

883
00:32:13,120 --> 00:32:15,120
this web app,

884
00:32:15,120 --> 00:32:17,120
and this

885
00:32:17,120 --> 00:32:19,120
phone shining with

886
00:32:19,120 --> 00:32:21,420
very dynamic stuff

887
00:32:21,420 --> 00:32:23,420
And ours is not very dynamic, I agree.

888
00:32:25,420 --> 00:32:27,420
The same is if you

889
00:32:27,420 --> 00:32:29,420
you know the comparison page?

890
00:32:29,420 --> 00:32:31,420
I mean, if every node

891
00:32:31,420 --> 00:32:33,420
of a group and every graph, just imagine

892
00:32:33,420 --> 00:32:35,420
that on one thousand nodes

893
00:32:35,420 --> 00:32:37,420
one thousand plugins

894
00:32:37,420 --> 00:32:39,420
Your Firefox

895
00:32:39,420 --> 00:32:41,420
won't have any memory anymore.

896
00:32:43,420 --> 00:32:45,420
And the last thing is

897
00:32:45,420 --> 00:32:47,420
it lacks proper ACL.

898
00:32:47,420 --> 00:32:49,420
For a bigger install,

899
00:32:49,420 --> 00:32:51,420
usually you want to delegate

900
00:32:51,420 --> 00:32:53,420
monitoring to

901
00:32:53,420 --> 00:32:55,420
subsystems, and you don't

902
00:32:55,420 --> 00:32:57,420
want people to see everything, because

903
00:32:57,420 --> 00:32:59,420
they will be overwhelmed by

904
00:32:59,420 --> 00:33:01,420
the information, and

905
00:33:01,420 --> 00:33:03,420
well, that's

906
00:33:05,420 --> 00:33:07,420
that's a problem.

907
00:33:07,420 --> 00:33:09,420
So, I'll just go

908
00:33:09,420 --> 00:33:11,420
very fast, that's my last slide

909
00:33:13,420 --> 00:33:15,420
So, for 2.2 we will be

910
00:33:15,420 --> 00:33:17,420
integrating into

911
00:33:17,420 --> 00:33:19,420
2.1 and

912
00:33:19,420 --> 00:33:21,420
when it is stable it will become 2.2

913
00:33:21,420 --> 00:33:23,420
it's moving from

914
00:33:23,420 --> 00:33:25,420
the whole Storable thing to

915
00:33:25,420 --> 00:33:27,420
SQL-based, and

916
00:33:27,420 --> 00:33:29,420
the SQL-based will be

917
00:33:29,420 --> 00:33:31,420
DBI-based, because we are still in Perl

918
00:33:31,420 --> 00:33:33,420
and it will be SQLite

919
00:33:33,420 --> 00:33:35,420
by default, because

920
00:33:35,420 --> 00:33:37,420
we really want the

921
00:33:37,420 --> 00:33:39,420
nice out-of-the-box

922
00:33:39,420 --> 00:33:41,420
install, remember our users

923
00:33:41,420 --> 00:33:43,420
[many] of them

924
00:33:43,420 --> 00:33:45,420
are the one-node type,

925
00:33:45,420 --> 00:33:47,420
and if you want, you can

926
00:33:47,420 --> 00:33:49,420
do PostgreSQL

927
00:33:49,420 --> 00:33:51,420
and if you want, you can do

928
00:33:51,420 --> 00:33:53,420
whatever DBI supports

929
00:33:53,420 --> 00:33:55,420
it's just up to you.

930
00:33:55,420 --> 00:33:57,420
It will enable dynamic HTML

931
00:33:57,420 --> 00:33:59,420
because

932
00:33:59,420 --> 00:34:01,420
well, we are

933
00:34:01,420 --> 00:34:03,420
not in 2001 anymore

934
00:34:05,420 --> 00:34:07,420
But that will require

935
00:34:07,420 --> 00:34:09,420
a deep rewrite of the code

936
00:34:09,420 --> 00:34:11,420
As I said before,

937
00:34:11,420 --> 00:34:13,420
we have many many accessors

938
00:34:13,420 --> 00:34:15,420
to Storable inside the core

939
00:34:15,420 --> 00:34:17,420
but

940
00:34:17,420 --> 00:34:19,420
since it was a big Storable,

941
00:34:19,420 --> 00:34:21,420
it was a native Perl data structure

942
00:34:21,420 --> 00:34:23,420
so for

943
00:34:23,420 --> 00:34:25,420
whatever reason

944
00:34:25,420 --> 00:34:27,420
a lot of code does not use

945
00:34:27,420 --> 00:34:29,420
accessors, they use it

946
00:34:29,420 --> 00:34:31,420
in a typical Perl way

947
00:34:31,420 --> 00:34:33,420
and that makes it very difficult

948
00:34:33,420 --> 00:34:35,420
to translate to SQL.

949
00:34:35,420 --> 00:34:37,420
So that's a challenge.

950
00:34:39,420 --> 00:34:41,420
And just to be completely crystal-clear,

951
00:34:43,420 --> 00:34:45,420
the data will stay,

952
00:34:45,420 --> 00:34:47,420
the data that is in RRD will stay

953
00:34:47,420 --> 00:34:49,420
in RRD. I don't want

954
00:34:49,420 --> 00:34:51,420
to put the timestamped

955
00:34:51,420 --> 00:34:53,420
value inside SQL.

956
00:34:53,420 --> 00:34:55,420
That's not the point.

957
00:35:07,420 --> 00:35:09,420
We will have a complete

958
00:35:09,420 --> 00:35:11,420
node push

959
00:35:11,420 --> 00:35:13,420
feature

960
00:35:13,420 --> 00:35:15,420
The node can push

961
00:35:15,420 --> 00:35:17,420
on the master

962
00:35:17,420 --> 00:35:19,420
in order

963
00:35:19,420 --> 00:35:21,420
well, to have

964
00:35:21,420 --> 00:35:23,420
to break this five-minute polling

965
00:35:23,420 --> 00:35:25,420
standard, so you can push

966
00:35:25,420 --> 00:35:27,420
whenever you want, every second if you want

967
00:35:27,420 --> 00:35:29,420
and

968
00:35:31,420 --> 00:35:33,420
this will enable

969
00:35:33,420 --> 00:35:35,420
very fine

970
00:35:37,420 --> 00:35:39,420
precision

971
00:35:39,420 --> 00:35:41,420
and my goal is to be as good as

972
00:35:41,420 --> 00:35:43,420
collectd.

973
00:35:43,420 --> 00:35:45,420
If...

974
00:35:45,420 --> 00:35:47,420
And well, if you have

975
00:35:47,420 --> 00:35:49,420
that little

976
00:35:49,420 --> 00:35:51,420
blurb on the new

977
00:35:51,420 --> 00:35:53,420
HTML5 UI

978
00:36:01,420 --> 00:36:03,420
I sped up the end, so you have

979
00:36:03,420 --> 00:36:05,420
a little time for questions.

980
00:36:05,420 --> 00:36:07,420
If you want.

981
00:36:17,420 --> 00:36:19,420
[question1] With this...

982
00:36:23,420 --> 00:36:25,420
Is it possible, with this new

983
00:36:25,420 --> 00:36:27,420
architecture

984
00:36:27,420 --> 00:36:29,420
to

985
00:36:33,420 --> 00:36:35,420
(sorry, I just missed my question)

986
00:36:37,420 --> 00:36:39,420
[presenter] The SQL one, you mean? Or

987
00:36:39,420 --> 00:36:41,420
[question1] No, I had just written it down...

988
00:36:41,420 --> 00:36:43,420
So, I'm sorry for...

989
00:36:43,420 --> 00:36:45,420
[laughter]

990
00:36:47,420 --> 00:36:49,420
[presenter] The async? Or...

991
00:36:49,420 --> 00:36:51,420
[question1] Yes. Do you fork

992
00:36:51,420 --> 00:36:53,420
the plugins?

993
00:36:53,420 --> 00:36:55,420
The architecture still

994
00:36:55,420 --> 00:36:57,420
forks the plugins

995
00:36:57,420 --> 00:36:59,420
every time, or

996
00:36:59,420 --> 00:37:01,420
is it possible to run

997
00:37:01,420 --> 00:37:03,420
the plugin and keep it all running

998
00:37:03,420 --> 00:37:05,420
and just feed back some values

999
00:37:05,420 --> 00:37:07,420
as you will

1000
00:37:07,420 --> 00:37:09,420
you mentioned collectd, which

1001
00:37:09,420 --> 00:37:11,420
builds on this architecture.

1002
00:37:15,420 --> 00:37:17,420
[presenter]: I designed the

1003
00:37:17,420 --> 00:37:19,420
...a new extension

1004
00:37:19,420 --> 00:37:21,420
a new verb for plugins

1005
00:37:21,420 --> 00:37:23,420
it's called "stream"

1006
00:37:23,420 --> 00:37:25,420
and this is, you just

1007
00:37:25,420 --> 00:37:27,420
launch the plugin, you ask

1008
00:37:27,420 --> 00:37:29,420
for a config

1009
00:37:29,420 --> 00:37:31,420
and then you ask for the stream, and

1010
00:37:31,420 --> 00:37:33,420
when the plugin quits

1011
00:37:33,420 --> 00:37:35,420
it means, it just sends

1012
00:37:35,420 --> 00:37:37,420
values periodically

1013
00:37:37,420 --> 00:37:39,420
at the rate it wants to

1014
00:37:39,420 --> 00:37:41,420
so it's very

1015
00:37:41,420 --> 00:37:43,420
...it is designed to capture, for example

1016
00:37:43,420 --> 00:37:45,420
the output of

1017
00:37:45,420 --> 00:37:47,420
vmstat

1018
00:37:47,420 --> 00:37:49,420
you can do

1019
00:37:49,420 --> 00:37:51,420
vmstat | awk

1020
00:37:51,420 --> 00:37:53,420
and, well, that's your

1021
00:37:53,420 --> 00:37:55,420
plugin output. It will stay

1022
00:37:55,420 --> 00:37:57,420
in memory. And

1023
00:37:57,420 --> 00:37:59,420
the plugin will kill itself

1024
00:37:59,420 --> 00:38:01,420
when the configuration changes.

1025
00:38:01,420 --> 00:38:03,420
That's

1026
00:38:03,420 --> 00:38:05,420
the design. But the promise

1027
00:38:05,420 --> 00:38:07,420
I didn't put it in 2.2

1028
00:38:07,420 --> 00:38:09,420
because I won't have time to do it.

1029
00:38:11,420 --> 00:38:13,420
But that's the way

1030
00:38:13,420 --> 00:38:15,420
it is done. But basically

1031
00:38:15,420 --> 00:38:17,420
the

1032
00:38:17,420 --> 00:38:19,420
architecture of fork/exec-ing

1033
00:38:19,420 --> 00:38:21,420
a plugin

1034
00:38:21,420 --> 00:38:23,420
is, or will be

1035
00:38:23,420 --> 00:38:25,420
at the core of Munin. It won't be

1036
00:38:25,420 --> 00:38:27,420
for example

1037
00:38:27,420 --> 00:38:29,420
a DSO, a .so [?], or

1038
00:38:31,420 --> 00:38:33,420
that you will load in, or a .pm

1039
00:38:33,420 --> 00:38:35,420
that you will load into

1040
00:38:35,420 --> 00:38:37,420
Munin's memory space

1041
00:38:37,420 --> 00:38:39,420
That's

1042
00:38:39,420 --> 00:38:41,420
That's not something I want.

1043
00:38:47,420 --> 00:38:49,420
[questioner1]: This was the

1044
00:38:49,420 --> 00:38:51,420
thing which

1045
00:38:51,420 --> 00:38:53,420
I really liked

1046
00:38:53,420 --> 00:38:55,420
Munin, and used it

1047
00:38:55,420 --> 00:38:57,420
in 1.2 or whatever, but

1048
00:38:57,420 --> 00:38:59,420
it had scaling problems

1049
00:38:59,420 --> 00:39:01,420
with regard to

1050
00:39:01,420 --> 00:39:03,420
work. So that was one of the reasons

1051
00:39:03,420 --> 00:39:05,420
I had to

1052
00:39:05,420 --> 00:39:07,420
change to another system.

1053
00:39:17,420 --> 00:39:19,420
[questioner2]: Hi.

1054
00:39:19,420 --> 00:39:21,420
So, I was a happy Munin user

1055
00:39:21,420 --> 00:39:23,420
And then, suddenly I

1056
00:39:23,420 --> 00:39:25,420
well,

1057
00:39:25,420 --> 00:39:27,420
because of the scaling issues,

1058
00:39:27,420 --> 00:39:29,420
I moved to a

1059
00:39:29,420 --> 00:39:31,420
pnp4nagios

1060
00:39:31,420 --> 00:39:33,420
and that's one

1061
00:39:33,420 --> 00:39:35,420
question I want to ask, is about how...

1062
00:39:35,420 --> 00:39:37,420
Because, with all this data

1063
00:39:37,420 --> 00:39:39,420
that is a great thing in Munin

1064
00:39:39,420 --> 00:39:41,420
then you can do proxy

1065
00:39:43,420 --> 00:39:43,670
monitoring

1066
00:39:43,670 --> 00:39:45,420
that is, sending OLS if [?]

1067
00:39:45,420 --> 00:39:53,520
that is, sending OLS if [?]

1068
00:39:53,520 --> 00:39:55,520
Do you plan on

1069
00:39:55,520 --> 00:39:57,520
having better integration with

1070
00:39:57,520 --> 00:39:59,520
[?] systems

1071
00:39:59,520 --> 00:40:01,520
than you have currently?

1072
00:40:01,520 --> 00:40:03,520
[presenter]: Actually

1073
00:40:03,520 --> 00:40:05,520
the point of

1074
00:40:05,520 --> 00:40:07,520
Nagios, I mean, we

1075
00:40:09,520 --> 00:40:11,520
We have a lot of problems with it

1076
00:40:11,520 --> 00:40:13,520
because

1077
00:40:13,520 --> 00:40:15,520
NSCA changed

1078
00:40:15,520 --> 00:40:17,520
its interface lately.

1079
00:40:19,520 --> 00:40:21,520
The thing is, we have

1080
00:40:21,520 --> 00:40:23,520
something called

1081
00:40:23,520 --> 00:40:25,520
munin-limits, and

1082
00:40:25,520 --> 00:40:27,520
it sends

1083
00:40:27,520 --> 00:40:29,520
[?] and so on, but it does not

1084
00:40:29,520 --> 00:40:31,520
do it very well.

1085
00:40:31,520 --> 00:40:33,520
So,

1086
00:40:33,520 --> 00:40:35,520
the integration with other systems

1087
00:40:35,520 --> 00:40:37,520
such as Nagios

1088
00:40:37,520 --> 00:40:39,520
Icinga or whatever

1089
00:40:39,520 --> 00:40:41,520
is very, very high on my priority list

1090
00:40:41,520 --> 00:40:43,520
because I don't want to reimplement Nagios.

1091
00:40:43,520 --> 00:40:45,520
I mean, it is

1092
00:40:45,520 --> 00:40:47,520
I want to focus

1093
00:40:47,520 --> 00:40:49,520
on data gathering and

1094
00:40:49,520 --> 00:40:51,520
data keeping, I mean

1095
00:40:51,520 --> 00:40:53,520
I'm more interested in

1096
00:40:53,520 --> 00:40:55,520
replacing something like pnp4nagios

1097
00:40:55,520 --> 00:40:57,520
than Nagios itself.

1098
00:40:57,520 --> 00:40:59,520
[questioner2]: Because the

1099
00:40:59,520 --> 00:41:01,520
munin-limits, for example

1100
00:41:01,520 --> 00:41:03,520
it only has

1101
00:41:03,520 --> 00:41:05,520
threshold, like, if

1102
00:41:05,520 --> 00:41:07,520
it is a set of values, then 1

1103
00:41:07,520 --> 00:41:09,520
whereas I'm also interested in

1104
00:41:09,520 --> 00:41:11,520
questions like, OK,

1105
00:41:11,520 --> 00:41:13,520
usually this filesystem

1106
00:41:13,520 --> 00:41:15,520
is growing at a 1% rate every

1107
00:41:15,520 --> 00:41:17,520
day, and suddenly it grew out like

1108
00:41:17,520 --> 00:41:19,520
50%. I want a warning there.

1109
00:41:19,520 --> 00:41:21,520
That's, you know...

1110
00:41:21,520 --> 00:41:23,520
[presenter]: Exactly, that's something

1111
00:41:23,520 --> 00:41:25,520
that is even offered by

1112
00:41:25,520 --> 00:41:27,520
RRD right now

1113
00:41:27,520 --> 00:41:29,520
and I also

1114
00:41:29,520 --> 00:41:31,520
have it on my future

1115
00:41:31,520 --> 00:41:33,520
roadmap, but, well,

1116
00:41:35,520 --> 00:41:37,520
I'm taking the problem

1117
00:41:37,520 --> 00:41:39,520
for user-facing

1118
00:41:39,520 --> 00:41:41,520
right now, but

1119
00:41:41,520 --> 00:41:43,520
everyone is welcome to help.

1120
00:41:43,520 --> 00:41:45,520
[questioner2]: I'm looking forward to it.

1121
00:41:49,520 --> 00:41:51,520
[presenter]: Yes, OK, so

1122
00:41:51,520 --> 00:41:53,520
time is up, so I guess you have to ask your questions

1123
00:41:55,520 --> 00:41:57,520
after the talk, and

1124
00:41:57,520 --> 00:41:59,520
thank you

1125
00:41:59,520 --> 00:42:01,520
[audience]: Just, there is a

1126
00:42:01,520 --> 00:42:03,520
BoF session

1127
00:42:03,520 --> 00:42:05,520
this afternoon, if you have some questions,

1128
00:42:05,520 --> 00:42:07,520
or anything specific, just come, and

1129
00:42:07,520 --> 00:42:09,520
I'll be glad to answer.

1130
00:42:09,520 --> 00:42:11,520
[presenter]: Thanks

1131
00:42:11,520 --> 00:42:13,520
[applause]
