1
00:00:07,466 --> 00:00:10,156
I'm here today to talk to you about
diffoscope

2
00:00:10,156 --> 00:00:13,190
and how you can use it as a better diff

3
00:00:14,063 --> 00:00:16,166
or for Quality Assurance, etc., things
like that.

4
00:00:19,789 --> 00:00:20,810
Moin!

5
00:00:20,815 --> 00:00:24,409
Apparently that's like a North German
thing to say "welcome".

6
00:00:25,938 --> 00:00:29,898
North Germany, north Denmark, Scandinavia,
that kind of thing, I'm told.

7
00:00:31,836 --> 00:00:34,197
People are shaking their heads, so I'm
going to assume that's true.

8
00:00:37,306 --> 00:00:40,425
This is my first PC, an IBM 5155.

9
00:00:41,623 --> 00:00:46,441
Sometimes, when you rebooted it, it would
launch into, it would somehow revert

10
00:00:46,688 --> 00:00:50,971
from booting from the hard disk to booting
from a BASIC ROM,

11
00:00:51,359 --> 00:00:52,959
as in the programming language ROM.

12
00:00:53,017 --> 00:00:54,320
It was on my motherboard for some reason.

13
00:00:54,912 --> 00:00:57,691
So, randomly, you'd just get a chance to
program in BASIC and then,

14
00:00:57,957 --> 00:01:00,456
sometimes you wouldn't, I don't know why,
but… yeah.

15
00:01:00,718 --> 00:01:05,173
It's quite fun with this kind of clicky
keyboard, and that folded in

16
00:01:05,519 --> 00:01:07,058
and it was this kind of big desk thing.

17
00:01:07,058 --> 00:01:08,014
Anyway…

18
00:01:09,067 --> 00:01:10,187
This is my first Debian.

19
00:01:10,500 --> 00:01:11,837
At the time it was already old.

20
00:01:12,890 --> 00:01:15,908
What's this one? Is this Slink? 2.2?
Yeah.

21
00:01:17,077 --> 00:01:22,043
And this is when we had US and non-US,
so that really dates you if you remember that.

22
00:01:23,522 --> 00:01:28,393
This is my first contribution to Debian,
19th December 2006,

23
00:01:28,803 --> 00:01:33,738
sending a patch to LilyPond, which is kind
of interesting

24
00:01:34,155 --> 00:01:37,205
and the response was "Oh yeah, rock on,
many thanks. I'll upload this and

25
00:01:37,440 --> 00:01:38,723
it'll be landing in Etch".

26
00:01:39,007 --> 00:01:43,408
And this was super motivating because
Etch was just coming out and it was like

27
00:01:43,602 --> 00:01:48,732
"Great, I've got one line of a tiny patch
in a release. This is super cool."

28
00:01:49,118 --> 00:01:52,687
Thomas' response was super motivating.

29
00:01:52,993 --> 00:01:56,450
So, after that, that Christmas I
basically spent ???

30
00:01:56,675 --> 00:01:59,754
Debian webpages and stuff.

31
00:02:00,327 --> 00:02:01,568
Very well timed.

32
00:02:02,234 --> 00:02:03,566
That's kind of a good…

33
00:02:04,301 --> 00:02:07,379
You know, when someone sends a patch, be like
"Cool, thanks"

34
00:02:07,849 --> 00:02:09,434
Like a little notice in the changelog.

35
00:02:09,807 --> 00:02:14,344
It was, you know, so stupid but…
Yeah, do that kind of thing.

36
00:02:15,558 --> 00:02:17,249
So, moving on.

37
00:02:17,641 --> 00:02:20,276
Why diffoscope?
Why did we write diffoscope?

38
00:02:20,552 --> 00:02:21,880
What's the background here?

39
00:02:22,184 --> 00:02:24,575
It comes from Reproducible Builds.

40
00:02:24,911 --> 00:02:28,983
The very quick outline is that once you
get the source code for free software,

41
00:02:29,208 --> 00:02:31,505
you download the source code for nginx
or whatever,

42
00:02:31,998 --> 00:02:35,844
pretty much everyone just runs binaries
on their servers or their systems.

43
00:02:36,110 --> 00:02:39,119
You know, "apt install bla", "yum install",
whatever.

44
00:02:40,531 --> 00:02:41,535
Android Play Store, whatever.

45
00:02:42,479 --> 00:02:46,176
Can you actually trust whether these two
things correspond with each other?

46
00:02:46,470 --> 00:02:49,926
You've gotten the source code, it looks
alright, and then you install this binary,

47
00:02:50,847 --> 00:02:51,821
yeah…

48
00:02:52,459 --> 00:02:55,861
Who generated that? Can you trust that
process?

49
00:02:56,275 --> 00:02:57,430
Can you trust who generated it?

50
00:02:58,351 --> 00:03:01,493
Even if you could trust them, could you
trust them not to be exploited? Etc.

51
00:03:02,295 --> 00:03:04,765
This is a big problem because you can
exploit a build farm and then

52
00:03:05,160 --> 00:03:09,895
obviously exploit all of that, you know,
put a trojan into the build farm,

53
00:03:10,097 --> 00:03:13,290
so every single binary that comes out
is compromised.

54
00:03:13,708 --> 00:03:14,792
Kind of problematic.

55
00:03:15,060 --> 00:03:17,686
You could also target individual developers'
machines,

56
00:03:17,937 --> 00:03:21,288
so I could go off to, say, your machine,
add a backdoor to it,

57
00:03:21,578 --> 00:03:25,241
so every binary that you give to friends
and things like that,

58
00:03:26,935 --> 00:03:30,485
is compromised in some way, stealing
your Bitcoins or whatever.

59
00:03:31,802 --> 00:03:36,127
I can also turn up at your door
and blackmail you into producing

60
00:03:38,522 --> 00:03:42,997
software that has compromises or extra
features, shall we say,

61
00:03:43,472 --> 00:03:44,783
that don't exist in the source code.

62
00:03:45,133 --> 00:03:47,885
So what will happen there is that you'd
release your source

63
00:03:48,093 --> 00:03:51,968
and the binaries you produce have
this sort of backdoor that, you know,

64
00:03:52,435 --> 00:03:55,127
someone is forcing you into producing.

65
00:03:55,464 --> 00:03:56,679
So, you don't want to do that.

66
00:03:56,856 --> 00:03:57,505
Anyway

67
00:03:58,197 --> 00:03:59,228
enough of that.

68
00:03:59,228 --> 00:04:03,211
What you do for Reproducible Builds is you
ensure that every time you build

69
00:04:03,467 --> 00:04:05,773
a piece of software, you get an identical
result.

70
00:04:06,916 --> 00:04:10,885
Multiple people then compare their builds
and check whether they all get

71
00:04:10,885 --> 00:04:11,068
the same results

72
00:04:11,068 --> 00:04:15,626
and this means that an attacker must
either have infected everyone

73
00:04:15,626 --> 00:04:17,726
at the same time, or they haven't
infected anyone.

74
00:04:20,673 --> 00:04:24,058
The point here is that you have to ensure
that builds have identical results.

75
00:04:24,173 --> 00:04:25,163
Ok, great.

76
00:04:28,003 --> 00:04:32,539
So, we started the Reproducible Builds
project, etc.

77
00:04:33,470 --> 00:04:34,744
And we build 2 .debs.

78
00:04:35,112 --> 00:04:36,537
Oh, I'm sorry about the colors there.

79
00:04:38,067 --> 00:04:38,965
You probably can't see that.

80
00:04:39,349 --> 00:04:42,485
That says "sha1sum a.deb b.deb".

81
00:04:46,128 --> 00:04:50,775
Anyway, we're comparing the sha1sums
of 2 binary Debian files.

82
00:04:51,424 --> 00:04:53,922
So, these two files differ.

83
00:04:54,222 --> 00:04:55,612
Ok, they're not reproducible.

84
00:04:56,807 --> 00:04:57,527
Why is that?

85
00:04:57,873 --> 00:04:59,656
So we run a diff on them.

86
00:05:00,140 --> 00:05:00,637
Yeah…

87
00:05:01,340 --> 00:05:04,093
So, what can we learn from this?

88
00:05:04,418 --> 00:05:08,508
Well, not very much, visibly they're
compressed so

89
00:05:08,947 --> 00:05:13,012
as soon as there's one change, we'll see
it just cascade into more changes

90
00:05:13,362 --> 00:05:14,866
because that's how compression works.

91
00:05:16,241 --> 00:05:23,983
I guess we know it's a .deb, probably an ar
format file, not very useful.

92
00:05:24,193 --> 00:05:26,005
Ok, great so we're gonna have a look in

93
00:05:26,492 --> 00:05:29,919
We'll do a binary diff and ok, well…

94
00:05:30,923 --> 00:05:32,790
Again, that's not really telling us
very much

95
00:05:34,413 --> 00:05:36,515
with the diff there.

96
00:05:37,206 --> 00:05:38,426
Ok, great.

97
00:05:39,417 --> 00:05:40,427
??? one level in

98
00:05:40,513 --> 00:05:44,834
"ar x" is on the New Maintainer thing,
"how you unpack a .deb"

99
00:05:44,858 --> 00:05:46,215
Everyone remembers this, right?

100
00:05:48,196 --> 00:05:51,167
You unpack a.deb with "ar x" and you
do that to b.deb

101
00:05:51,599 --> 00:05:53,606
and then we diff the results of that.

102
00:05:54,099 --> 00:05:57,824
Ok, so…yeah, 7zip.

103
00:05:58,948 --> 00:06:01,329
Ok, compressed content, not very useful.

104
00:06:01,897 --> 00:06:07,898
Ok, so let's unpack the control.tar inside
these .debs.

105
00:06:08,725 --> 00:06:10,145
And then we run diff on that.

106
00:06:12,693 --> 00:06:16,850
Still not really telling us anything useful
about how to make this package reproducible

107
00:06:17,487 --> 00:06:20,345
So let's unpack the .tar.xz into the .tar.

108
00:06:22,463 --> 00:06:28,348
Inside that tar, there's a file called
md5sums and we start to see some differences

109
00:06:28,768 --> 00:06:33,370
between some files in these two .debs.

110
00:06:33,640 --> 00:06:36,527
??? meaningful, so now
we have some idea that

111
00:06:36,855 --> 00:06:39,101
it has something to do with this
/usr/bin/pmixer binary.

112
00:06:39,682 --> 00:06:40,653
Ok, interesting.

113
00:06:41,989 --> 00:06:45,015
We'll unzip that and then we do a diff on
pmixer itself.

114
00:06:45,914 --> 00:06:48,600
Now we're back into just binary
"gobbledygook" mode

115
00:06:49,002 --> 00:06:51,736
This isn't very helpful and this is taking
quite a while

116
00:06:52,399 --> 00:06:54,663
and if I remember correctly, Debian has
a lot of packages.

117
00:06:55,182 --> 00:06:56,784
So this might take a little while.

118
00:06:57,601 --> 00:07:00,415
So, basically, ??? mean

119
00:07:00,782 --> 00:07:02,008
I should build a better diff.

120
00:07:03,703 --> 00:07:05,194
That's not quite true, this is actually…

121
00:07:05,783 --> 00:07:07,472
It was Lunar that started this project

122
00:07:07,801 --> 00:07:10,670
and it was called debbindiff, because
we wanted to diff

123
00:07:11,093 --> 00:07:12,264
binary Debian packages.

124
00:07:13,474 --> 00:07:15,040
So this is the initial commit, 2014.

125
00:07:16,962 --> 00:07:20,100
"The version is successfully able to report
differences in two .changes files.

126
00:07:20,100 --> 00:07:22,343
Not with much interesting details,
but it's a start."

127
00:07:22,762 --> 00:07:23,806
And it was a start.

128
00:07:27,581 --> 00:07:29,918
Fast forwarding… Oh, sorry about these
colors,

129
00:07:30,307 --> 00:07:31,872
I don't know if we can do anything about
the lights?

130
00:07:34,713 --> 00:07:35,363
Yeah?

131
00:07:37,830 --> 00:07:38,080
No?

132
00:07:42,124 --> 00:07:42,974
Alright, whatever…

133
00:07:43,700 --> 00:07:46,410
Basically, we're diffoscoping on…

134
00:07:47,546 --> 00:07:49,595
It works kind of like diff does normally,

135
00:07:49,981 --> 00:07:51,995
you give it two files, it outputs
a unified diff.

136
00:07:52,699 --> 00:07:59,427
So "diffoscope a b", one file contains
the word "foo", one contains the word "bar".

137
00:08:01,241 --> 00:08:03,340
Nothing actually out of the ordinary.

138
00:08:03,974 --> 00:08:07,670
It's sort of colored by default, so that's
why you can't see it, but whatever.

139
00:08:10,432 --> 00:08:14,667
It supports archive formats, so if you
give it two tar files,

140
00:08:15,413 --> 00:08:22,263
if we then tar up our "a" file and
our "b" file into an a.tar and a b.tar

141
00:08:23,206 --> 00:08:25,374
and then run diffoscope on those .tar files

142
00:08:26,197 --> 00:08:28,395
we get this kind of, like, hierarchy here.

143
00:08:28,742 --> 00:08:32,006
So it's saying that there are differences
between these files,

144
00:08:32,513 --> 00:08:37,735
in the file list they have different
timestamps, because I made them

145
00:08:38,161 --> 00:08:39,535
at different times,

146
00:08:39,848 --> 00:08:42,575
and here are the contents, so we got
"foo" there and "bar" there.

147
00:08:43,296 --> 00:08:44,781
So we can see the difference between them.

148
00:08:45,566 --> 00:08:48,373
Well, I can, I don't know if you can,
you get the slide there.

149
00:08:49,311 --> 00:08:53,551
If we gzip these tar files and then run
diffoscope on those gzip things,

150
00:08:53,888 --> 00:08:59,230
it'll say "OK, what we've done is unpack it
first, and here's the metadata

151
00:08:59,622 --> 00:09:01,653
about the gzip process",

152
00:09:02,107 --> 00:09:05,941
and inside that are a.tar and b.tar
from the previous slides.

153
00:09:07,673 --> 00:09:09,085
And then the "a" file and the "b" file.

154
00:09:09,365 --> 00:09:15,303
So, it's really going two levels deep
into this tar.gz file.

155
00:09:16,162 --> 00:09:17,042
That's pretty cool.

156
00:09:17,291 --> 00:09:20,772
And it's completely recursive, I think
it will actually blow out after, I think,

157
00:09:20,993 --> 00:09:21,697
1000 levels.

158
00:09:23,119 --> 00:09:25,233
[light is turned down for the audience
to see the slides]

159
00:09:30,195 --> 00:09:32,065
I'll just bump back a bit, just in case.

160
00:09:35,203 --> 00:09:37,055
[Applause]

161
00:09:37,806 --> 00:09:38,662
Thank you.

162
00:09:39,907 --> 00:09:43,462
So that's the a and b files.

163
00:09:43,884 --> 00:09:48,077
We've tarred them up, so we see
the hierarchy of the foo and bar file layer.

164
00:09:48,472 --> 00:09:52,012
I've gzipped them, so this is a gzip layer.

165
00:09:52,399 --> 00:09:54,661
Here's the .tar layer and then there's
the files themselves.

166
00:09:57,315 --> 00:09:59,252
This is from a real .deb from the archive.

167
00:10:00,637 --> 00:10:06,542
Inside this .deb, there's a data.tar.xz
and in that .xz file there's a data.tar

168
00:10:07,294 --> 00:10:11,081
and inside that .tar file, there's a file
called .aff and inside that

169
00:10:11,648 --> 00:10:13,892
there's a version string that is different.

170
00:10:14,174 --> 00:10:17,527
And that looks like a build date so we
probably know that if we went back

171
00:10:17,753 --> 00:10:22,748
to the source package, we could very
quickly work out,

172
00:10:22,922 --> 00:10:26,582
with a very quick grep, work out
where this file is being generated from,

173
00:10:26,582 --> 00:10:31,536
the de_DE.aff file and then ???
probably quite obvious

174
00:10:32,285 --> 00:10:37,311
that it's using the current build time
and then we can just patch that, fix it etc.

175
00:10:38,362 --> 00:10:45,681
This has gone from two rather obscure
binary .debs all the way to the fix

176
00:10:46,040 --> 00:10:51,683
probably in about 5 minutes, and you can
probably send the patch in that time

177
00:10:52,098 --> 00:10:53,086
because it'd be quite quick.

178
00:10:53,860 --> 00:10:57,482
Without diffoscope here, without this sort
of recursive unpacking,

179
00:10:58,351 --> 00:11:03,380
you'd be just completely lost, you'd be
there with "ar x" all day

180
00:11:03,762 --> 00:11:07,109
and working out which files are different
and trying to use xxd

181
00:11:07,859 --> 00:11:09,410
and this kind of nonsense.

182
00:11:10,612 --> 00:11:12,875
diffoscope's got some other things as well

183
00:11:13,277 --> 00:11:17,116
if you try to do reproducible packages
and things are varying just on

184
00:11:17,381 --> 00:11:22,408
the line ordering, we detect whether
a file differs only in the line ordering.

185
00:11:22,660 --> 00:11:26,178
So, here's file "a", "These lines are in
order".

186
00:11:27,155 --> 00:11:30,108
File "b" has "These order are in lines".

187
00:11:30,630 --> 00:11:34,864
It's very difficult to say, actually,
it's like one of these tongue twisters.

188
00:11:35,305 --> 00:11:38,862
Run diffoscope on these two and it says
it's got ordering differences only.

189
00:11:39,210 --> 00:11:41,295
That's interesting, so you probably need
to sort,

190
00:11:41,592 --> 00:11:45,076
you go all the way back to the source code,
work out very quickly,

191
00:11:45,389 --> 00:11:48,381
if you know it's just ordering differences
you just kind of know

192
00:11:48,672 --> 00:11:52,762
what the output's gonna be, you can
search for order in ???

193
00:11:53,166 --> 00:11:54,648
and you get the right files,

194
00:11:54,928 --> 00:11:57,803
add a sort in the right ???
place, BAM! Send the patch off,

195
00:11:57,889 --> 00:11:59,280
everything is great.

196
00:11:59,280 --> 00:12:02,720
Oh, and send it to upstream as well
because you're good.

197
00:12:03,041 --> 00:12:04,707
It supports a lot more things.

198
00:12:05,509 --> 00:12:08,611
We've been showing the terminal
text output here.

199
00:12:10,978 --> 00:12:15,950
It's got a HTML output mode, which is
really useful in the hierarchical thing

200
00:12:16,139 --> 00:12:17,359
when it gets a bit more complicated.

201
00:12:19,397 --> 00:12:21,766
Instead of being laid on top of each other
like a unified diff,

202
00:12:22,312 --> 00:12:26,811
you get the diff on the left and the right
and you get sort of a nested

203
00:12:27,075 --> 00:12:32,372
thing inside with colors and lines and
you can link this and various things in it

204
00:12:32,728 --> 00:12:37,547
including bits of metadata here, other
bits here, what command you used.

205
00:12:38,951 --> 00:12:40,392
That's the HTML output.

206
00:12:40,659 --> 00:12:43,960
We also support a lot of file formats,
it's not just text,

207
00:12:45,635 --> 00:12:48,958
it's about all of these, so let's quickly
run through some of them.

208
00:12:49,298 --> 00:12:54,503
You give it two Android .apk files, which
are kind of like zips, but magic.

209
00:12:55,163 --> 00:12:58,211
It'll know how to compare them.

210
00:12:58,570 --> 00:13:01,026
There's like a Manifest file that needs
decoding.

211
00:13:01,617 --> 00:13:03,761
It supports Berkeley DB databases,

212
00:13:04,098 --> 00:13:08,247
Word documents, that's a Word document
with "a" and that's a Word document with "b"

213
00:13:08,715 --> 00:13:10,359
and it'll correctly do that.

214
00:13:10,583 --> 00:13:14,311
If you run that through diff normally,
that ??? be a binary mess,

215
00:13:14,932 --> 00:13:16,188
so completely useless.

216
00:13:17,503 --> 00:13:20,118
E-books, there's .epub, it also supports
.mobi.

217
00:13:20,563 --> 00:13:25,958
So if you give it two .epub files, it'll say
"They just differ in this date".

218
00:13:26,463 --> 00:13:27,350
Brilliant.

219
00:13:28,177 --> 00:13:30,557
Normally that will be completely useless
diff binary ???

220
00:13:30,794 --> 00:13:35,624
So you can be like ".epub date, ok", grep
the source code for that,

221
00:13:36,427 --> 00:13:38,350
make a patch really quickly.

222
00:13:39,594 --> 00:13:42,786
Mono binaries, Git repositories, why not?

223
00:13:43,693 --> 00:13:46,222
Gnumeric spreadsheets, ISO images.

224
00:13:46,454 --> 00:13:47,883
Oh yeah, ISO images is really cool.

225
00:13:48,359 --> 00:13:55,044
So, it'll basically unpack the ISO, then
inside that there might be a squashfs image

226
00:13:55,378 --> 00:14:01,549
then it'll completely go down to that and
work out any differences

227
00:14:01,746 --> 00:14:06,065
between the two contents in the ISO file,
including any metadata.

228
00:14:06,432 --> 00:14:10,607
This is on the squashfs metadata headers,
I think.

229
00:14:11,634 --> 00:14:19,251
But say inside that ISO, there was a file
that was a .PDF, and inside that .PDF was

230
00:14:19,572 --> 00:14:23,048
a ??? which varied,

231
00:14:23,285 --> 00:14:26,653
it will basically go all the way down
and say "yeah, it's actually here,

232
00:14:26,909 --> 00:14:28,446
in this ??? that the data differs."

233
00:14:28,866 --> 00:14:32,355
And that means you can just go again
all the way back to the source

234
00:14:32,646 --> 00:14:35,555
and say "ok, cool, we know how to fix
this quite quickly"

235
00:14:36,076 --> 00:14:39,600
And this is really valuable in getting
the recent Tails distribution reproducible

236
00:14:39,973 --> 00:14:43,387
so their ISOs are reproducible.

237
00:14:43,829 --> 00:14:46,873
If you build one and I build one, we get
the exact same one

238
00:14:47,241 --> 00:14:51,389
and that's kind of useful for something
like Tails where, out

239
00:14:51,828 --> 00:14:54,966
of all the projects that you
might want to compromise,

240
00:14:55,450 --> 00:14:58,792
you might want to go after that one,
because of the kind of people that are using it.

241
00:15:01,734 --> 00:15:10,009
We support comparing images, so this is
using ???

242
00:15:12,043 --> 00:15:13,714
and then just running that through diff.

243
00:15:16,092 --> 00:15:20,272
That is a Linux penguin and that is
something else,

244
00:15:20,627 --> 00:15:23,629
I can't remember now. Oh, FT.

245
00:15:24,819 --> 00:15:25,801
It supports images.

246
00:15:27,044 --> 00:15:33,009
It supports JSON and pretty-printing,
so if you give it two JSON files

247
00:15:33,485 --> 00:15:36,657
one with key/value… it'll do a nice
diff of them.

248
00:15:38,042 --> 00:15:43,432
It will pretty print it first, before
doing the diff, so it'll actually give you

249
00:15:43,634 --> 00:15:46,236
something clean, otherwise I don't know
if you've ever diffed

250
00:15:46,978 --> 00:15:50,344
two very long JSON lines, if they differ
in the middle, you just get

251
00:15:50,525 --> 00:15:54,737
a huge long unified diff, but here it's
like "oh, just ??? things have changed"

252
00:15:58,875 --> 00:16:04,052
OpenDocument text formats,
Ogg audio files, because why not.

253
00:16:05,148 --> 00:16:08,251
tcpdump capture files, that's actually
quite useful.

254
00:16:09,019 --> 00:16:17,540
PDFs. That PDF says "Hello World" and
this PDF says "Hello sick sad world",

255
00:16:17,995 --> 00:16:23,356
I don't know why that particular text
is in the demo.

256
00:16:23,852 --> 00:16:27,058
Again, run that through normal diff
program… garbage.

257
00:16:28,212 --> 00:16:34,074
XML documents. Again, it'll pretty print
them so it's nice, actually nice to read.

258
00:16:36,117 --> 00:16:41,809
If you want to get started on diffoscope,
the very easiest and quickest way to do it is

259
00:16:42,212 --> 00:16:47,678
fire up a web browser, try.diffoscope.org,
select your files, press Compare

260
00:16:48,470 --> 00:16:54,883
and it'll upload them and run diffoscope
with all the support for all the file formats

261
00:16:55,226 --> 00:16:59,096
in the cloud for you and give you a nice
HTML page that you can then link to people

262
00:16:59,423 --> 00:17:01,107
So that's the very quickest way to get
started.

263
00:17:02,360 --> 00:17:06,884
The next quickest way is to install
trydiffoscope and then you run that

264
00:17:07,165 --> 00:17:09,751
on two files and it'll basically do
the same thing,

265
00:17:10,018 --> 00:17:12,312
run it in the same cloud service as
try.diffoscope.org

266
00:17:12,877 --> 00:17:16,672
but it'll give you the result on the
command line or

267
00:17:16,981 --> 00:17:22,010
if you pass the --webbrowser option, it will
give you a URL or load your web browser,

268
00:17:22,228 --> 00:17:24,951
I can't remember exactly which, with
the same results.

269
00:17:25,122 --> 00:17:29,574
This is 1kB of Python, nothing basically.

270
00:17:31,226 --> 00:17:33,120
That's the next easiest way.

271
00:17:34,262 --> 00:17:36,622
But you can then install diffoscope itself
on your own machine.

272
00:17:37,631 --> 00:17:42,824
I recommend not installing Recommends
because all of those file formats

273
00:17:43,208 --> 00:17:46,560
might drag in extra things like
the whole of TeX,

274
00:17:46,820 --> 00:17:52,178
I think the whole of OpenOffice, whole
of Mono, whole of Java…

275
00:17:57,263 --> 00:17:58,403
Android, yeah, quite big.

276
00:18:01,941 --> 00:18:03,489
I think there's another big one I can't
think of.

277
00:18:04,554 --> 00:18:11,185
They're all optional, and they all say
"By the way, I support TeX documents

278
00:18:12,046 --> 00:18:13,281
or whatever, Mono, whatever.

279
00:18:13,740 --> 00:18:18,954
But you need to install this package and
then you get full pretty printed support",

280
00:18:19,846 --> 00:18:21,433
And it'll tell you that when it's missing.

281
00:18:21,791 --> 00:18:25,168
So, if you just start with
--install-recommends disabled,

282
00:18:26,427 --> 00:18:29,107
run it on your file; if it says
"please install this package", you can then

283
00:18:29,335 --> 00:18:31,239
install them as you go along, as you want,

284
00:18:31,722 --> 00:18:34,319
rather than installing everything.

285
00:18:34,630 --> 00:18:38,333
And then you just pass ??? files
and then it works as before

286
00:18:41,978 --> 00:18:45,869
How can you improve your own
quality assurance and Debian packaging

287
00:18:45,959 --> 00:18:46,713
with diffoscope?

288
00:18:47,582 --> 00:18:50,974
The biggest value here is not
necessarily for Reproducible Builds.

289
00:18:51,771 --> 00:18:56,406
It's for basically just seeing where you
do want to have a diff or expecting a diff

290
00:18:57,078 --> 00:19:00,368
and you are expecting a particular type
of diff in a particular way

291
00:19:00,903 --> 00:19:02,307
you can basically see those changes

292
00:19:03,539 --> 00:19:12,151
And if you build two .debs normally and
... I'll try to demo in a second

293
00:19:12,403 --> 00:19:16,239
You build a .deb without a patch applied and
then build a .deb with the patch applied

294
00:19:16,792 --> 00:19:19,791
you can ??? run a diff on the source package

295
00:19:20,742 --> 00:19:24,455
But that's not very useful because the
binaries are going to end up on

296
00:19:24,695 --> 00:19:30,698
people's machines. But if you run a diff on
the binary itself, did my change actually

297
00:19:31,150 --> 00:19:33,205
hit the binary? I think really ...
No..

298
00:19:36,118 --> 00:19:39,093
I'll just run through a very live demo, of
course, so it's gonna fail ...

299
00:20:03,706 --> 00:20:07,376
Check out some .... We'll get this
libnetx-java

300
00:20:11,041 --> 00:20:12,160
We just build that once

301
00:20:16,188 --> 00:20:19,258
Let's say we are on the security team and

302
00:20:19,475 --> 00:20:22,701
want to apply a patch, and we want to be
really sure because we are going to push it out

303
00:20:22,888 --> 00:20:24,044
to all our users

304
00:20:25,046 --> 00:20:28,612
First we will make a changelog entry

305
00:20:38,445 --> 00:20:39,284
Closing a bug

306
00:20:48,105 --> 00:20:54,949
Find some .java file to change

307
00:20:55,688 --> 00:20:56,798
Let's pretend we have a real patch

308
00:21:06,374 --> 00:21:10,650
Let's replace that equals equals,
say that was the fix

309
00:21:14,033 --> 00:21:15,512
So that's the patch from upstream

310
00:21:15,884 --> 00:21:16,966
Upstream blast patch

311
00:21:23,505 --> 00:21:26,637
When we build this what we wanna see is
just that change in the file

312
00:21:27,141 --> 00:21:32,116
we don't wanna see any nonsense changes of
extended dump but we also definitely want

313
00:21:32,293 --> 00:21:37,129
to see that change, 'cause if our binary,
for security reasons, doesn't have that change

314
00:21:37,129 --> 00:21:42,270
then we aren't fixing people's machines,
they will issue a DSA ??? installed ???

315
00:21:44,685 --> 00:21:48,766
And you should do proper testing as well
at multiple levels

316
00:21:52,763 --> 00:21:53,799
I will build that again

317
00:22:23,976 --> 00:22:29,717
So we wanna diff the original one 0 5,

318
00:22:30,432 --> 00:22:36,212
We wanna diff that one with a fake 
security one

319
00:22:37,608 --> 00:22:43,481
You see the progress bar at 100%,
1 - there are differences (there should be

320
00:22:43,681 --> 00:22:46,304
differences).
Let's see what those differences are

321
00:22:48,418 --> 00:22:51,828
in our web browser; it's a nice HTML output

322
00:23:01,180 --> 00:23:03,888
Let's have a look.
Are we seeing what we wanna see?

323
00:23:07,147 --> 00:23:11,151
There are some changes in the data.tar, we
kind of expect that

324
00:23:14,447 --> 00:23:18,389
What's changed in our control file?
Well, the version changed, we wanted that

325
00:23:18,565 --> 00:23:19,656
to change. Perfect

326
00:23:20,535 --> 00:23:24,294
And it's changed to ???
That's what we wanna see

327
00:23:24,744 --> 00:23:28,370
No other changes here so there was no 
weird control or in magic going on

328
00:23:32,297 --> 00:23:38,421
In our data.tar there are a couple of timestamp
changes, we will ignore those for now

329
00:23:40,996 --> 00:23:44,944
The changelog has changed, well I hope so
because I have changed that entry

330
00:23:48,820 --> 00:23:51,793
Here is where we're going to start seeing;
we are going to see the change in the

331
00:23:52,016 --> 00:23:59,455
.jar file, which is the Java class, Java
compiled archive format

332
00:24:00,442 --> 00:24:05,931
We are seeing some meaningless timestamp
changes but we can ignore those

333
00:24:06,973 --> 00:24:08,923
let's pretend, because it's just
metadata maybe

334
00:24:16,429 --> 00:24:24,131
OK, part of a class, so if you can see here
it's basically a decompilation of the

335
00:24:24,633 --> 00:24:31,500
.java file itself and it's basically saying
"oh, it used to say ifnull and now ifnonnull"

336
00:24:31,796 --> 00:24:35,567
So these are the actual Java
bytecode instructions and what's really

337
00:24:35,965 --> 00:24:39,241
And what is really ??? here
is that nothing else has changed

338
00:24:39,627 --> 00:24:44,717
We were just expecting that change between
the two opcodes, ifnull and ifnonnull,

339
00:24:45,554 --> 00:24:49,557
which is good, 'cause it hasn't made
any other code changes, but, also crucially, we can

340
00:24:49,725 --> 00:24:52,076
see that it has actually made a change
to the code.

341
00:24:55,060 --> 00:24:58,072
For example, it didn't use some cached
version or something like that

342
00:24:58,338 --> 00:24:59,505
This is really useful

343
00:25:00,326 --> 00:25:05,038
And just running a naive diff wouldn't
give that of course, because it would just

344
00:25:05,223 --> 00:25:08,341
come up with binary garbage.
And just seeing the diff had changed again

345
00:25:08,627 --> 00:25:12,604
??? be told you anything, because all of the
change would have changed as well

346
00:25:12,802 --> 00:25:15,886
So it's like, well, yes, it's different

347
00:25:16,028 --> 00:25:19,161
The meaningful change there is
what actually fixes the flaw

348
00:25:19,597 --> 00:25:21,020
??? but we know it's there

349
00:25:22,945 --> 00:25:27,448
That's kind of ???
Shipping this .deb out, I'd be quite

350
00:25:27,687 --> 00:25:30,004
confident that this addressed the
actual bug

351
00:25:31,151 --> 00:25:34,721
I'd be quite confident pushing that out
because it's a very minimal set of changes

352
00:25:35,218 --> 00:25:36,750
you wanna do that for security reasons

353
00:25:37,285 --> 00:25:40,111
So this was the live demo

354
00:25:43,038 --> 00:25:48,108
The other one is seeing no changes
at all, so you can build once

355
00:25:48,108 --> 00:25:49,894
if your build is reproducible

356
00:25:50,491 --> 00:25:54,753
You can build once, change your compiler
or some other part of your toolchain,

357
00:25:55,982 --> 00:26:02,267
Build it again, and if you get the exact same
results, well, great, that's what you intended

358
00:26:02,534 --> 00:26:04,595
You wanna see no changes when you change
some part of it

359
00:26:08,127 --> 00:26:11,928
And that is really useful: if there were
changes, diffoscope will highlight them

360
00:26:12,271 --> 00:26:15,993
and show exactly why they had changed,
maybe some compiler optimizations,

361
00:26:16,393 --> 00:26:17,565
maybe some other things as well

362
00:26:19,056 --> 00:26:22,603
So you can use it in both ways, when you
expect changes and when you don't expect

363
00:26:22,789 --> 00:26:26,926
changes, and whether or not those match your expectations,
diffoscope will tell you exactly why

364
00:26:29,922 --> 00:26:34,355
It's all ??? when other companies
are doing security releases

365
00:26:35,111 --> 00:26:41,184
naming no names whatsoever,
but they like to release patches as you

366
00:26:41,697 --> 00:26:44,618
know just a new firmware for your router

367
00:26:46,674 --> 00:26:50,629
Very large file system images,
you basically have no idea what changed

368
00:26:51,034 --> 00:26:55,037
between these two files. Again, you run them
through diff: completely useless

369
00:26:55,419 --> 00:26:59,496
You can start to unpack them with
squashfs and blah blah blah

370
00:27:01,143 --> 00:27:05,753
But they're probably sort of concatenated
cpio archives, so that's nonsense

371
00:27:07,223 --> 00:27:11,913
But diffoscope would just chew through those
and give you what the differences actually

372
00:27:11,913 --> 00:27:15,197
are between these two files, and say
they changed this, they've removed or

373
00:27:15,596 --> 00:27:19,260
added some GPL-licensed code, or something
kind of interesting

374
00:27:24,293 --> 00:27:31,212
So it's very useful for diffing those kinds of
binary blobs that come from various people

375
00:27:33,013 --> 00:27:36,983
So the current state of diffoscope,
the development is up and down

376
00:27:41,148 --> 00:27:51,343
It started around May 2014, something like that.
A bunch of work here, then it's idle, I think

377
00:27:55,239 --> 00:27:56,841
These are just DebConfs, basically

378
00:28:09,157 --> 00:28:12,343
Anyway, it's going up and down; it's kind
of interesting

379
00:28:14,939 --> 00:28:19,296
??? a lot of Reproducible Builds projects
of course, so every time we do a build

380
00:28:19,621 --> 00:28:25,064
on the ??? Reproducible Builds
testing framework, we run diffoscope

381
00:28:25,303 --> 00:28:29,834
on the result. If it's reproducible, it
just says, hey, the file is the same

382
00:28:31,208 --> 00:28:36,767
But if not, we publish the diffoscopes of
all your packages that are unreproducible

383
00:28:37,092 --> 00:28:40,870
so you can just go there and be like,
what's the difference between these two things?

384
00:28:53,762 --> 00:29:02,115
I invested a lot of work optimizing
diffoscope, ??? rather perverse n-squared

385
00:29:02,465 --> 00:29:07,556
loops inside it. So I managed to cut down
some of the time here, cut down here

386
00:29:11,063 --> 00:29:14,012
There have been quite a few performance
enhancements over the past ...

387
00:29:16,395 --> 00:29:21,240
These are the git tags: this is version 80
and this is version 50. I just ran the same

388
00:29:22,147 --> 00:29:23,363
benchmark across them all

389
00:29:24,705 --> 00:29:35,180
So this shows when I introduced some
rather stupid code, embarrassingly, but whatever

390
00:29:35,703 --> 00:29:36,424
???

391
00:29:37,482 --> 00:29:40,522
There's work being done right now
on parallel processing. There have been

392
00:29:40,923 --> 00:29:46,344
quite a few attempts before, but adding it
is kind of interesting and difficult

393
00:29:47,033 --> 00:29:51,898
Luckily, we have an outreach student,
Liliana. Is she in the room? Is she hiding?

394
00:29:53,069 --> 00:29:57,225
She's here, and she's talking tomorrow
about her work on parallel processing in

395
00:29:57,520 --> 00:30:02,162
diffoscope, and that will be amazing because
a lot of it is I/O-bound or waiting for external

396
00:30:02,388 --> 00:30:06,635
processes. With multiple-CPU machines,
you might as well just parallelize

397
00:30:07,012 --> 00:30:11,631
while I'm standing waiting for the result
for a PDF to be unpacked, I may as well

398
00:30:11,913 --> 00:30:16,859
be running on another CPU. I think we are
going to see some real performance wins

399
00:30:17,512 --> 00:30:22,810
as we get that parallel processing merged and
working and ???

400
00:30:24,189 --> 00:30:29,544
You can check out our website, diffoscope.org.
Recently migrated to Salsa... yeeaahhh

401
00:30:33,375 --> 00:30:37,771
And everything that's reproducible is now
on Salsa, it's kind of cool

402
00:30:38,732 --> 00:30:42,450
That's quite recent...
???

403
00:30:44,620 --> 00:30:45,876
Thank you very much, danke schön

404
00:30:46,560 --> 00:30:48,733
You got any questions?
About diffoscope?

405
00:30:51,659 --> 00:30:53,558
Thank you very much!

406
00:30:53,558 --> 00:30:57,761
[Applause]

407
00:30:59,888 --> 00:31:02,954
Q: A buzzword question: can you diff container
image formats?

408
00:31:04,943 --> 00:31:14,617
A: Depends which ones. So if they are just
directories, then yes, because it's just a directory

409
00:31:15,139 --> 00:31:17,224
Do you have any in particular in mind? Like Docker?

410
00:31:19,068 --> 00:31:25,487
Yes, there's Docker, and then there's
OCI, which I believe is the standard one

411
00:31:26,669 --> 00:31:30,506
And that would make it buzzword-compliant

412
00:31:31,286 --> 00:31:33,028
Ah, OK, we're all about buzzwords

413
00:31:34,334 --> 00:31:37,411
Probably diffoscope blockchain as well

414
00:31:38,249 --> 00:31:42,059
And then run diffoscope on containers and
see the difference between updates of your

415
00:31:42,059 --> 00:31:43,395
container images

416
00:31:43,620 --> 00:31:46,219
BAM ... solved
Where do I invest?

417
00:31:48,231 --> 00:31:56,645
I wasn't aware of OCI... is that how it's
called? No, it doesn't support that right now

418
00:31:58,347 --> 00:32:02,025
But it wouldn't be too difficult, presuming
there are tools to unpack it, and as soon as

419
00:32:02,297 --> 00:32:07,761
we have a tool to unpack it, it can then
just go from there. There is an open wishlist

420
00:32:08,177 --> 00:32:15,402
bug in the BTS for Docker containers, to the
point where I think it would be really

421
00:32:15,668 --> 00:32:19,338
nice if you could just give it, say, two
image names or whatever the noun is

422
00:32:19,835 --> 00:32:24,083
So you can say "please diff these two
Docker images that are available", and

423
00:32:24,274 --> 00:32:28,753
it can look at your local thing and do
a diff on them. Currently it's not

424
00:32:29,008 --> 00:32:31,077
supported, but there is an open wishlist
bug.

425
00:32:32,345 --> 00:32:36,860
Q: Shouldn't any company that releases
binaries be interested in supporting

426
00:32:37,183 --> 00:32:38,544
diffoscope and using it?

427
00:32:51,541 --> 00:32:58,413
A1: Basically, when companies release binaries,
they are not interested in users seeing differences...

428
00:33:01,874 --> 00:33:10,299
A2: Yes, I'm actually surprised that the
Docker bug was only opened two months ago

429
00:33:10,776 --> 00:33:17,144
and that there hasn't been more interest in diffing
container images, but if you'd like to open

430
00:33:17,561 --> 00:33:24,460
one for OCI, that would be very much appreciated,
and we can get on to that; that would be

431
00:33:24,677 --> 00:33:25,573
great.

432
00:33:30,038 --> 00:33:35,465
I was looking at the page for OCI; it says
it's based on Docker, basically, so

433
00:33:35,655 --> 00:33:40,500
once you get OCI, for free you would
sort it out for Docker, if you're lucky

434
00:33:48,166 --> 00:33:51,646
The OCI image format, they wrote it
based on Docker images

435
00:33:55,429 --> 00:34:00,232
OK, we will sort that out, and it seems like
we're using Docker more and more

436
00:34:00,279 --> 00:34:01,451
in Debian

437
00:34:07,484 --> 00:34:09,216
Any other questions?

438
00:34:20,886 --> 00:34:29,297
Q: Out of curiosity, which ??? are you using
inside? Are you using some bioinformatics

439
00:34:30,447 --> 00:34:33,332
algorithm to diff trees efficiently?

440
00:34:34,200 --> 00:34:46,781
A: No, it's really naive. All it does is run
normal diff, the normal diff tools, but

441
00:34:47,126 --> 00:34:59,242
it will try to identify files and unpack them
first, so it uses the file(1) utility, the identifier

442
00:34:59,716 --> 00:35:06,547
thing that says "it's a PDF", and tries to
unpack it first. It doesn't do any clever

443
00:35:07,415 --> 00:35:12,056
matching. The clever matching that it does
do is fuzzy matching, so if you just

444
00:35:12,293 --> 00:35:18,567
rename a directory between the two, inside a
container, it will say, yeah, there's a

445
00:35:18,812 --> 00:35:23,981
massive fuzzy match between these
two files, and things like that. So that's

446
00:35:24,241 --> 00:35:31,110
kind of useful, but apart from that it's not clever,
which is kind of what you want, because

447
00:35:31,292 --> 00:35:34,308
if it's too clever it would start to be a little
opaque ...

448
00:35:37,749 --> 00:35:40,046
I personally like dumb tools.

449
00:35:43,916 --> 00:35:51,411
Q: So one question to you is whether,
if you wanna do a release to stable or

450
00:35:51,565 --> 00:35:58,973
something like that, you can ask for the
debdiff. I'm wondering if anyone

451
00:35:59,174 --> 00:36:03,914
I mean, I remember doing that myself:
I've been submitting diffoscope output

452
00:36:04,119 --> 00:36:09,516
as well, because it's just more readable and
useful. So I'm not sure if anyone has any

453
00:36:09,692 --> 00:36:12,741
objection to people asking for those.

454
00:36:22,179 --> 00:36:24,752
I'll propose that to the Release Team
and see what they say

455
00:36:26,024 --> 00:36:28,950
Thank you very much,
are there any other questions?

456
00:36:32,634 --> 00:36:36,787
No further questions? Then let's thank
Chris again!

457
00:36:37,137 --> 00:36:41,940
[Applause]
