(post
:title "GNU Guix in an HPC environment"
:date (string->date* "2015-04-17 00:00")
:tags '("gnu"
"planet-fsfe-en"
"free software"
"guix"
"bioinformatics"
"system administration"
"packaging"
"cluster")
(p [I spend my daytime hours as a system administrator at a research
institute in a heterogeneous computing environment. We have two
big compute clusters (one on CentOS the other on Ubuntu) with
about 100 nodes each and dozens of custom GNU/Linux workstations.
A common task for me is to ensure the users can run their
bioinformatics software, both on their workstation and on the
clusters. Only a few bioinformatics tools and libraries are
popular enough to have been packaged for CentOS or Ubuntu, so
usually some work has to be done to build the applications and
all of their dependencies for the target platforms.])
(h2 [How to waste time building and deploying software])
(p [In theory, compiling software is not a very difficult thing to
do. Once all development headers have been installed on the
build host, compilation is usually a matter of configuring the
build with a configure script and running GNU make with various
flags (this is an assumption which is violated by bioinformatics
software on a regular basis, but let’s not get into this now).
However, there are practical problems that become painfully
obvious in a shared environment with a large number of users.])
(h3 [Naive compilation])
(p [Compiling software directly on the target machine is an option
only in the most trivial cases. With more complicated build
systems or complicated build-time dependencies there is a strong
incentive for system administrators to do the hard work of
setting up a suitable build environment for a particular piece of
software only once. Most people would agree that package
management is a great step up from naive compilation, as the
build steps are formalised in some sort of recipe that can be
executed by build tools in a reproducible manner. Updates to
software only require tweaks to these recipes. Package
management is a good thing.])
(h3 [System-dependence])
(p [Non-trivial software that was built and dynamically linked on one
machine with a particular set of libraries and header files at
particular versions can only really work on a system with the
very same libraries at compatible versions in place. Established
package managers allow packagers to specify hard dependencies and
version ranges, but the binaries that are produced on the build
host will only work under the constraints imposed on them at
build time. To support an environment in which software must run
on, say, both CentOS 6.5 and CentOS 7.1, the packages must be
built in both environments and binaries for both targets have to
be provided.])
(p [There are ways to emulate a different build environment
(e.g. Fedora’s ,(code [mockbuild])), but we cannot get around the
fact that dynamically linked software built for one kind of
system will only ever work on that very kind of system. At
runtime we can change what libraries will be dynamically loaded,
but this is a hack that pushes the problem from package
maintainers to users. Running software with ,(code
[LD_LIBRARY_PATH]) set is not a solution, nor is static linking,
the equivalent to copying chunks of libraries at build time.])
(h3 [Version conflicts])
(p [Libraries and applications that come pre-installed or
pre-packaged with the system may not be the versions a user
claims to need. Say, a user wants the latest version of GCC to
compile code using new language features specified in C++11
(e.g. anonymous functions). Full support for C++11 arrived in
GCC 4.8.1, yet on CentOS 6.5 only version 4.4.7 is available
through the repositories. The system administrator may not
necessarily be able to upgrade GCC system-wide. Or maybe other
users on a shared system do need version 4.4.7 to be available
(e.g. for bug-compatibility). There is no easy way to satisfy
all users, so a system administrator might give up and let users
build their own software in their home directories instead of
solving the problem.])
(p [However, compiling GCC is a daunting task for a user and they
really shouldn’t have to do this at all. We already established
that package management is a good thing; why should we deny users
the benefits of package management? Traditional package
management techniques are ill-suited to the task of installing
multiple versions of applications or libraries into independent
prefixes. RPM, for example, allows users to maintain a local,
independent package database, but ,(code [yum]) won’t work with
multiple package databases. Additionally, only ,(em [one])
package database can be used at once, so a user would have to
re-install system libraries into the local package database to
satisfy dependencies. As a result, users lose the important
feature of automatic dependency resolution.])
(h3 [Interoperability])
(p [A system administrator who decides to package software as
relocatable RPMs, to install the applications to custom prefixes
and to maintain a separate repository has nothing to show for
when a user asks to have the packaged software installed on an
Ubuntu workstation. There are ways to convert RPMs to DEB
packages (with varying degrees of success), but it seems silly to
have to convert or rebuild stuff repeatedly when the software,
its dependencies and its mode of deployment really didn’t change
at all.])
(p [What happens when a Slackware user comes along next? Or someone
using Arch Linux? Sure, as a system administrator you could
refuse to support any system other than CentOS 7.1, users be
damned. Traditionally, system administrators seem to default to
this stance for convenience or practical reasons, but I consider
it unhelpful and even somewhat oppressive.])
(h2 [Functional package management with GNU Guix])
(p [Luckily, I’m not the only person to consider traditional
packaging methods inadequate for a number of valid purposes.
There are different projects aiming to improve and simplify
software deployment and management, one of which I will focus on
in this article. As a functional programmer, Scheme aficionado
and free software enthusiast I was intrigued to learn about ,(ref
"https://www.gnu.org/software/guix/" "GNU Guix"), a functional
package manager written in ,(ref
"https://www.gnu.org/software/guile/" "Guile Scheme"), the
designated extension language for the ,(ref
"https://www.gnu.org/" "GNU system").])
(p [In purely functional programming languages a function will
produce the very same output when called repeatedly with the same
input values. This allows for interesting optimisation, but most
importantly it makes it ,(em [possible]) and in some cases even
,(em [easy]) to reason about the behaviour of a function. It is
independent from global state, has no side effects, and its
outputs can be cached as they are certain not to change as long
as the inputs stay the same.])
(p [Functional package management lifts this concept to the realm of
software building and deployment. Global state in a system
equates to system-wide installations of software, libraries and
development headers. Side effects are changes to the global
environment or global system paths such as ,(code [/usr/bin/]).
To reject global state means to reject the common file system
hierarchy for software deployment and to use a minimal ,(code
[chroot]) for building software. The introduction of the Guix
manual describes the approach as follows:])
(blockquote
(p [The term “functional” refers to a specific package management
discipline. In Guix, the package build and installation process
is seen as a function, in the mathematical sense. That function
takes inputs, such as build scripts, a compiler, and libraries,
and returns an installed package. As a pure function, its
result depends solely on its inputs—for instance, it cannot
refer to software or scripts that were not explicitly passed as
inputs. A build function always produces the same result when
passed a given set of inputs. It cannot alter the system’s
environment in any way; for instance, it cannot create, modify,
or delete files outside of its build and installation
directories. This is achieved by running build processes in
isolated environments (or “containers”), where only their
explicit inputs are visible.])
(p [The result of package build functions is “cached” in the file
system, in a special directory called “the store”. Each package
is installed in a directory of its own, in the store—by default
under ‘/gnu/store’. The directory name contains a hash of all
the inputs used to build that package; thus, changing an input
yields a different directory name.]))
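(p [For illustration, a store item for the GNU Hello package could
look something like the following; the hash in the directory
name is invented here, but in a real store it is derived from
all the inputs used for the build:])
(pre (code [/gnu/store/1c49w2h3…-hello-2.10
  bin/hello
  share/info/hello.info
  share/man/man1/hello.1]))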
(p [The following diagram (taken from the ,(ref
"https://www.gnu.org/software/guix/guix-fosdem-20150131.pdf"
"slides for a talk by Ludovic Courtès")) illustrates how the
build daemon handles the package build processes requested by a
client via remote procedure calls:])
(wide-img "2015/guix-build.png"
"Software is built by the Guix daemon in isolation")
(h3 [Isolated, yet shared])
(p [Note that the package outputs are still dynamically linked.
Libraries are referenced in the binaries with their full store
paths using the runpath feature. These package outputs are not
self-contained, monolithic application directories as you might
know them from MacOS.])
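(p [One can verify this with standard tools; for instance, printing
the dynamic section of a binary from the store (GNU Hello serves
as an arbitrary example here) reveals absolute ,(code
[/gnu/store]) library paths in its runpath:])
(pre (code [readelf -d $(guix build hello)/bin/hello | grep -i runpath]))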
(p [Any built software is cached in the store which is shared by all
users system-wide. However, by default the software in the store
has no effect whatsoever on the users’ environments. Building
software and having the results stored in ,(code [/gnu/store]) does
not alter any global state; no files pollute ,(code [/usr/bin/])
or ,(code [/usr/lib/]). Any effects are restricted to the
package’s single output directory inside the ,(code
[/gnu/store]).])
(p [Guix provides per-user profiles to map software from the store
into a user environment. The store provides deduplication as it
serves as a cache for packages that have already been built. A
profile is little more than a “forest” of symbolic links to items
in the store. The union of links to the outputs of all software
packages the user requested makes up the user’s profile. By
adding another layer of symbolic link indirection, Guix allows
users to seamlessly switch among different generations of the
same profile, going back in time.])
(p [User profiles are completely isolated from one another, making
it possible for different users to have different versions of GCC
installed. Even one and the same user could have multiple
profiles with different versions of GCC and switch between them
as needed.])
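(p [Day to day this is done with the ,(code [guix package]) command
line tool. A brief sketch (the package and profile names are
merely examples):])
(pre (code [# Install into the default per-user profile.
guix package -i gcc-toolchain

# Inspect and move between profile generations.
guix package --list-generations
guix package --roll-back

# Maintain a second, completely independent profile.
guix package -p ~/alt-profile -i gcc-toolchain]))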
(p [Guix takes the functional packaging method seriously, so except
for the running kernel and the exposed machine hardware there are
virtually no dependencies on global state (i.e. system libraries
or headers). This also means that the Guix store is populated
with the complete dependency tree, down to the kernel headers and
the C library. As a result, software in the Guix store can run
on very different GNU/Linux distributions; a shared Guix store
allows me to use the very same software on my Fedora workstation,
as well as on the Ubuntu cluster, and on the CentOS 6.5 cluster.])
(p [This means that software only has to be packaged up once. Since
package recipes are written in a very declarative domain-specific
language on top of Scheme, packaging is surprisingly simple (and
to this Schemer is rather enjoyable).])
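(p [To give a flavour of this language, here is a sketch of a recipe
for GNU Hello, close to but not copied verbatim from the actual
definition in the Guix source tree (the source hash is replaced
with a placeholder):])
(pre (code [(define-public hello
  (package
    (name "hello")
    (version "2.10")
    (source (origin
              (method url-fetch)
              (uri (string-append "mirror://gnu/hello/hello-"
                                  version ".tar.gz"))
              (sha256
               (base32
                "0000000000000000000000000000000000000000000000000000"))))
    (build-system gnu-build-system)
    (synopsis "Hello, GNU world: An example GNU package")
    (description "GNU Hello prints a friendly greeting.")
    (home-page "https://www.gnu.org/software/hello/")
    (license gpl3+)))]))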
(h3 [User freedom])
(p [Guix liberates users from the software deployment decisions of
their system administrators by giving them the power to build
software into an isolated directory in the store using simple
package recipes. Administrators only need to configure and run
the Guix daemon, the core piece running as root. The daemon
listens to requests issued by the Guix command line tool, which
can be run by users without root permissions. The command line
tool allows users to manage their profiles, switch generations,
build and install software through the Guix daemon. The daemon
takes care of the store, of evaluating the build expressions and
“caching” build results, and it updates the forest of symbolic
links to update profile state.])
(p [Users are finally free to conveniently manage their own software,
something they could previously only do in a crude manner by
compiling manually.])
(h2 [Using a shared Guix store])
(p [Guix is not designed to be run in a centralised manner. A Guix
daemon is supposed to run on each system as root and it listens
to RPCs from local users only. In an environment with multiple
clusters and multiple workstations this approach requires
considerable effort to make it work correctly and securely.])
(wide-img "2015/guix-shared.svg"
"Sharing Guix store and profiles")
(p [Instead we opted to run the Guix daemon on a single dedicated
server, writing profile data and store items onto an NFS share.
The cluster nodes and workstations mount this share read-only.
Although this means that users lose the ability to manage their
profiles directly on their workstations and on the cluster nodes
(because they have no local installation of the Guix client or
the Guix daemon, and because they lack write access to the shared
store), their software profiles are now available wherever they
are. To manage their profiles, users would log on to the Guix
server where they can install software into their profiles, roll
back to previous versions or send other queries to the Guix
daemon. (At some point I think it would make sense to enhance
Guix such that RPCs can be made over SSH, so that explicit
logging on to a management machine is no longer necessary.)])
(h2 [Guix as a platform for scientific software])
(p [Since winter 2014 I have been packaging software for GNU Guix,
which meanwhile has accumulated quite a few common and obscure
,(ref
"http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/bioinformatics.scm"
"bioinformatics tools and libraries"). A list of software
(updated daily) available through Guix is ,(ref
"https://www.gnu.org/software/guix/package-list.html" "available
here"). We also have common Python modules for scientific
computing, as well as programming languages such as R and Julia.])
(p [I think GNU Guix is a great platform for scientific software in
heterogeneous computing environments. The Guix project follows
the ,(ref
"https://gnu.org/distros/free-system-distribution-guidelines.html"
"Free System Distribution Guidelines"), which means that free
software is welcome upstream. For software that imposes
additional usage or distribution restrictions (such as when the
original Artistic license is used instead of the Clarified
Artistic license, or when commercial use is prohibited by the
license) Guix allows the use of out-of-tree package modules
through the ,(code [GUIX_PACKAGE_PATH]) variable. As Guix
packages are just Scheme variables in Scheme modules, it is
trivial to extend the official GNU Guix distribution with package
modules by simply setting the ,(code [GUIX_PACKAGE_PATH]).])
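(p [As a sketch, suppose custom recipes live in a Guile module
,(code [(my packages foo)]) stored at ,(code
[~/guix-packages/my/packages/foo.scm]) (all names here are
hypothetical). Making them visible to the tools is then a matter
of setting the variable:])
(pre (code [export GUIX_PACKAGE_PATH=$HOME/guix-packages
guix package -i foo]))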
(p [If you want to learn more about GNU Guix I recommend taking a
look at the excellent ,(ref "https://www.gnu.org/software/guix/"
"GNU Guix project page"). Feel free to contact me if you want to
learn more about packaging scientific software for Guix. It is
not difficult and we all can benefit from joining efforts in
adopting this usable, dependable, hackable, and liberating
platform for scientific computing with free software.])
(p [The Guix community is very friendly, supportive, responsive and
welcoming. I encourage you to visit the project’s ,(ref
"https://webchat.freenode.net?channels=#guix" "IRC channel #guix
on Freenode"), where I go by the handle “rekado”.]))