Note
This technote is not yet published.
To to better support agility and management of third-party dependencies, reduce conflicts when compiling the stack on new and older systems, simplify distribution releases and installation of builds, we propose switching to conda-forge for managing third-party dependencies and adopting the conda compiler, the adoption of several new conda recipes, and some small changes to eups and the lsst-build system.
1 Switching to conda-forge for third parties and compilers¶
We need to switch to conda-forge for third parties. There’s multiple reasons for this, including these significant ones:
- New system libraries on newer versions of linux come with different symbols (e.g. Ubuntu 19.10)
- Needing a pyarrow from conda-forge, which also wants a boost from conda-forge
- conda default channel lags releases from conda-forge; no community mechanism for contributing to the default to update these - these are for the Anaconda distribution
- GCC 5.1 ABI changes
1.1 Switching Compilers¶
Switching to conda-forge dependencies means an end to devtoolset compilers and system dependencies for centos7 deployments, due to ABI breakage - centos7+devtoolset requires the pre-5.1 GCC ABI. So the options, for centos7 for example, are to either require a user to install system dependencies and a compiler themselves or get them from conda-forge, or through other non-standard channels. Retrieving those dependencies from conda-forge would be the most user-friendly way.
Matthew Becker merged conda compiler support into sconsUtils, so using conda compilers should work just fine, although they are not battle-tested. Testing with conda compilers on linux has occurred since July, and things tend to work just fine with that. Some issues with Mac compilers have occured in the past but are believe to be resolved now. Further testing will occur before this proposal is accepted.
Currently, if a user is using a newer system with the native compilers, or a newer compilers on an older system, there may still be issues around using those compilers even with the conda-forge third parties. In any case - conda compiler or newer compiler on top of conda - additional testing will still be required to ensure that builds work in all cases, including when compilers newer than those of conda are installed on the system outside conda.
2 Installing the stack and environment¶
eups and conda are still underpinning the use cases for both developers
and users, how those environments are installed and managed is
different. Importantly, in both cases, we rely on activation scripts
(scripts in activate.d
) in both cases to properly configure the terminal
when a conda environment is activated. In both cases, we also will have
scipipe_conda_env
transformed into an eups package, which base
will require.
2.1 End users consume releases with Stackvana¶
# conda already installed and conda-forge already configured
$ conda create -n lsst-20.1.0 stackvana=20.1.0
# lsst_distrib already activated
(lsst-20.1)$ eups list
Matthew Becker has created the
`stackvana
<https://github.com/beckermr/stackvana>`__ ecosystem, a
collection of conda recipes, for building and distributing the stack.
This ecosystem is oriented around a modified eups distrib install
workflow that consumes stack releases, ideally within the
conda-forge CI system. Users get an extremely simplified
installation, as the stack is distributed similarly to other conda-forge
package. The target for this use case, at least within the context of
conda-forge, is really for releases only, which probably does not
include weekly or daily builds, as these are not LSST releases, and
conda-forge CI resources are relatively precious. It’s important to note
that conda-forge states that it is not suitable as a target for rapid
development. As is stated in their orgnazation guidelines:
Publishing a package to conda-forge signals it is suitable for users not involved with development. However, publishing does not always happen error-free. Multiple commits are acceptable when debugging issues with the release process itself.
As such, it would probably be prudent to discuss release cadence of the stack to meet the needs of the users consuming stackvana installs, or rely on another channel [1].
2.1.1 Details¶
Note: Stackvana implementation details are subject to change. In particular, it’s conceivable that Stackvana may utilize eups for the build but rely on copying files to$PREFIX
during the build and not package eups.
The first package in the stackvana ecosystem, stackvana-core
,
installs conda third parties (with semantic pinning, rather than exact
build pinning as scipipe_conda_env
does currently), as well as eups
(eups db and setup script to activate.d
), sconsUtils, and eups remap
files for the third parties, and the conda compilers - everything needed
to build the stack. Thanks to added setup scripts to the environment’s
activate.d
, when the conda environment is activated, it will also
source the eups setup script, activating eups at the same time, which
provides a simplified user experience. The other package in the
ecosystem,stackvana
, relies on stackvana-core
for the build
environment, it proceeds to build the lsst_distrib
tag, and also
relies on activate.d
to setup thatlsst_distrib
tag on
environment activation.
Stackvana uses semantic pinning derived from exact pinning in
scipipe_conda_env
in its build recipe to install dependencies. This
is a slight departure from how scipipe_conda_env
has worked to date,
which uses exact pinning. Stackvana version numbers track lsst_distrib
version numbers.
Currently, it’s remapping many of the dependencies. As those dependencies move to conda-forge and support is added to sconsUtils for setting those up in the build environment, those remaps should mostly go away.
Some splitting up some of lsst_distrib into individual stackvana
components (e.g. stackvana-afw
) may be required to keep compliation
times down for the conda-forge CI system, although we can finish the
build within the current limits.
2.2 Developers start with a toolset environment¶
Installing a daily build:
# conda installed and conda-forge already configured, conda activated
# lsst_toolset_env is metapackage - eups, lsst_build, repos, scons
(base)$ conda create -n lsst-toolset lsst_toolset_env
(base)$ conda activate lsst-toolset
# If the previous command is the first activation, activate.d scripts
# could run `lsst-build config init`, which can setup repos/versiondb
(lsst-toolset)$ eups distrib install -t d_latest lsst_distrib
# A new conda environment was created as part of installing
# scipipe_conda_env.
(lsst-toolset)$ setup lsst_distrib
(scipipe-conda-env-1234abcde)$
Development:
(base)$ conda create -n lsst-toolset lsst_toolset_env
# lsst_toolset_env's activate.d scripts executed
(base)$ conda activate lsst-toolset
(lsst-toolset)$ cd ~/workspace/lsstsw
(lsst-toolset)$ rebuild afw
# internally - rebuild uses lsst-build to `prepare` and `build`
# scipipe_conda_env is prepped, config, installed, declared first
# Following that is the lsst-build build script for base, like so:
# > (lsst-toolset)$ eupspkg PRODUCT=base ... prep
# > # Note: scipipe_conda_env is setup via _build.tags
# > (lsst-toolset)$ setup --vro=_build.tags -r .
# > (scipipe-conda-env-1234abcde)$ eupspkg PRODUCT=base ... config
# ... the rest of lsst-build's build script for base is executed
This workflow is optimized around both eups and lsst-build workflows.
This includes tasks such as the installation of weekly or daily builds
via eups distrib install
, and local development via lsstsw and
rebuild
.
Developers start with a toolset environment that packages eups,
lsst-build, git, git-lfs, compilers, and scripts to configure data for
lsst/repos and lsst/versiondb. The installed toolset environment also
contains the eups database, with the installed eups managing conda
environments with the help of environment
stacking. The conda environment is an
eups package based on Nate Lust’s scipipe_conda
package and
scipipe_conda_env
repo. Nate’s code originally went one step further
in relying on another miniconda
eups package, so conda itself was
installed with eups. It is felt that is not ideal as we still
need a suitable python environment for eups itself, and for path length
reasons explained in detail in [2]. One thing to note that running
setup -r .
on the directory will not activate a complete environment
- the environment is always installed as part of the install phase of
eupspkg.
Nate’s code had relied on manipulating environment variables in the
table file as conda does, by setting CONDA_PREFIX
, appending paths,
etc… However, when conda is activated, conda
is a shell
function intended to be a wrapper over the actual conda CLI, modifying
the environment as necessary when switching between conda environments
or installing dependencies, as setup
does as well, so this breaks or
may have some funny side effects (bad $PS1
) when not directly using
conda activate
or conda deactivate
. Notably, if conda is not
activated but the paths have been modified so that you can find the
conda binary - running a successive conda activate
will fail,
notifying you with an error:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To address this, we can add new actions for an eups table file:
condaActivate(...)
condaActivateStacked(...)
These actions will emit conda activate {env_name}
and
conda deactivate
statements which get evaluated by running
setup
. The assumption here is that conda has been activated, which
it should have been if you can use eups. The second action,
condaActivateStacked
is the one we actually need, but it is
proposed to add both to eups, as the semantics are slightly
different.
The lsst_toolset_env package will be derived from the stackvana-core build recipe, which packages up eups.
With scipipe_conda_env as an eups package, stackvana will possibly need an eups remap file for that package.
Relatedly, in both the stackvana case and the developer case, there is
one important caveat to note by having base
depend on
scipipe_conda_env
. If a product, such as qserv
or sims, wishes
to define its environment, but relies a dependency which relies on
base
, they will likely need to either remap scipipe_conda_env
with their environment, or produce their version of
scipipe_conda_env
and rely on version restrictions in their table
file on scipipe_conda_env
. There may be some other solution that
hasn’t been identified. This isn’t any different than the current
situation with scipipe_conda_env
.
2.2.1 Conda Environment Stacking¶
Conda has an environment stacking
feature.
We will use this to overlay the scipipe_conda_env
environments over
the toolset environment, which allows us keep eups, lsst-build, and even
compilers after activating those environments.
Stacking can be used to improve lsst-build when executing a build in a
different conda environment, which is a CI and rebuild
use case,
similar to how conda-build works I believe. Here is an example script
executing commands in different stacked environments:
#!/bin/bash
# re-init conda functions (conda/conda#7753)
CONDA_EXE_ROOT=$(dirname $(dirname $CONDA_EXE))
source $CONDA_EXE_ROOT/etc/profile.d/conda.sh
# The following will pick up the system git
echo "git in scipipe-env environment (conda run isolated environment)..."
conda run -n scipipe-env git --version
# The following will pick up git from the lsst-toolset environment
echo "git from scipipe-env environment (stacked environments)..."
conda activate --stack scipipe-env
git --version
With that stacking.sh
example, we can execute it:
$ git --version
git version 2.20.1 (Apple Git-117)
$ source /opt/conda/bin/activate lsst-toolset
(lsst-toolset)$ ./stacking.sh
Stacking example
git in scipipe-env environment (conda run isolated environment)...
git version 2.20.1 (Apple Git-117)
git from scipipe-env environment (stacked environments)...
git version 2.23.0
2.2.2 eups, lsst-build, lsstsw, newinstall, repos, versiondb, sconsUtils, loadLSST.bash¶
By starting with conda environment and leveraging activate.d
scripts, I think it’s possible to encapsulate the execution of
lsstsw/bin/deploy, newinstall.sh, and loadLSST.bash so that parts of
those scripts are run at environment activation time. The interface is
conda activate
.
Going a step further, moving eups-related commands in lsstsw into
lsst-build (rebuild
, mass-tag
) is desirable, as well as tools to
manage lsst/repos and lsst/versiondb, along with config files for
lsst-build itself. By delineating configuration commands for lsst-build,
with initialization and update actions to manage build-related data like
lsst/repos and lsst/versiondb, we reduce the complexity down to one
tool, lsst-build.
# write new config file. Stored at $CONDA_PREFIX/etc/lsst-build/
(lsst-toolset)$ lsst-build config init
# sync config - download lsst/repos/etc/repos.yaml, lsst/versiondb
(lsst-toolset)$ lsst-build config sync
# Switch repos. Equivalent to:
# cd $CONDA_PREFIX/share/lsst/repos; git checkout tickets/DM-98765
(lsst-toolset)$ lsst-build config write repos-ref tickets/DM-98765
(lsst-toolset)$ lsst-build config sync
Should this not be desirable for some reason, the user is always free to setup a new toolset environment.
2.2.3 Other considerations¶
From an lsstsw/lsst-build standpoint, the stacked environment is a bit kinder with CI than the stackvana approach, due to replication of large product repos (e.g. git-lfs+afwdata).
2.2.4 Conclusion¶
In conclusion, in this proposal we feel that Stackvana can drastically simplify user installation in the case of consuming official releases. The toolset approach can provide developers the flexibility they need with eups and play nicely with CI. We should support both with new conda environments and some modest changes to lsst-build and eups. In both cases, switching to conda-forge for third-party dependencies and compilers should improve compatibility for both newer and older operating systems, while keeping LSST software up-to-date with external software.
2.2.4.1 Footnotes¶
[1] Beyond conda-forge
There’s a few different avenues which could be investigated to improve the experience of compiling, CI, and integration with conda-forge further.
- If the 2-4 releases per year of the stack is not adequate for the
user base consuming stackvana, investigate more frequent releases.
- Alternatively, produce conda-forge compatible monthly or weekly stackvana builds to an lsst conda channel
- Reduce proliferation repos to several core product repos and apply
semantic versioning on it
- Aligns closer to product tree
- Map product repos to conda-forge packages with semantic versions
- Makes Stackvana work better with limited conda-forge CI resources
- Investigate replacing lsst-build+scons+sconsUtils+eups with bazel
- Optimized for monorepo projects, but has support for external git repos
- Support for multiple langauges out of the box (see Tensorflow).
- Several projects to support remote build caching and build clusters. Shared build cache works nicely with ephemeral workers/cloud
- Shared build cache could be backed by nginx WebDAV, which we deploy on the LSP
- No built-in pytest support
- Investigate replacing lsst-build+scons+sconsUtils+eups with CMake +
ExternalProject
- More standardized than Bazel
- No native remote build cache. We could keep the shared filesystem way, but that might not help users outside of Jenkins/CI
- No built-in pytest support
[2] Issues with installing miniconda with eups
One issue around using conda as an eups dependency is linux path length.
With eups, the executables can be buried down a bit. To illustrate this,
assume eups is installed under a pipelines
directory on CVMFS, and a
user is testing a new version of miniconda from a tickets branch. We
assume that the activated conda environment was named
scipipe_conda_env
, which may be smaller than the normal path length.
We would likely end up with a path close of 129 characters:
#!/cvmfs/sw.lsst.eu/pipelines/eups/Linux64/miniconda/tickets.DM-29999-g123456789+0123456789/envs/scipipe_conda_env/bin/python3.7
Because of this, it’s my recommendation to not package miniconda with eups, as the path length and rely on the user to setup conda before activating eups. I would recommend we tell users which conda version is preferred and forward them to instructions on where to acquire conda.