yoavl


Devops and RPM Distribution

Deploying and provisioning RPMs on a regular basis is becoming a common need. With devops automating every aspect of deployment, including setting up new hosts in virtualized environments, packaging systems such as RPM are becoming more and more important. Managed RPMs provide a great level of control over installations and make upgrades easy.
Amazon’s own AWS images are one good example where RPMs deployed to a controlled YUM repository can provide a great experience for users. Yet, automating the creation of a good YUM repository for sharing and controlling RPMs is a bit tricky.
 

A Look at Java Libraries Provisioning

Provisioning and resolving Java artifacts is a long-time common practice. All build tools (Maven, Gradle, SBT, Ivy, etc.) can resolve dependencies from binary artifact repositories and deploy back the artifacts they have built. These artifacts are later consumed by other builds and packaging tools. With CI servers, this process is repeated concurrently and more often, automating build pipelines where artifacts are exchanged between different build stages through the repository.
 
So, while we have achieved well-established automation for creating and provisioning module artifacts (such as Maven artifacts), this is not the case for system packages, such as the ones created by RPM or Dpkg.
 

Is RPM Provisioning Really Any Different?

One can argue that system packages differ a lot from language-level modules. One big difference is that the former are focused on providing a *single*, coherent view of all installed modules. This is why conflicting module versions are problematic, why upgrade/uninstall is a must-have, and also why these systems employ complex cross-module semantics. Things are much easier for Java applications in this sense, where every application sees an isolated view of modules.
However, in terms of sharing and provisioning artifacts through a shared repository, the basic principles are essentially the same!
  • Module artifacts are packaged and deployed into a remote repository.
  • Deployment generates repository-level metadata/indexes that make it simpler and safer for clients to discover modules and consume them. Some examples of such metadata are: checksum files, maven-metadata providing information about the artifact versions, repository data describing the RPMs in a YUM repository, etc.
  • Clients contact the repository, often via HTTP, to download artifacts.
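
The metadata step in the list above can be illustrated with checksum files. This is only a minimal sketch, assuming the repository is a plain local directory and using the common convention of `.sha1` and `.md5` sidecar files next to each artifact. The function name `deploy_with_checksums` is hypothetical and not part of any tool mentioned here.

```python
import hashlib
import shutil
from pathlib import Path


def deploy_with_checksums(artifact: Path, repo_dir: Path) -> dict:
    """Copy an artifact into repo_dir and write .sha1/.md5 sidecar files.

    Hypothetical helper: a real repository manager does this server-side.
    """
    repo_dir.mkdir(parents=True, exist_ok=True)
    target = repo_dir / artifact.name
    shutil.copyfile(artifact, target)

    # Compute the digests from the deployed copy, not the source file.
    data = target.read_bytes()
    checksums = {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }
    # One sidecar file per algorithm, e.g. lib-1.0.jar.sha1
    for algorithm, digest in checksums.items():
        (repo_dir / f"{artifact.name}.{algorithm}").write_text(digest)
    return checksums
```

A client that downloads the artifact can recompute the digest and compare it with the sidecar file, which is what makes consumption safer.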
 

Making RPM Provisioning Easy

Deploying Java artifacts to a repository is a breeze and is natively supported by build tools and CI servers. Unfortunately, the same cannot be said about RPM deployment:
After building each RPM, the client needs to create the YUM repository metadata for the whole repository. This is done manually on the server file-system, and is not a by-product of simply deploying the RPM artifact. But there is really no reason for separating deployment from updating the YUM repository!
To automate the process, one would have to trigger execution of the “createrepo” command on the RPM repository after each deployment. It can be done, but requires setting up some home-grown solution.
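
A home-grown version of that trigger might look like the sketch below. It assumes the repository is a local directory and that the `createrepo` binary is on the PATH. The helper names `run_command` and `deploy_rpm` are made up for illustration, and the command runner is injectable so the logic can be exercised without the binary.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def run_command(cmd: List[str]) -> None:
    """Default runner: execute the command and fail loudly on errors."""
    subprocess.run(cmd, check=True)


def deploy_rpm(rpm: Path, repo_dir: Path,
               run: Callable[[List[str]], None] = run_command) -> List[str]:
    """Copy an RPM into the repository, then refresh the YUM metadata.

    Hypothetical helper tying deployment and metadata generation together.
    """
    repo_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(rpm, repo_dir / rpm.name)

    # --update lets createrepo reuse existing metadata for unchanged RPMs.
    cmd = ["createrepo", "--update", str(repo_dir)]
    run(cmd)
    return cmd
```

Even in this small form, the script has to run on the repository host itself, which is the part that usually ends up as custom infrastructure.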
Another approach is to introduce this functionality directly into the artifacts repository!
This is exactly the approach taken by JFrog’s Artifactory Pro. Upon deployment, Artifactory generates the YUM metadata on the fly and makes it available for provisioning almost instantly.
 
There are a couple of huge advantages to this approach:
  1. RPMs get all the advantages of being hosted in a binary repository: they become searchable and you can apply security and visibility rules to RPMs
  2. No separate process needs to be set up: you are already running a repository manager, so why not let the repository take care of YUM RPM hosting automatically?
  3. The whole process is pure-Java and can be run on any OS - not just ones that support the ‘createrepo’ binary
  4. Detailed RPM information can be viewed from Artifactory’s UI
  5. RPM calculation can be triggered automatically in the background, or via REST
  6. You can auto-generate metadata in sub-paths. This is a common requirement for repositories grouping artifacts by version and architecture into individual YUM sub-repositories.
This makes the experience of provisioning RPMs via a YUM repository as easy as sharing Java artifacts in a cross-platform way, fully automated from build tools!
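
To make the sub-path idea from the list above concrete, here is a small sketch that groups RPM files into per-version, per-architecture sub-repositories. It assumes file names follow the usual `name-version-release.arch.rpm` convention. The `<version>/<arch>` layout and both function names are illustrative assumptions, not Artifactory's actual behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class RpmName(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> RpmName:
    """Split 'name-version-release.arch.rpm' into its parts."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename}")
    stem = filename[: -len(".rpm")]
    # The architecture is the last dot-separated field.
    nvr, arch = stem.rsplit(".", 1)
    # Version and release are the last two dash-separated fields;
    # the package name itself may contain dashes.
    name, version, release = nvr.rsplit("-", 2)
    return RpmName(name, version, release, arch)


def group_by_sub_repository(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map a hypothetical '<version>/<arch>' sub-path to its RPM files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename in filenames:
        rpm = parse_rpm_filename(filename)
        groups[f"{rpm.version}/{rpm.arch}"].append(filename)
    return dict(groups)
```

Each resulting sub-path would then carry its own YUM metadata, so clients can point at exactly the version and architecture they need.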
 
 
To learn more about the integrated YUM repository support in Artifactory, please visit this link.
 
To learn even more about RPM provisioning as part of devops automation, and about other practical aspects of devops in the cloud, register for the “Devops in the Cloud” event, hosted by JFrog at Netflix, on Thursday, December 15, 2011, presenting the cloud experience of Netflix, CloudBees and JFrog. See you there!

 

 

Tracking Artifact Licenses - Why is this Hard?

Tracking licenses of third-party artifacts is not one of those tasks that get developers excited. With more interesting problems to solve than legal issues, it is not usually high on the priority list for most teams to deal with licenses during active development, so more often than not, this is left as one of the final steps before preparing a release.
Even when you do try to exercise due diligence and track those third-party licenses, making sure that all developers verify each dependency and its transitive dependencies for compatibility with your company’s license usage policy is not a trivial thing to do. Eventually this results in manually digging through each and every dependency in the project and attempting to accurately keep track of the license that each dependency uses.
Now, if you are only developing in-house projects, then this may not seem like a big deal, but once you begin distributing your software, even as a cloud service, the risk of using a third party dependency that uses an unwanted license is a reality.
 

License Information is Out There - Module Info to the Rescue!

Getting the initial license information for third party dependencies doesn’t have to be a manual process - with modular dependencies there is already good information out there that we can leverage!
Maven, Ivy (+Ant), and Gradle (which uses Ivy) all describe artifacts and dependencies in terms of reusable declarative modules. Both Maven POM files and Ivy descriptor files are designed to contain license information as part of the module metadata. And, in fact, many open source libraries already include valuable license information in their descriptors. Potentially, that means that extracting license information from module metadata can be fully automated!
Almost...
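
For descriptors that do declare their licenses, reading them takes only a few lines. The sketch below assumes a Maven POM that uses the standard `http://maven.apache.org/POM/4.0.0` namespace and lists licenses under `licenses/license/name`. The function name `license_names_from_pom` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by standard Maven 4.0.0 POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def license_names_from_pom(pom_xml: str) -> List[str]:
    """Return the free-form license names declared directly in a POM."""
    root = ET.fromstring(pom_xml)
    names = []
    for name in root.findall("m:licenses/m:license/m:name", POM_NS):
        if name.text and name.text.strip():
            names.append(name.text.strip())
    return names
```

Note that this only sees licenses declared in the POM itself. An empty result does not mean the artifact is unlicensed.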
 

Relying on Module Metadata - Not Quite There Yet...

In practice, there are a couple of issues with purely relying on license information from Java modules:
  1. License naming zoo - Current Java module systems (POMs/Ivy files) define license information as free-form text with no specific standard - unlike, for example, Python PyPI modules that use a closed list of OSI licenses.
    For example, the Apache 2 license may sometimes appear by its full name ‘The Apache Software License, Version 2.0’, as ‘Apache License, Version 2.0’, or as ‘Apache License V2.0’, etc.
    This makes identifying the correct license hard and requires the use of heuristics.
  2. Missing license data - that requires the ability to manually determine a license for an artifact, rather than relying on auto-discovery.
  3. Wrong license data - like missing license data, this requires the ability to manually and permanently override an auto-discovered license for an artifact.
  4. License data may be implicit - for example, the license may reside in a parent POM or in a mixin descriptor. This requires traversal of the module inheritance chain to discover the real license.
  5. Multiple licenses - it is not uncommon for artifacts to have more than a single license (e.g. CDDL v1.0 and GPL v2). In this case, you need to decide which license is the applicable active one.
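
Item 4 in the list above, implicit license data, comes down to walking the inheritance chain until some ancestor declares a license. A minimal sketch follows, assuming module descriptors have already been parsed into a simple in-memory structure. The `Module` class and `effective_licenses` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    coordinates: str
    licenses: List[str] = field(default_factory=list)
    parent: Optional[str] = None  # coordinates of the parent module, if any


def effective_licenses(coordinates: str, modules: Dict[str, Module]) -> List[str]:
    """Return the licenses of the nearest ancestor that declares any."""
    seen = set()
    current: Optional[str] = coordinates
    # Guard against cycles and against parents missing from the repository.
    while current is not None and current not in seen:
        seen.add(current)
        module = modules.get(current)
        if module is None:
            break
        if module.licenses:
            return list(module.licenses)
        current = module.parent
    return []
```

An empty result here corresponds to the missing license data case, where a manual decision is needed.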
 

Managing Licenses with an Artifact Repository

Many organizations already manage their published artifacts and dependencies in a central Artifact Repository, such as JFrog's Artifactory. The repository keeps all the organization’s binaries which are used by the developers and by the build system.
Apart from managing the binary data itself, Artifactory also manages metadata about artifacts.
 
Managing license information about artifacts as part of this metadata just seems the natural thing to do:
By using the artifacts repository we can tag our artifacts with license information managed at a central place. Adding this license metadata information can be fully automated and can also be controlled by users!
This is, in fact, exactly the approach taken by the License Control feature in Artifactory Pro, and it solves all previously mentioned issues related to license information extraction.
 
This is how it works:
  • Artifactory maintains a list of all well-known licenses. Users can extend this list with custom licenses. Each license can be approved or unapproved.
  • Each license contains a regular expression used to match against free-form license information inside module files in the repository (the many licenses bundled with Artifactory have been tested and fine-tuned against numerous open source projects).
  • Artifacts can be tagged with license information - manually or automatically. Once tagged, this information is reusable. Automatically discovered license information can be overridden by users. Multiple licenses per artifact are supported.
  • License discovery and reporting merges automatic license extraction with user-defined license data to initiate license violation alerts.
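
The regular-expression matching described above can be sketched roughly as follows. The two license entries, their patterns and their approval status are invented for this example and are far simpler than the tuned expressions bundled with Artifactory. `License`, `LICENSES` and `match_license` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class License:
    key: str
    pattern: re.Pattern
    approved: bool


# Example registry only: patterns and approval flags are assumptions.
LICENSES: List[License] = [
    License("Apache-2.0",
            re.compile(r"apache.*(?:version|v)\s*2(?:\.0)?", re.IGNORECASE),
            approved=True),
    License("GPL-2.0",
            re.compile(r"\bgpl\s*v?\s*2", re.IGNORECASE),
            approved=False),
]


def match_license(free_text: str) -> Optional[License]:
    """Return the first known license whose pattern matches the free-form text."""
    for known in LICENSES:
        if known.pattern.search(free_text):
            return known
    return None
```

All three spellings of the Apache 2 license quoted earlier resolve to the same entry, while text that matches nothing comes back as `None` and would be reported as unknown.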
 
 http://www.jfrog.org/files/java.net/licenses.png
 
 

Discovering Licenses Automatically - Build Servers Never Lie

Automatic license discovery and notifications about possible violations are done as an integral part of the Continuous Integration process:
Whenever a new dependency is introduced by a developer, it gets picked up on the next build, which triggers automated license analysis. If the dependency uses an unknown or unapproved license, an email notification is sent to specified users.
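
In outline, the check amounts to comparing the dependencies of the current build with those of the previous one and inspecting the licenses of anything new. The sketch below only builds the notification lines and does not send email. All names and the shape of the inputs are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set


def new_dependencies(previous: Iterable[str], current: Iterable[str]) -> Set[str]:
    """Dependencies present in the current build but not in the previous one."""
    return set(current) - set(previous)


def violation_notice(new_deps: Iterable[str],
                     licenses: Dict[str, Optional[str]],
                     approved: Set[str]) -> List[str]:
    """Build one notification line per new dependency that needs attention."""
    lines = []
    for dep in sorted(new_deps):
        found = licenses.get(dep)
        if found is None:
            lines.append(f"{dep}: unknown license")
        elif found not in approved:
            lines.append(f"{dep}: unapproved license {found}")
    return lines
```

Dependencies with an approved license produce no line at all, so an empty list means there is nothing to notify anyone about.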
 
This is all possible using Artifactory’s comprehensive build integration with Jenkins (formerly Hudson), JetBrains TeamCity and Atlassian Bamboo, and works for Maven 2 & 3, Ivy and Gradle builds on each build server.
 
When installing the Jenkins Artifactory plugin, for example, you will get the options to run license checks as part of the build (identical functionality exists for TeamCity and Bamboo):
 
 
 http://www.jfrog.org/files/java.net/hudson.png
 
 

The Full Cycle - From Modules to Automated License Checks

Here is how it all works together to automatically extract and apply licensing information and conduct license violation checks on the fly:
 
http://www.jfrog.org/files/java.net/flow.png 
 
 
A developer declares new dependencies in pom.xml files or ivy.xml descriptors (1). Once the changes are declared the developer commits them to the Version Control System (2).
The CI build server monitors version controlled files, sees the changes and pulls them to its workspace (3), which triggers a build (4).
The build is run and intercepted by the Artifactory plugin (for the relevant CI server). The data intercepted is a complete BuildInfo for the build (acting as a bill of materials), including information about all resolved dependency artifacts (5).
 
Note: It is important to realize that the context of a build is the only reliable source of information for the actual dependencies used by your project, since dependency resolution can be dynamic and rely on dynamic aspects like version ranges, the state of the repository at the time of build, resolved properties, etc.
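
The effect of version ranges can be shown with a toy resolver. This is a deliberately simplified sketch that handles only a half-open range such as `[1.0,2.0)` over purely numeric dotted versions. It is not how Maven or Ivy actually implement resolution, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_range(lower: str, upper: str, available: List[str]) -> Optional[str]:
    """Pick the highest available version v with lower <= v < upper."""
    low, high = parse_version(lower), parse_version(upper)
    candidates = [v for v in available if low <= parse_version(v) < high]
    return max(candidates, key=parse_version, default=None)
```

With the same declared range, a repository holding `1.0` and `1.2` resolves to `1.2`, and once `1.5` is deployed the next build resolves to `1.5`. The descriptor never changed, but the dependency actually used did, which is why only the build itself knows the real answer.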
 
The Artifactory plugin publishes all modules with the captured BuildInfo to Artifactory (6). This is where things start to get interesting:
Artifactory looks at the dependencies and for each artifact attempts to figure out what licenses it uses (7). This is done by combining: license information from module metadata, previously found license information and user-set license information. It is even possible to tell Artifactory the exact build scopes/configuration for which dependencies need to be checked.
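
One way to picture how the three sources are combined is the sketch below. The precedence order (user-set first, then previously found, then module metadata) is an assumption based on the statement that users can override discovered licenses. The data shapes and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Dependency:
    artifact: str
    scope: str  # e.g. "compile", "runtime", "test"


def resolve_licenses(artifact: str,
                     from_metadata: Dict[str, List[str]],
                     previously_found: Dict[str, List[str]],
                     user_set: Dict[str, List[str]]) -> List[str]:
    """Return licenses from the highest-priority source that has any."""
    # Assumed precedence: explicit user decisions win over anything discovered.
    for source in (user_set, previously_found, from_metadata):
        if source.get(artifact):
            return source[artifact]
    return []


def licenses_to_check(dependencies: Iterable[Dependency],
                      checked_scopes: Set[str],
                      from_metadata: Dict[str, List[str]],
                      previously_found: Dict[str, List[str]],
                      user_set: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Resolve licenses only for dependencies in the scopes being checked."""
    return {
        dep.artifact: resolve_licenses(dep.artifact, from_metadata,
                                       previously_found, user_set)
        for dep in dependencies
        if dep.scope in checked_scopes
    }
```

Restricting the check to selected scopes means that, for instance, test-only dependencies can be left out of the analysis if they are never distributed.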
 
At the end of the analysis an email with all license violations discovered is sent out to the configured recipient addresses (8). Normally this would be the development lead or the project lead and not someone from legal.
Although there may be license violations, the build will not fail. This approach allows development to move on naturally, while letting development leads discover possible licensing discrepancies immediately as they surface and deal with them before they become an issue. To submit the information beyond the development circle, you can generate license usage reports (9) to incorporate into the legal department’s favorite Excel template. Effectively what this means is that you never have a single artifact in your project that was not verified for license information prior to submitting it!
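
A report of this kind can be as simple as a CSV file that opens directly in Excel. The sketch below is not Artifactory's report format. The column names and the `license_report_csv` function are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List


def license_report_csv(licenses_by_artifact: Dict[str, List[str]]) -> str:
    """Render a two-column CSV report: artifact and its licenses."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["artifact", "licenses"])
    for artifact in sorted(licenses_by_artifact):
        names = licenses_by_artifact[artifact]
        # Artifacts without any resolved license are listed explicitly.
        writer.writerow([artifact, "; ".join(names) if names else "unknown"])
    return buffer.getvalue()
```

Listing unresolved artifacts as `unknown` rather than omitting them keeps the report complete, so nothing reaches legal review unaccounted for.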
 

Wrap-Up

Using the power of a central repository manager like Artifactory, we can extract important license information and combine it with user definitions in order to automate the process of license governance. This is done in the context of a project build executed automatically by the CI server upon changes in the version control system. This ensures that all possible license violations are handled immediately when new or modified dependency declarations are checked in.
The approach taken towards license control is developer-oriented: never stop the build, but let development leads decide per new dependency whether it can go into the project, before the information is generated and transferred for legal approval.
You can read more about the Artifactory License Control feature on the JFrog wiki, or watch this short video to see the full cycle described here in action.