Archiva in a Team: Part 2

Brett Porter

September 2009

Deleting artifacts in your repository

Sometimes you need to delete artifacts from the repository, for example when an artifact was deployed by accident, or when an artifact has already been released but an old snapshot version is still available. In Archiva, there are several ways of deleting artifacts from the repository: through WebDAV, via the web application, through the scheduled repository purging, or by deleting them directly in the file system.

It is not recommended that artifacts be deleted directly from the file system. Not only does it require access to the server itself, it is also prone to error: artifacts that should not be deleted could be deleted by mistake. If you still want to delete an artifact directly from the file system, all files related to the artifact, such as metadata files and checksums, must also be deleted. The repository must then be scanned in order to update the metadata files. This can be done by clicking the Scan Repository Now button of the repository configuration in the Repositories page. The database scanning also needs to be explicitly executed to immediately remove the deleted artifact from the database.
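If a direct file system deletion cannot be avoided, removing the whole version directory is one way to make sure the POM and checksums go with the artifact. The following is only a minimal sketch: the function name is hypothetical, and the repository directory you pass in depends on where your Archiva installation stores the managed repository, which is an assumption here.

```shell
# Hypothetical helper, not part of Archiva.
# Removes one version of an artifact, together with its POM and checksum
# files, by deleting the whole version directory from a repository on disk.
# Usage: delete_version_dir REPO_DIR GROUP_ID ARTIFACT_ID VERSION
delete_version_dir() {
  repo_dir=$1; group_id=$2; artifact_id=$3; version=$4

  # Refuse to run with an empty argument so we never build a path like "/".
  for arg in "$repo_dir" "$group_id" "$artifact_id" "$version"; do
    [ -n "$arg" ] || {
      echo "usage: delete_version_dir REPO_DIR GROUP_ID ARTIFACT_ID VERSION" >&2
      return 2
    }
  done

  # Maven repository layout: dots in the group ID become directory separators.
  group_path=$(printf '%s' "$group_id" | tr '.' '/')
  target="$repo_dir/$group_path/$artifact_id/$version"

  [ -d "$target" ] || { echo "not found: $target" >&2; return 1; }

  rm -rf -- "$target"
  echo "deleted $target"
}
```

The artifact-level maven-metadata.xml is left untouched by this sketch, so the repository scan and database scan described above are still required afterwards.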

One of the advantages of using the Delete Artifact form in the web application is that you do not need to have direct access to the server. All you need is the required Archiva permissions, which come with the Repository Manager role (without these permissions, Delete Artifact will not be visible in the navigation menu). Another advantage is that the repository scanning no longer needs to be explicitly executed, as Archiva already executes the repository and database scanning consumers to update the index and the database for you.

Now, let's try deleting an old artifact from one of the repositories. If you go to http://localhost:8081/archiva/repository/snapshots/com/effectivemaven/centrepoint/centrepoint, you will see that the old 1.0-SNAPSHOT version of the project still exists. We will remove this artifact from the repository using the Delete Artifact web form.

First, click Delete Artifact from the navigation menu and then fill in the form as follows:

[Screenshot: the Delete Artifact form]

Click the Submit button. After the artifact has been deleted, you should see the confirmation message Artifact 'com.effectivemaven.centrepoint:centrepoint:1.0-SNAPSHOT' was successfully deleted from repository 'snapshots'. If you browse the repository at http://localhost:8081/archiva/repository/snapshots, you will see that the related files, such as the POM, maven-metadata.xml, and the checksums, were also deleted.

To delete artifacts through WebDAV, just open the repository using a WebDAV client and delete the artifact as you would in a regular file system. As for the scheduled repository purging, we will discuss this in the following sections.
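A WebDAV delete is an HTTP DELETE request, so a command-line client such as curl can stand in for a graphical WebDAV client. The sketch below only builds the URL of a version directory, following the repository URLs used in this article; the function name is hypothetical, and the curl call is left as a comment because the credentials and the exact behavior of your Archiva version are assumptions.

```shell
# Hypothetical helper: build the repository URL of an artifact version directory.
# Usage: artifact_version_url BASE_URL REPO_ID GROUP_ID ARTIFACT_ID VERSION
artifact_version_url() {
  base_url=$1; repo_id=$2; group_id=$3; artifact_id=$4; version=$5

  # Dots in the group ID become path segments in the repository layout.
  group_path=$(printf '%s' "$group_id" | tr '.' '/')

  # ${base_url%/} strips one trailing slash so the URL has no "//".
  printf '%s/repository/%s/%s/%s/%s/\n' \
    "${base_url%/}" "$repo_id" "$group_path" "$artifact_id" "$version"
}

# Example (not executed here; the user name is an assumption):
#   url=$(artifact_version_url http://localhost:8081/archiva snapshots \
#         com.effectivemaven.centrepoint centrepoint 1.0-SNAPSHOT)
#   curl -u admin -X DELETE "$url"
```

Printing the URL first and checking it in a browser before issuing the DELETE is a cheap safeguard against removing the wrong directory.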

We have tackled the subjects of repository groups, RSS feeds, and deleting artifacts in the repository. This article would never be complete without covering repository maintenance. The succeeding sections will be all about that.

The Archiva reports

Archiva generates two types of reports: the repository statistics report, which provides statistical data about a repository's content, and the repository health report, which makes us aware of any problems in the repository, such as artifacts that have invalid POM files. Both accept different criteria for customizing the generated output, as seen in the following screenshot:

[Screenshot: criteria for the Repository Statistics and Repository Health reports]

Now, let's discuss the configuration for each report.

Repository statistics

This report provides statistical repository information such as the total number of artifacts in the repository, its total size, the number of plugins in the repository, and the like, based on a given repository scan execution time. This report can be used for analyzing the current content of your repositories, and for tracking their growth, usage, and evolution over time. The report can be constrained by the given Start Date and End Date. If no Start Date and End Date are provided, all statistics from the first scan up to the current date will be included in the report (to a maximum of the number of rows given in the Row Count).

For the Repository Statistics, we can also configure the Repositories To Be Compared. If only one repository is selected in Repositories To Be Compared, the generated report will contain details of a single repository. The following is a sample report where only one repository is selected:

[Screenshot: sample Repository Statistics report for a single repository]

Let's run through the contents of the sample Repository Statistics report given previously for repository internal. The Total File Count pertains to the total number of files in the repository during each execution of the repository scan. The Total Size, on the other hand, is the size (in bytes) of the repository at that time. The number of unique groups and artifact names are broken down in the report as well as the number of plugins, archetypes, JAR, and WAR files.
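To get a feel for these columns, similar figures can be approximated directly on a repository directory. This is only a rough cross-check under the assumption that you have file system access; the function name is hypothetical, and Archiva may count files differently (for instance, it may exclude some files from its totals).

```shell
# Hypothetical helper: approximate a few Repository Statistics columns
# for a repository directory on disk.
# Usage: repo_stats REPO_DIR
repo_stats() {
  repo_dir=$1

  # Total number of regular files under the repository.
  file_count=$(find "$repo_dir" -type f | wc -l | tr -d ' ')

  # Total size in bytes: concatenate every file and count the bytes.
  total_bytes=$(find "$repo_dir" -type f -exec cat {} + | wc -c | tr -d ' ')

  # Number of JAR and WAR files, judged by file extension only.
  jar_count=$(find "$repo_dir" -type f -name '*.jar' | wc -l | tr -d ' ')
  war_count=$(find "$repo_dir" -type f -name '*.war' | wc -l | tr -d ' ')

  printf 'files=%s bytes=%s jars=%s wars=%s\n' \
    "$file_count" "$total_bytes" "$jar_count" "$war_count"
}
```

Reading every file to count bytes is slow on a large repository; it is used here only because it behaves the same on most systems.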

The last two columns, number of deployments and artifact requests, are not yet implemented but will be implemented in future releases.

On the other hand, if more than one repository is selected in the Repositories To Be Compared, the generated report will contain a comparison of the latest statistics of the repositories based on the specified End Date. This is useful for tracking which repositories are the most utilized. For example, if different development groups host their own repositories, the comparison can show which groups are using the most space. Look at the following screenshot for a sample comparison report to see the difference from the previous one:

[Screenshot: sample report comparing several repositories]

To allow you to view this report outside of the web application, the report can be exported as a CSV file by clicking on the Export to CSV link. You should be able to open the exported file as an Excel spreadsheet.

Repository health

One of the secrets behind a successful and reproducible build is a clean and healthy repository. Corrupt metadata or an invalid or missing POM file are the usual causes for a build to break. To prevent this from happening, we must ensure that the repositories we are getting our artifacts from are in good health.

Archiva provides a way of doing this through the Repository Health report and its built-in utilities for updating metadata and fixing checksums. The Repository Health report provides a detailed list of artifacts in the repository that are found to be defective. It gives a starting point for correcting any problems and can be used when diagnosing build errors with a particular artifact.

For example, a common reason for an artifact being defective is that the version of the artifact specified in the POM is different from the actual version in its filename. This could easily happen when using deploy:deploy-file (or even the Archiva web upload form), as the actual filename used for the uploaded artifact is determined based on the supplied parameters. It is possible that the POM included in the upload has different coordinates from the provided parameters. These defects are discovered during Archiva's database scan, when the actual POM file is read and added to the database.

We can narrow down the report by providing a specific Group ID and/or a Repository ID, which will be used for querying defective artifacts that match these criteria. If you try querying for the report using the default configuration, you should be able to see a generated report similar to the following one, which shows a defective POM in repository internal.

[Screenshot: Repository Health report showing a defective POM in repository internal]
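The version mismatch described above can also be spotted by hand for a single POM. The following is a rough sketch rather than Archiva's own check: the function name is hypothetical, it assumes a pretty-printed POM with one element per line, and it will report a mismatch for a POM that inherits its version from a parent without declaring its own.

```shell
# Hypothetical helper: compare the version declared in a POM with the
# version directory it is stored in (.../artifactId/version/file.pom).
# Returns 0 when they match, 1 (with a message) when they differ.
# Usage: pom_version_matches POM_FILE
pom_version_matches() {
  pom_file=$1

  # In the Maven layout, the parent directory name is the version.
  dir_version=$(basename "$(dirname "$pom_file")")

  # Drop the <parent> block first so its <version> is not picked up,
  # then take the first remaining <version> element.
  pom_version=$(sed '/<parent>/,/<\/parent>/d' "$pom_file" |
    sed -n 's:.*<version>\(.*\)</version>.*:\1:p' | head -n 1)

  if [ "$pom_version" = "$dir_version" ]; then
    return 0
  fi
  echo "mismatch in $pom_file: path says '$dir_version', POM says '$pom_version'"
  return 1
}
```

Running this over the POMs named in the health report is one way to confirm which coordinate is wrong before editing anything.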

To repair such an error, you can manually fix the POM in the Archiva repository by updating it in the file system. If the defect is caused by a transfer error when the artifact was proxied, you can delete the artifact (including the metadata and checksums) then force Archiva to retrieve it again by requesting it.

A word of caution though: making these changes could affect the reproducibility of a dependent project's build. For example, it is possible that the actual artifact in the central repository is the defective one. If you fix the artifact in your internal Archiva repository, project builds that go through the local proxy may succeed. However, if the project is built directly off central, the build fails because the dependency artifact is defective.

That summarizes monitoring the health of our repositories. The next section discusses the built-in Archiva utilities which in one way or another clean up and repair broken artifacts and metadata in the repositories.



The Archiva consumers

In this section, we'll go into the details of what Archiva consumers really are.

What is a consumer?

A consumer is a plugin-like component in Archiva. When repositories are scanned or changes to the repository are made, consumers process the change to keep the rest of the system up to date and perform any other necessary operations. Some examples of these operations are indexing, repository clean up, adding artifact information to the database, and fixing metadata.

There are basically two types of consumers in Archiva. These are the repository consumers and the database consumers. The former are attached to the repository scanning and are executed on each artifact processed from the file system. On the other hand, the latter are executed on each artifact in the database when the database scanning is triggered. The list of available repository consumers is displayed at the bottom section of the Repository Scanning page, while the database consumers are listed in the Database Scanning page. Administrators can enable and disable the consumers to be executed in these respective pages.

Archiva's maintenance-savvy consumers

The consumers that we'll be tackling in this section are those that do maintenance work. We will only cover the four major consumers concerned with this kind of task: the repository-purge, metadata-updater, create-missing-checksums, and database-cleanup consumers.

Purging outdated snapshots

The main purpose of the repository-purge consumer is cleansing the repositories of old snapshot artifacts. It can be configured to clean up the repository based on two criteria: the age of a snapshot artifact and the retention count. You will have noticed these configuration options on the repository configuration pages; enabling this consumer will put them into action. If the former criterion is used, the last modified date of the artifact is checked to see if it is older than the given number of days from the current date, and if so, the artifact will be deleted. If the latter criterion is used, we can rest assured that there will always be the given number of artifacts retained for each snapshot version of an artifact, and any older artifacts will be deleted.

[Screenshot: the snapshots repository configuration, including the repository purge settings]

You can also use the criteria together. If we set the Repository Purge By Days Older Than to 100 and the Repository Purge By Retention Count to 2, then there will always be two artifacts for each snapshot version retained in the repository. The rest of the artifacts for that snapshot version will be deleted, if and only if their last modified date is older than 100 days from the current date.

If we want to use only the retention count criterion, we need to set the Repository Purge By Days Older Than to zero. For example, if we set the Repository Purge By Retention Count to one and the Repository Purge By Days Older Than to zero, then for every snapshot version there will always be one artifact (the most recently deployed) retained, and the rest of the artifacts for that snapshot version will be deleted regardless of their age.
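The way the two criteria combine can be written down as a small filter. This is an illustration of the rules as described in this section, not Archiva's implementation; the function name and the input format (one "age-in-days name" pair per line, for the builds of a single snapshot version) are assumptions made for the example.

```shell
# Hypothetical illustration of the purge rules described in the text.
# Reads "AGE_IN_DAYS NAME" lines on standard input (one per build of a
# single snapshot version) and prints the names that would be purged.
# Usage: purge_candidates DAYS_OLDER_THAN RETENTION_COUNT
purge_candidates() {
  days_older_than=$1
  retention_count=$2

  # Sort by age so the newest builds come first, then:
  #  - always keep the first RETENTION_COUNT builds;
  #  - of the rest, purge a build if age checking is disabled (0 days)
  #    or if the build is older than DAYS_OLDER_THAN.
  sort -n | awk -v days="$days_older_than" -v keep="$retention_count" '
    NR <= keep { next }
    days == 0 || $1 > days { print $2 }
  '
}
```

For example, with three builds aged 120, 110, and 50 days, a setting of 100 days and a retention count of 2 purges only the oldest build, while 0 days and a retention count of 1 purges the two older builds.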

Let's try out purging by retention count. We will be using our Centrepoint project again for this demonstration. First, we need to deploy a few snapshot builds of the project, which we will be purging later on.

From the sample code we were working with earlier, deploy the project to the snapshots repository at least twice:

centrepoint$ mvn deploy

If we look at the Centrepoint distribution in http://localhost:8081/archiva/repository/snapshots/com/effectivemaven/centrepoint/distribution/1.1-SNAPSHOT/, we should see something similar to what is shown in the following:

[Screenshot: listing of the distribution 1.1-SNAPSHOT directory in the snapshots repository]

The repository purging is not enabled by default in Archiva. To turn it on, go to the Repository Scanning tab and select it from the checklist at the bottom of the screen, then press Update Consumers. Due to a bug in Archiva 1.2.1, you may need to restart the server for this to take effect.

The next step is to configure the repository purge by retention count. Edit the snapshots repository configuration and set Repository Purge By Days Older Than to zero (0) to disable that criterion. Leave the Repository Purge By Retention Count at two (2). Save the changes, then execute the repository scanning by clicking the Scan Repository Now button. After the repository scan finishes its execution, refresh http://localhost:8081/archiva/repository/snapshots/com/effectivemaven/centrepoint/distribution/1.1-SNAPSHOT/. This time, you should only see the last two builds (the artifacts with -2 and -3 as build numbers), as we told the repository-purge consumer that we only want to retain two builds of each snapshot version of an artifact for the snapshots repository.
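If you have access to the repository directory, the remaining builds can also be counted from the file names, since deployed snapshot files carry a timestamp and build number (for example, a suffix such as -20090901.101010-2). The helper below is a hypothetical sketch that relies on that naming pattern; the directory you pass in is an assumption about where the repository lives on disk.

```shell
# Hypothetical helper: list the distinct "timestamp-buildNumber" pairs
# found in the file names of a snapshot version directory.
# Usage: snapshot_builds VERSION_DIR
snapshot_builds() {
  # Match yyyyMMdd.HHmmss-N followed by "." (extension) or "-" (classifier);
  # files without that pattern, such as maven-metadata.xml, are ignored.
  ls "$1" |
    sed -n 's/.*-\([0-9]\{8\}\.[0-9]\{6\}-[0-9][0-9]*\)[.-].*/\1/p' |
    sort -u
}
```

Piping the output through wc -l gives the number of builds left, which should not exceed the configured retention count after a purge.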

Now, going back to the earlier screenshot of the snapshots repository configuration, notice the Delete Released Snapshots checkbox near the bottom. This field is another configuration option for the repository-purge consumer. When it is enabled and a released version for a given snapshot is found (in any Archiva managed repository), all of the snapshots of that version will be removed. We will again use the Centrepoint artifacts as examples here. In the earlier section, Deleting artifacts in your repository, we only removed com.effectivemaven.centrepoint:centrepoint:pom:1.0-SNAPSHOT. Browsing to http://localhost:8081/archiva/repository/snapshots/com/effectivemaven/centrepoint/ shows us that the other Centrepoint 1.0-SNAPSHOT artifacts are still in the snapshots repository even though 1.0 has already been released. We will use the Delete Released Snapshots feature to automatically remove these old snapshot artifacts.

Now, edit the snapshots repository configuration and tick the Delete Released Snapshots checkbox, then save the changes. As done earlier, execute the repository scanner. After a while, if we check the snapshots repository again we will notice that the old 1.0-SNAPSHOT Centrepoint artifacts are now gone.

Correcting Maven metadata

The metadata-updater consumer is used specifically for updating and fixing the maven-metadata.xml files in the repository. This is very useful in cases where artifacts were deleted from the file system and the metadata files weren't updated. When this consumer is executed during the repository scan, artifact metadata files are updated and fixed based on the versions of an artifact that are physically present in the file system.

Correcting metadata only affects reproducibility if version ranges or automatic plugin versioning are used. In those cases, reproducibility is likely to be affected over time anyway. For this reason, it is usually a good idea to enable this consumer on your repositories.
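To see whether a particular maven-metadata.xml has drifted from the file system, the versions it lists can be compared with the version directories that actually exist. This is a hand check under stated assumptions, not what the consumer itself does: the function name is hypothetical, and the metadata file is assumed to be pretty-printed with one version element per line.

```shell
# Hypothetical helper: print the versions listed in an artifact's
# maven-metadata.xml that have no matching version directory on disk.
# Usage: metadata_missing_versions ARTIFACT_DIR
metadata_missing_versions() {
  artifact_dir=$1

  # Only look inside the <versions> block, so other <version> elements
  # in the file are ignored.
  sed -n '/<versions>/,/<\/versions>/s:.*<version>\(.*\)</version>.*:\1:p' \
    "$artifact_dir/maven-metadata.xml" |
  while IFS= read -r version; do
    [ -d "$artifact_dir/$version" ] || echo "$version"
  done
}
```

Any version printed here is a stale entry of the kind a repository scan with the metadata-updater consumer enabled is meant to clean up.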

Creating missing checksums

The create-missing-checksums consumer is simply for generating checksum files, both MD5 and SHA-1, for artifacts with missing checksum files. It also fixes incorrect checksums that it finds during repository scanning.

This can be useful if you have a number of problems with incorrect checksums, or if you want them to be auto-corrected on deployment. However, caution is needed. If the checksum was actually correct and it is the artifact that has been modified, regenerating the checksum removes the ability to detect the modification (which is a more serious problem!).
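Before letting checksums be regenerated, it can be worth checking which side is wrong. The sketch below verifies one artifact against its existing .sha1 file; the function name is hypothetical, and it assumes the sha1sum command from GNU coreutils is available (other systems ship different tools).

```shell
# Hypothetical helper: verify an artifact against the .sha1 file next to it.
# Returns 0 if the checksum matches, 1 if it differs, 2 if there is no
# checksum file. Assumes the GNU coreutils sha1sum command is installed.
# Usage: verify_sha1 ARTIFACT_FILE
verify_sha1() {
  artifact=$1

  [ -f "$artifact.sha1" ] || {
    echo "no checksum file for $artifact" >&2
    return 2
  }

  # The checksum file may hold just the digest or "digest  filename";
  # take the first field of the first line and normalize to lowercase.
  expected=$(awk 'NR == 1 { print $1 }' "$artifact.sha1" | tr 'A-F' 'a-f')
  actual=$(sha1sum "$artifact" | awk '{ print $1 }')

  [ "$expected" = "$actual" ]
}
```

A failing check only says that the file and its checksum disagree. Deciding which of the two to trust, for example by comparing with the artifact's original source, remains a manual step.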

Database cleanup consumers

There are three consumers that perform internal Archiva clean up: not-present-remove-db-artifact, not-present-remove-db-project, and not-present-remove-indexed. These three are used for cleaning up the database of artifact information, cleaning up the database of POM information, and cleaning up the index of artifacts that have been removed or are no longer present in the file system, respectively. After these consumers are executed, deleted artifacts should no longer appear in the web application Browse and Search.

If you find that you receive stale information in the web interface, ensure these consumers are enabled and re-scan the repositories.


In this article, we have learned about the different roles and permissions in Archiva. We've also learned about repository groups and a couple of techniques on how to configure them.

Other features provided by Archiva such as RSS feeds and artifact deletion were covered, while the last parts of the article were dedicated to maintenance.

You've been reading an excerpt of:

Apache Maven 2 Effective Implementation
