From 1d1c9eda99a6133cd3a09cc4c234361d51c75986 Mon Sep 17 00:00:00 2001
From: "Uwe Jandt (DESY)" <uwe.jandt@desy.de>
Date: Tue, 1 Dec 2020 16:32:37 +0100
Subject: [PATCH] added software news posts 2020 part 2

---
 .../02/2020-02-21-HIFIS-workshops-2020.md     |  53 +++
 .../2020-04-15-smime-signing-git-commits.md   | 248 +++++++++++++
 .../2020/04/2020-04-17-online-swc-at-hzdr.md  | 312 ++++++++++++++++
 _posts/2020/04/2020-04-20-ML-hackathon.md     |  56 +++
 _posts/2020/06/2020-06-01-helpdesk-launch.md  |  51 +++
 _posts/2020/06/2020-06-11-HackyHour.md        |  62 ++++
 .../08/2020-08-19-introducing-consulting.md   |  68 ++++
 ...020-09-23-getting-started-with-docker-1.md | 293 +++++++++++++++
 ...020-09-25-getting-started-with-docker-2.md | 322 ++++++++++++++++
 ...020-09-29-getting-started-with-docker-3.md | 349 ++++++++++++++++++
 .../2020/10/2020-10-15-survey-technology.md   | 205 ++++++++++
 11 files changed, 2019 insertions(+)
 create mode 100644 _posts/2020/02/2020-02-21-HIFIS-workshops-2020.md
 create mode 100644 _posts/2020/04/2020-04-15-smime-signing-git-commits.md
 create mode 100644 _posts/2020/04/2020-04-17-online-swc-at-hzdr.md
 create mode 100644 _posts/2020/04/2020-04-20-ML-hackathon.md
 create mode 100644 _posts/2020/06/2020-06-01-helpdesk-launch.md
 create mode 100644 _posts/2020/06/2020-06-11-HackyHour.md
 create mode 100644 _posts/2020/08/2020-08-19-introducing-consulting.md
 create mode 100644 _posts/2020/09/2020-09-23-getting-started-with-docker-1.md
 create mode 100644 _posts/2020/09/2020-09-25-getting-started-with-docker-2.md
 create mode 100644 _posts/2020/09/2020-09-29-getting-started-with-docker-3.md
 create mode 100644 _posts/2020/10/2020-10-15-survey-technology.md

diff --git a/_posts/2020/02/2020-02-21-HIFIS-workshops-2020.md b/_posts/2020/02/2020-02-21-HIFIS-workshops-2020.md
new file mode 100644
index 000000000..c1df174e4
--- /dev/null
+++ b/_posts/2020/02/2020-02-21-HIFIS-workshops-2020.md
@@ -0,0 +1,53 @@
+---
+title: Overview of HIFIS Workshops in 2020
+date: 2020-02-21
+authors:
+  - leinweber
+layout: blogpost
+title_image: default
+categories:
+  - announcement
+tags:
+  - workshop
+excerpt:
+  "
+  HIFIS workshop cluster in spring and autumn 2020: Basic Software Carpentry,
+  Introduction to GitLab and Bring Your Own Script.
+  Specialized workshops are being planned: Catch them all
+  by keeping an eye on our events page!
+  "
+---
+
+As announced on [our events page][events], we offer several different
+software development trainings to researchers.
+
+1. "Software Carpentry" workshops introduce scientists to useful and powerful tools:
+   Shell, Python and R for effective and reproducible scientific programming,
+   and Git for version controlling your projects.
+   Our first such event is planned for [March 31st and April 1st][200331].
+
+2. "Bring Your Own Script" workshops ([like on May 12th and 13th][200512])
+   will help you to make your code publication-ready:
+   Regardless of your preferred programming language, we will advise you on all aspects
+   of making your research software (re)usable for others, e.g. how to add a license.
+
+3. "GitLab for Software Development in Teams" introduces advanced GitLab features:
+   GitLab's project management, collaboration and automation features will help software development teams
+   to improve software quality and speed up their release cycles.
+   Join us on [June 9th and 10th][200609]!
+
+Each event page provides more details on the topics to be covered,
+possible prerequisites and registration instructions.
+You can find all of these details on the [individual event pages][events].
+
+More specialized events outside this core curriculum
+— like an [Introduction to Snakemake][snakemake] —
+are being planned, so best keep an eye on that page!
+
+
+[events]: {{ 'events' | relative_url }}
+
+[200331]: {% link _events/2020-03-31-the-carpentries-workshop.md %}
+[200512]: {% link _events/2020-05-12-ready-script-for-publication.md %}
+[200609]: {% link _events/2020-06-09-GitLab-Software-Development-Teams.md %}
+[snakemake]: {% link _events/2019-12-11-snakemake-introduction-workshop.md %}
diff --git a/_posts/2020/04/2020-04-15-smime-signing-git-commits.md b/_posts/2020/04/2020-04-15-smime-signing-git-commits.md
new file mode 100644
index 000000000..d7246d3c4
--- /dev/null
+++ b/_posts/2020/04/2020-04-15-smime-signing-git-commits.md
@@ -0,0 +1,248 @@
+---
+title: S/MIME Signing Git Commits
+date: 2020-04-15
+authors:
+  - huste
+layout: blogpost
+title_image: sign-pen-business-document-48148.jpg
+excerpt:
+  Git is cryptographically secure, but it is not foolproof. To verify that
+  work taken from the internet is from a trusted source, Git provides a way to
+  sign and verify work using X.509 certificates. This guide will show you how
+  to set up signing of Git commits for the operating system of your choice.
+categories:
+  - tutorial
+tags:
+  - Git
+  - Version Control
+  - Best Practice
+redirect_from:
+  - tutorial/2020/04/15/smime-signing-of-git-comits
+---
+
+{{ page.excerpt }}
+
+## Is Signing Git Commits Worth it?
+Every commit in a Git repository has an author, but this information is not
+verified by Git.
+This is how you configure your name and email address when you start
+working with Git:
+
+```console
+$ git config --global user.name "John Doe"
+$ git config --global user.email "john.doe@hifis.net"
+```
+
+It is easy to create commits that appear to be authored by someone else.
+The principle can be compared to the falsification of email senders:
+In an unsigned email you cannot be totally sure that it was sent by the
+person specified in the email header.
+Luckily, thanks to the [DFN PKI][dfn-pki] infrastructure most Helmholtz
+centers already offer their employees the option to request a personal
+certificate.
+Starting with Git version `2.19`, signing and verification support was
+extended to include S/MIME using X.509 certificates.
+The mechanism that might already be known to you from emails can now be used
+for Git commits as well.
+Signing Git commits is another valuable use case for these personal
+certificates.
+If you do not have one yet, talk to the IT department of your institution to
+get information about the application process in your research center.
+
+## How to Configure S/MIME Signing?
+Before being able to use S/MIME for Git commits or tags in your own work, some
+configuration is necessary.
+Luckily, the configuration only needs to be done once per device and user
+account.
+Parts of the setup procedure depend on the operating system of your
+choice.
+Please choose the section that applies to you.
+
+Before we continue, please make sure that your Git version is `2.19.0` or
+later.
+
+```console
+$ git --version
+git version 2.26.0
+```
+
+In case the installed version is older than `2.19.0` please follow the
+instructions on the [Git website][git-scm].
+The installation of Git is beyond the scope of this tutorial.
+
+### Linux
+On Linux we will use the tool `gpgsm` to enable S/MIME signing of Git commits.
+
+1. The tool can usually be installed via the package manager of your
+   distribution.
+   **Debian based:**
+   ```console
+   $ sudo apt-get install gpgsm
+   ```
+   **CentOS/RedHat Linux:**
+   ```console
+   $ yum install gnupg2-smime
+   ```
+   **Fedora:**
+   ```console
+   $ dnf install gnupg2-smime
+   ```
+2. Import your private key and certificate:
+   ```console
+   $ gpgsm --import <filename>.pfx/p12
+   ```
+3. Make sure that your key was imported properly:
+   ```console
+   $ gpgsm --list-keys
+   ID: 0x12345678
+   Issuer: /CN=DFN-Verein Global Issuing CA/OU=DFN-PKI/O=Verein zur Foerderung eines Deutschen Forschungsnetzes e. V./C=DE
+   Subject: /CN=Huste, Tobias/O=Helmholtz-Zentrum Dresden - Rossendorf e. V./L=Dresden/ST=Sachsen/C=DE
+   aka: t.huste@hzdr.de
+   validity: 2019-10-07 10:47:08 through 2022-10-06 10:47:08
+   key type: 2048 bit RSA
+   key usage: digitalSignature nonRepudiation keyEncipherment
+   ext key usage: clientAuth (suggested), emailProtection (suggested)
+   ```
+   It might be necessary to also import the DFN certificate chain. To do so,
+   execute this command:
+   ```console
+   $ curl https://pki.pca.dfn.de/dfn-ca-global-g2/pub/cacert/chain.txt | gpgsm --import
+   ```
+   <i class="fas fa-exclamation-triangle"></i> **Note:** The above command is
+   specific to certificates issued by _DFN-Verein Global Issuing CA_.
+
+4. Configure Git to use your key for signing:
+   ```console
+   $ export SIGNINGKEY=$( gpgsm --list-secret-keys | egrep '(key usage|ID)' | grep -B 1 digitalSignature | awk '/ID/ {print $2}' )
+   $ git config --global user.signingkey $SIGNINGKEY
+   $ git config --global gpg.format x509
+   ```
+
+### Windows and MacOS
+1. Install [smimesign (MacOS)][smimesign-mac] or
+   [smimesign (Windows)][smimesign-windows] by following the instructions on
+   the given page.
+2. Configure Git to use smimesign for all repositories:
+   ```console
+   $ git config --global gpg.x509.program smimesign
+   $ git config --global gpg.format x509
+   ```
+3. If you have already installed your private key and certificate on your
+   system, no further configuration is required for `smimesign`. Please
+   configure Git to use the same email address as supplied in your personal
+   certificate.
+   **Find your Git email address:**
+   ```console
+   $ git config --get user.email
+   john.doe@hifis.net
+   ```
+   **List available signing identities:**
+   ```console
+   $ smimesign --list-keys
+   ```
+
+## Sign your Git tags
+When creating a signed Git tag, all you need to do is replace the `-a` flag
+with `-s`:
+
+```console
+$ git tag -s v1.0 -m 'My first signed tag'
+```
+
+To verify a signed tag, use `git tag -v <tag-name>`:
+
+```console
+$ git tag -v v1.0
+object ac4d8f716fcdaec5617a49caa850cfafec7e947c
+type commit
+tag v1.0
+tagger Tobias Huste <t.huste@hzdr.de> 1586416623 +0200
+
+My first signed tag
+gpgsm: Signature made 2020-04-09 07:17:03 using certificate ID 0xBBD386A3
+gpgsm: Good signature from "/CN=Huste, Tobias/O=Helmholtz-Zentrum Dresden - Rossendorf e. V./L=Dresden/ST=Sachsen/C=DE"
+gpgsm: aka "t.huste@hzdr.de"
+```
+
+## Sign your Git commits
+Once you have finished the above configuration steps for the operating system
+of your choice, you can start signing your Git commits.
+All you need to do is add the `-S` flag to your `git commit` command:
+```console
+$ git commit -S -m "Create my first signed commit"
+```
+
+To see and verify the signatures, there is a `--show-signature` option to `git log`:
+
+```console
+$ git log --show-signature -1
+commit ac4d8f716fcdaec5617a49caa850cfafec7e947c (HEAD -> 138-blog-post-s-mime-signing-of-git-commits)
+gpgsm: Signature made 2020-04-09 06:26:53 using certificate ID 0xBBD386A3
+gpgsm: Good signature from "/CN=Huste, Tobias/O=Helmholtz-Zentrum Dresden - Rossendorf e. V./L=Dresden/ST=Sachsen/C=DE"
+gpgsm: aka "t.huste@hzdr.de"
+Author: Tobias Huste <t.huste@hzdr.de>
+Date: Thu Mar 5 09:01:33 2020 +0100
+
+    WIP: Draft S/MIME blog post
+```
+
+Signing all commits by default can be enabled by setting the configuration
+variable `commit.gpgsign` to `true`:
+
+```console
+$ git config --global commit.gpgsign true
+```
+
+## Support on GitHub and GitLab.com
+Currently, both GitHub and GitLab.com officially support S/MIME.
+Both platforms display a green _Verified_ badge beneath a signed commit in
+case of a verified signature.
+Otherwise a badge showing _Unverified_ is displayed.
+For self-hosted GitLab instances at least version
+[`12.8.7`](https://about.gitlab.com/releases/2020/03/16/gitlab-12-8-7-released/)
+is required.
+
+{:.treat-as-figure}
+
+Verified S/MIME signature on GitHub.
+
+{:.treat-as-figure}
+
+Verified S/MIME signature on GitLab.
+
+## Updates
+### 2020-05-12
+We were notified that in some combinations of operating system and Git version
+it is necessary to explicitly tell Git which program it should use for signing.
+To do this, set the configuration variable `gpg.program` explicitly as shown
+below.
+
+```console
+$ git config --global gpg.program gpgsm
+```
+
+Thank you very much for notifying us!
+
+<div class="alert alert-success">
+  <h2 id="contact-us"><i class="fas fa-info-circle"></i> Contact us</h2>
+  <p>
+    Do you have questions? Did one of the instructions stop working?
+    Tell us, we want and need your feedback!
+  </p>
+  <p>
+    Write a mail to
+    <strong>
+      <a href="mailto:{{ site.contact_mail }}">{{ site.contact_mail }}</a>
+    </strong>
+    or
+    <strong>
+      <a href="https://gitlab.hzdr.de/hifis/software.hifis.net/-/issues/new?issue">open an issue</a>
+    </strong>
+    on <i class="fab fa-gitlab"></i> GitLab.
+  </p>
+</div>
+
+[dfn-pki]: https://www.pki.dfn.de/ueberblick-dfn-pki/
+[git-scm]: https://git-scm.com/downloads
+[smimesign-mac]: https://github.com/github/smimesign#macos
+[smimesign-windows]: https://github.com/github/smimesign#windows
diff --git a/_posts/2020/04/2020-04-17-online-swc-at-hzdr.md b/_posts/2020/04/2020-04-17-online-swc-at-hzdr.md
new file mode 100644
index 000000000..6cecc4ee2
--- /dev/null
+++ b/_posts/2020/04/2020-04-17-online-swc-at-hzdr.md
@@ -0,0 +1,312 @@
+---
+title: "Our First Online SWC Workshop"
+date: 2020-04-17
+authors:
+  - erxleben
+  - huste
+  - hueser
+layout: blogpost
+title_image: default
+categories:
+  - report
+tags:
+  - workshop
+excerpt:
+  "
+  It was supposed to be our first Software Carpentry workshop at the HZDR.
+  We were in full swing organizing a live event until it became clear that we
+  would have to move online. <em>Challenge accepted!</em>
+  "
+---
+
+### Contents
+{: .no_toc}
+
+1. TOC
+{:toc}
+
+
+
+# Planning Phase
+
+Our first own [Software Carpentry](https://software-carpentry.org/) workshop was
+supposed to be a live event.
+We intended to take it easy, learn a few lessons and then build upon these.
+With these goals in mind we set out to plan a two-day workshop for the
+31st of March and the 1st of April 2020.
+At the beginning of March it became clear that the effects of the Covid-19
+pandemic would reach us long before this date.
+It was unanimously decided to switch the workshop to an online event instead of
+cancelling it — even though this would mean a lot of organizational work with an
+increasingly tight deadline.
+
+## Our Original Approach
+
+As it was expected to be a first experience for us as instructors, organizers
+and helpers, we advertised 25 workshop seats on the PhD mailing list of
+our institute.
+To our complete surprise the event was fully booked within a day.
+As a reaction we created a second "event" in our system to act as a waiting
+list to be worked off in follow-up workshops.
+The side effect was that we got a first glimpse of the huge demand for training
+opportunities among our scientific staff.
+
+Our initial plan was to split the first workshop day equally between the
+_Shell_ (the [_Bash_](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) to be
+precise) and _Git_ lessons and use the complete
+second day for _Python_.
+
+We did however estimate that we might have to cut later episodes from these
+lessons, depending on the learners' speed.
+
+## Switch to Online
+
+When we decided to switch to an online event it was clear that additional
+unknowns were to be expected.
+For this reason we decided to reduce the number of participants for the first
+iteration of the workshop to only nine people.
+As we had no prior experience with online teaching it seemed better to start
+with a conservative number of participants and increase it in future
+workshops if everything went well.
+
+Therefore we set up a separate event in our system and transferred the planned
+number of participants over, based on the first-come-first-served principle.
+The remaining registrants would be enrolled with priority into a follow-up
+workshop in April 2020.
+
+Additional time had to be spent on selecting and organizing a suitable
+video-conferencing system.
+
+{:.treat-as-figure}
+
+An example setup. The second monitor proved very useful.
+
+# Role Call
+
+While in the live setting one _instructor_ is supported by one or multiple
+_helpers_, the online environment additionally demands that one of the helpers
+take on the role of the _host_.
+The roles can change from lesson to lesson, since especially the instructor has a
+high cognitive load and may be subject to a strained voice after prolonged
+periods of teaching.
+
+## Instructor
+The primary task of instructors is to lead the workshop.
+They present the content of the teaching material, determine the pace of the
+workshop and coordinate with the hosts and helpers.
+Thus, it is especially important for them to familiarize themselves with the
+workshop materials and manage the time required for teaching episodes, exercises
+and breaks.
+
+## Host
+Running a workshop also requires completing a lot of organizational side-tasks
+during the event.
+The nature of these tasks changes significantly when switching from a live event
+to an online workshop.
+Notably, video-conferencing tools tend to designate one person as _host_ who has
+the full management rights for the event.
+To reduce the instructors' workload, a separate person fulfils the role of the
+_host_ and can take over these tasks during the session:
+
+* Prepare and open breakout rooms
+* Monitor the chat (together with the helpers)
+* Keep an eye on the time
+* Observe the participants' reactions
+* Organize quick polls for feedback or exercises
+* Manage shared documents and insert exercise questions on demand
+
+In general the host focuses less on the participants and more on the instructor
+and helpers, taking note of the lesson progress and anticipating required
+organizational actions.
+
+## Helper
+Helpers are the backbone of a successful workshop.
+They monitor the participants and proactively interact with participants who
+have questions, may fall behind or have technical issues.
+Questions may either be answered by helpers directly or be forwarded to the
+instructor at an opportune moment if they are of more general concern.
+
+## The Workshop Team
+We split our workshop into the three parts _Shell_, _Git_ and _Python_,
+assigning one to each of our three instructors.
+The two instructors who were not actively teaching assumed the roles of host
+and helper, respectively.
+We further expanded our team with two helpers, which allowed us to
+respond to questions without delay.
+
+# The Tools
+
+The choice of tools significantly affects the organizational effort and the
+workshop quality perceived by the participants.
+In the following, our selected tools are briefly introduced.
+
+## Indico
+
+We employed [our self-hosted _Indico_ instance](https://hifis-events.hzdr.de)
+as the event planning and registration tool.
+It proved to be a good choice: It facilitates the registration procedure and
+allows messaging selected (or all) event participants directly, which turned
+out to be very useful when switching the workshops to the online version.
+
+One drawback was the limited capability to transfer registrations from one event
+to another, which had to be done manually, since the provided _export_ and
+_import_ features did not support a common data layout.
+
+> [Official Indico Website](https://getindico.io/)
+
+## GitLab
+
+It appeared to be a good idea to extend the _Git_ lesson and also give a quick
+look at _GitLab_ as an example of a well-known web-based collaborative software
+life-cycle tool.
+Thereby, the participants were able to apply their acquired _Git_ knowledge to
+the user interface of _GitLab_.
+As most of our participants were members of the HZDR and we had
+sufficient administrative rights to allow access for all other participants, we
+chose to use the [institute-local _GitLab_ instance](https://gitlab.hzdr.de)
+for this purpose.
+In future workshops with participants from other institutions we might switch to
+[_gitlab.com_](https://www.gitlab.com) for this exercise.
+
+It is worth mentioning that people who signed in via a third-party provider need
+to use an access token when cloning via _https_.
+This can also be the case on _gitlab.com_ and forces the organizers to plan some
+time to get this set up for all affected participants.
+
+## HackMD
+
+Amongst the many collaborative online editors that were available, we chose
+_HackMD_ for several reasons:
+
+* Ease of use
+* Markdown formatting capabilities
+* Code / syntax highlighting
+* Side-by-side editing and preview
+
+Even though our participants had no previous experience with Markdown documents,
+they quickly adopted its basics.
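+
+For illustration, a solution block in the shared document might have looked
+roughly like this (a made-up snippet, not an actual exercise):
+
+```markdown
+## Exercise 2 -- Solution
+
+| Participant | Command used             |
+| ----------- | ------------------------ |
+| Learner A   | `ls | wc -l`             |
+| Learner B   | `find . -type f | wc -l` |
+```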
+Some exercises required the solutions to be put into code blocks or tables,
+which were either copied and pasted from prepared examples or formatted by the
+helpers.
+
+> [Official HackMD Website](https://hackmd.io)
+
+## Zoom
+
+The choice of a video-conferencing tool was probably the most important
+decision during the switch to online.
+We were already familiar with Zoom from the Carpentry instructor lessons and had
+the good fortune to be offered a share in a Zoom account by another member of
+the Carpentries for the purpose of holding workshops until we could organize our
+own account.
+There was not enough time to get a _BigBlueButton_ or _Jitsi_ instance
+installed and evaluated properly.
+
+During the workshop we could make good use of the offered features and
+experienced good video and audio quality.
+We prefixed the screen names of helpers and instructors with their respective
+role to make them easier to distinguish by name alone.
+
+In the light of rising security and data protection concerns regarding Zoom we
+continue to monitor the situation and keep exploring alternatives, with the aim
+of offering the best possible workshop experience to our participants in a
+privacy-friendly way.
+
+## IPython
+
+For teaching the basics of _Python_ we went with _IPython_.
+It offers syntax highlighting to aid the learner.
+Since it is an interpreter, the participants get instant feedback on whether the
+entered line is valid Python.
+The command-line based approach significantly reduces the number of objects on
+the screen and helps to focus the learners' attention on the code itself.
+
+The tool comes with the standard installation of the _Anaconda_ packages as
+recommended in the setup instructions by the Carpentries.
+
+# How it Went
+
+It became clear early on that an online workshop progresses notably more slowly
+than a live version.
+Due to the lack of non-verbal communication, it is more often necessary to assess
+the learners' progress and also, in the case of helpers, to interact with each
+learner individually.
+Sometimes communication can also be impeded by low-quality audio transfers,
+making it necessary to repeat parts of sentences or write things down in a chat
+or shared document.
+We enjoyed a mostly stable connection with good quality and the participants
+were very cooperative and disciplined, muting themselves if they did not wish to
+speak.
+
+To encourage users to use the shared document and signal that they are allowed
+and expected to modify it, we included a warm-up exercise which asked the
+participants to note down their attendance together with a short statement about
+themselves.
+"What is your favourite cake?" turned out to be a suitable icebreaker question.
+
+The first quarter of the second day was used for a quick recap of the _Git_ and
+_Shell_ lessons, followed by the _Git_ collaboration exercise with the aid of a
+_GitLab_ repository.
+
+The _Python_ part did not progress as fast as intended, which led to the lessons
+being compressed towards the end.
+
+## Issues Encountered
+
+### Operating System Differences
+{: .no_toc}
+
+The participants were, with one exception, using Windows.
+While this required some additional research effort for the helpers on some
+occasions, it kept the overall effort low, since participants could also help
+each other out if necessary.
+
+Particular issues that arose, for example accommodating the different
+line-ending encodings, were also covered by the Carpentries' lecture
+materials and could thus be solved quickly.
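+
+For Git on Windows, for example, the typical remedy is a one-time setting that
+converts line endings automatically on checkout and commit:
+
+```console
+$ git config --global core.autocrlf true
+```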
+
+### Handling More Challenging Problems
+{: .no_toc}
+
+Two more complex issues could be solved by the host assigning the participant in
+question to a breakout room together with a helper.
+This way the lessons could progress for the unaffected participants, and the
+learners in the breakout room rejoined once their issues were resolved.
+
+From the experience gained we would reserve such procedures for only the most
+dire of problems, since the involved learners lose the connection to the workshop
+and may have trouble getting back on track.
+
+It is preferable to try helping in the shared online environment first, to
+allow the other participants to either learn something new for themselves or
+contribute on their own.
+
+## Feedback, Reactions and Lessons Learned
+
+The post-workshop survey determined that the participants viewed the event in a
+positive light.
+Compared to the pre-workshop survey, most participants felt enabled to make
+first steps towards improving their workflows.
+
+A general consensus among participants and organizers alike was the need to
+plan more time for the workshop content, as well as the demand for follow-up
+workshops covering more advanced topics.
+
+As organizers we consider this first event a success and a good foundation upon
+which to build our future events.
+
+## Acknowledgements
+
+{% assign instructors="steinbach / huste / erxleben" | split: " / "%}
+{% assign helpers="Lokamani / hueser" | split: " / "%}
+
+We want to thank our **instructors**
+{% include team_card_mini_bundle.html persons=instructors %}
+
+and **helpers**
+{% include team_card_mini_bundle.html persons=helpers %}
+
+for organizing and holding the workshop, as well as our **participants** for
+being awesome learners.
+
+_Until next time!_
diff --git a/_posts/2020/04/2020-04-20-ML-hackathon.md b/_posts/2020/04/2020-04-20-ML-hackathon.md
new file mode 100644
index 000000000..a87032d2b
--- /dev/null
+++ b/_posts/2020/04/2020-04-20-ML-hackathon.md
@@ -0,0 +1,56 @@
+---
+title: "Experience Report: Machine Learning Hackathon at GFZ"
+date: 2020-04-20
+authors:
+  - dolling
+layout: blogpost
+title_image: default
+categories:
+  - report
+tags:
+  - hackathon
+excerpt:
+  "
+  In March the first machine learning hackathon was held at GFZ Potsdam.
+  Here I give an overview from the organizers' point of view.
+  "
+---
+
+The Machine Learning (ML) Group of the GFZ was lucky: One of the last events that happened in person was our **machine learning hackathon**.
+It was held on March 5th, 2020 and organized by the ML Interest Group that formed at GFZ in 2019.
+The group mainly consists of researchers from different domains of the Earth sciences who share an interest in state-of-the-art ML techniques and tools.
+The hackathon idea came out of a survey we conducted among its members.
+
+Okay, now you might be wondering how we set it up.
+Before I give you the details, I want you to keep in mind that it was our first time hosting a hackathon.
+We had a list of wishes:
+ 1. We wanted it to be family-friendly and flexible.
+ 2. Our members cover experience levels from beginner to advanced, so we wanted to be inclusive of everyone.
+ 3. We also wanted the hackathon to be a one-day event.
+
+Once we had clarified our wish list, we needed to focus on how to make it happen.
+Our goal was to find a set of interesting ML problems that would benefit our participants and still be feasible to address within the hackathon's time-frame.
+So we approached participants to submit problems themselves.
+Keeping in mind our participants' mixed experience levels, from novices to experts, we then selected the submitted challenges that were suitable for all of them.
+
+The first hackathon challenge was a shearing problem in tectonics, given a time-series dataset of seismic activity.
+The participants were free to explore the data as they wished.
+The second problem was about finding out whether spectral satellite data can be used to understand soil composition.
+
+On the day of the hackathon, thirty participants joined us.
+The chosen problems were first presented by the participants who had proposed them; the remaining participants were then given the chance to choose the problem that interested them most.
+Afterwards, we organized the participants into groups of three to six people to hack on a solution of their choice.
+As the hackathon was intended for discussing, sharing ideas and coming up with a solution, it was not a competitive environment.
+We experienced a friendly atmosphere where people discussed freely and even had pizza and drinks.
+
+At the end of the day, all groups presented their results and also uploaded them to our [GitLab group][MLGFZ].
+It was great to see the different approaches to a solution from the different groups.
+One of the interesting outcomes was that people from different research domains found out that they are using similar methodologies, which opened up ways for future collaborations.
+This shows that such events can have a long-term impact.
+
+I was happy to be part of such an interesting and collaborative environment!
+
+Of course, we always aim to improve such events.
+For this purpose we are looking at participants' feedback to take into account for our next event.
+
+[MLGFZ]: https://gitext.gfz-potsdam.de/ml-gfz
diff --git a/_posts/2020/06/2020-06-01-helpdesk-launch.md b/_posts/2020/06/2020-06-01-helpdesk-launch.md
new file mode 100644
index 000000000..f8cb81513
--- /dev/null
+++ b/_posts/2020/06/2020-06-01-helpdesk-launch.md
@@ -0,0 +1,51 @@
+---
+title: "HIFIS Software Helpdesk launched"
+date: 2020-06-01
+authors:
+  - ziegner
+layout: blogpost
+title_image: default
+categories:
+  - announcement
+tags:
+  - general
+  - support
+excerpt:
+  "
+  We are happy to announce the launch of the HIFIS Software Helpdesk - the
+  place where your user requests for HIFIS Software Services are handled by
+  our distributed team.
+  "
+---
+
+We are happy to announce the launch of the
+**[HIFIS Software Helpdesk][helpdesk]** - the place where
+_your_ user requests for HIFIS Software Services are handled by our distributed
+team.
+The helpdesk solution is based on the open source helpdesk/customer
+support system [Zammad][zammad].
+
+- Do you need more support for your scientific software project?
+- Do you have software-related questions you want answered?
+- Are you looking for a specific workshop?
+- Can't find exactly what you're looking for?
+
+Let us know and create a ticket in our shiny new [helpdesk][helpdesk] or
+send an email to [{{ site.contact_mail }}][contact_mail], which has the
+same effect.
+Tickets aren't sent to one individual person, but to the whole team.
+They are then forwarded to the most relevant expert and will stay open until
+they are resolved.
+
+## Do I need to create an account for the helpdesk?
+No. Since the helpdesk is registered within the Helmholtz Authentication and
+Authorization Infrastructure (AAI), you can simply sign in using the personal
+access data of your home institution or even your personal [GitHub][github]
+account.
+All you have to do is use the respective login button for the Helmholtz
+AAI sign-in method on the login page.
+
+[contact_mail]: mailto:{{ site.contact_mail }}
+[github]: https://github.com/
+[helpdesk]: https://{{ site.helpdesk }}/
+[zammad]: https://zammad.org/
diff --git a/_posts/2020/06/2020-06-11-HackyHour.md b/_posts/2020/06/2020-06-11-HackyHour.md
new file mode 100644
index 000000000..d4818f364
--- /dev/null
+++ b/_posts/2020/06/2020-06-11-HackyHour.md
@@ -0,0 +1,62 @@
+---
+title: "Coding, Cookies, COVID-19"
+date: 2020-06-11
+authors:
+  - dolling
+  - dworatzyk
+layout: blogpost
+title_image: default
+categories:
+  - report
+tags:
+  - hacky hour
+excerpt:
+  "
+  The first HIFIS Hacky Hour was held online at
+  <a href=\"https://meet.jit.si/hifishackyhour\">Jitsi</a>.
+  Here we give an overview from the organizers' point of view.
+  "
+---
+
+### What the hack is a Hacky Hour?
+There are different answers to this question.
+An excellent one was provided by [Amanda Miotto][hackyhourhandbook].
+We were forced by current events to find another.
+In the original idea, a Hacky Hour is an informal meet-up for scientists to discuss research software and tools.
+It also serves to get answers to technical questions in a cozy café-like environment.
+Inspired by this, a similar event was being planned within HIFIS.
+Due to the COVID-19 pandemic we needed to adapt this concept a little bit.
+Instead of a face-to-face get-together with cookies and coffee (or tea), we set up a virtual meeting via Jitsi.
+But when one door closes, another door opens: This virtual format allowed us to bring together researchers from different Helmholtz centers in a joint meet-up.
+The _HIFIS Hacky Hour_ was born.
+
+### Why did we need a Hacky Hour?
+Most areas of software development build on a strong community, for example [FOSS][FOSS] or forums like _Stack Overflow_.
+Within the Helmholtz Association, however, opportunities to exchange experience with other researchers and [research software engineers][RSE] about software-related topics are rather rare.
+In particular in view of the current home-office situation, we had the feeling that there was an increased need for sharing ideas, socializing, and support with individual software projects.
+To fill this gap and to improve the situation, we adapted the Hacky Hour idea and organized a (bi-)weekly virtual meeting.
+
+### How exactly did we organize the Hacky Hour?
+Previous attempts to establish such a format had quickly evolved into a mini-lecture series, an approach that soon died off.
+But the idea of a _Hacky Hour_ survived.
+Since in this earlier approach certain topics (chosen by the organizers) did not raise much interest, we tried a more community-driven approach:
+* While the first topic ideas were suggested by us and a random one was picked for the first meeting, in the following sessions possible topics were collected in a _CodiMD_-like [pad][metapad].
+* In this pad, participants could add their topic suggestions and vote for the next topic at the end of each Hacky Hour.
+* The upcoming topic was chosen by the last meeting's participants (or randomly if there was no vote), ensuring that people were interested in the next session's topic.
+* We then came together to discuss the topic and tried, as a hive mind, to answer questions raised by participants.
+By engaging participants in answering questions and in the planning of the next session, we try to avoid a simple helpdesk situation.
+
+
+
+### First impressions
+When we set up the first meeting rather spontaneously, we had no expectations (or practical experience) at all.
+We reasoned that if nobody showed up, it would not have been a big waste of effort:
+All we had to do was set up a virtual meeting room and e-mail invitations to thousands of researchers (which is currently pretty cheap and time-efficient).
+And surprise: Of about 1500 e-mail recipients, nine people (whoa - an unexpectedly great response!) showed up for the first meeting to discuss the topic
+_Public Money – Public Code_.
+
+[hackyhourhandbook]: https://github.com/amandamiotto/HackyHourHandbook
+[FOSS]: https://fsfe.org/freesoftware/comparison.en.html
+[RSE]: https://de-rse.org/en/
+[metapad]: https://pad.gwdg.de/0HczFKgqS_C9L1QGzfpbJA?both
diff --git a/_posts/2020/08/2020-08-19-introducing-consulting.md b/_posts/2020/08/2020-08-19-introducing-consulting.md
new file mode 100644
index 000000000..dd9c93dc8
--- /dev/null
+++ b/_posts/2020/08/2020-08-19-introducing-consulting.md
@@ -0,0 +1,68 @@
+---
+title: "Introducing: Consulting Services"
+date: 2020-08-19
+authors:
+  - frere
+layout: blogpost
+title_image: headway-5QgIuuBxKwM-unsplash.jpg
+categories:
+  - announcement
+tags:
+  - general
+  - consulting
+excerpt: >
+  HIFIS Software Consulting Services offers free-of-charge consulting services to all research groups within the Helmholtz umbrella.
+  Find out what that means, and how you can get software support for your project.
+redirect_from:
+  - announcement/2020/08/19/introducting-consulting/
+---
+
+
+# Consulting Services
+
+As programming and coding increasingly become important tools in the scientist's toolbox,
+it also becomes increasingly common to run into bigger software issues.
+Issues like:
+
+* How do I get this old library to run on our new HPC cluster?
+* How do I release my simulation tool so that it can be reviewed along with the rest of my paper?
+* How can I track my code, dependencies, and data so that I can be sure that people can use this project even after I've finished working with it?
+
+HIFIS Software Consulting Services offers free-of-charge consulting services to all research groups within the Helmholtz umbrella.
+Our aim is to provide software development experts who can answer questions and provide support for all topics in the areas of research and software.
+
+## Case in Point
+
+When I joined the HIFIS team, one of my first consultation projects was at HZDR,
+with a group whose code would only run on one particular server that they could not continue maintaining.
+By the end of our involvement, we had:
+
+* Set up _CMake_ to manage the project's dependencies;
+* Set up their project (via GitLab) to automatically build the code every time the team made a change, so that they could see quickly if something was breaking;
+* Made the software run on the HZDR-wide HPC infrastructure;
+* Documented the installation process so that future team members would know how to get started quickly.
+
+In other cases, we've been able to
+look into data protection issues for a team setting up a website for their project,
+and advise a spin-off company on technology choices.
+
+## Get Started
+
+If any of this sounds like something that might help you or your project, get in touch!
+HIFIS Consulting is free, and available to any group or team that falls under the Helmholtz umbrella.
+All you need to do is fill out the form on our [**consulting page**][hifis/consulting],
+and we will be in touch as soon as possible.
+
+[hifis/consulting]: {% link services/consulting.html %}
+
+<br />
+
+<div class="alert alert-success">
+  <h2 id="contact-us"><i class="fas fa-info-circle"></i> Get In Touch</h2>
+  <p>
+    If you work for a Helmholtz-affiliated institution, and think that something like this would be useful to you, send us an e-mail at
+    <strong><a href="mailto:{{site.contact_mail}}">{{site.contact_mail}}</a></strong>,
+    or fill in our
+    <strong><a href="{% link services/consulting.html %}#consultation-request-form">consultation request form</a></strong>.
+  </p>
+</div>
diff --git a/_posts/2020/09/2020-09-23-getting-started-with-docker-1.md b/_posts/2020/09/2020-09-23-getting-started-with-docker-1.md
new file mode 100644
index 000000000..fa93f05ef
--- /dev/null
+++ b/_posts/2020/09/2020-09-23-getting-started-with-docker-1.md
@@ -0,0 +1,293 @@
+---
+title: "Docker For Science (Part 1)"
+date: 2020-09-23
+authors:
+  - frere
+layout: blogpost
+title_image: docker-cover.png
+categories:
+  - tutorial
+tags:
+  - consulting
+  - docker
+excerpt: >
+  Understanding Docker probably won't solve all of your problems,
+  but it can be a really useful tool when trying to build reproducible software that will run almost anywhere.
+  In this series of blog posts, we will explore how to set up and use Docker for scientific applications.
+---
+
+{:.summary}
+> This post is part of a short blog series on using Docker for scientific applications.
+> My aim is to explain the motivation behind Docker,
+> show you how it works,
+> and offer an insight into the different ways that you might want to use it in different research contexts.
+>
+> Quick links:
+>
+> - Part 1 (Getting Started with Docker) ← You are here!
+> - [Part 2 (A Dockerfile Walkthrough)]({% post_url 2020/09/2020-09-25-getting-started-with-docker-2 %})
+> - [Part 3 (Using Docker in Practical Situations)]({% post_url 2020/09/2020-09-29-getting-started-with-docker-3 %})
+
+Understanding Docker probably won't solve all of your problems,
+but it can be a really useful tool when trying to build reproducible software that will run almost anywhere.
+Unfortunately, a lot of existing tutorials are aimed primarily at web developers, backend engineers, or cloud DevOps teams,
+which is a pity, because Docker can be useful in much wider contexts.
+This series explains what Docker is, how to use it practically, and where it might be useful in the context of scientific research.
+
+# What is Docker?
+
+One of the key challenges in modern research is how to achieve reproducibility.
+Interestingly, this is also a major concern in software development.
+If I write some code, it should work on my machine
+(I mean, I hope it does!)
+but how do I guarantee that it will work on anyone else's?
+Similarly, when writing code to analyse data, it is important that it produces the correct result,
+not just when you run the code multiple times with the same input data,
+but _also_ when someone else runs the code on a different computer.
+
+{:.treat-as-figure}
+{:.float-left}
+[](https://xkcd.com/1987/)
+The complexity of Python environments, as explained by XKCD (Comic by Randall Munroe -- [CC BY-NC 2.5](https://xkcd.com/license.html))
+
+One of the common ways that software developers have traditionally tried to solve this problem is using virtual machines (or VMs).
+The idea is that on your computer, you've probably got different versions of dependencies that will all interact in different messy ways,
+not to mention the complexity of packaging in languages like Python and C.
+However, if you have a VM, you can standardise things a bit more easily.
+You can specify which packages are installed, and what versions, and what operating system everything is running on in the first place.
+Everyone in your group can reproduce each other's work, because you're all running it in the same place.
+
+The problem occurs when a reviewer comes along, who probably won't have access to your specific VM.
+You either need to give them the exact instructions for how to set up your VM correctly
+(and can you remember the precise instructions you used then, and what versions all your dependencies were at?)
+_or_ you need to copy the whole operating system (and all of the files in it) out of your VM, into a new VM for the reviewer.
+
+Docker is both of those solutions at the same time.
+
+Docker thinks of a computer as an _image_, which is a bundle of _layers_.
+The bottom layer is a computer with almost nothing on it[^1].
+The top layer is a computer with an operating system, all your dependencies, and your code, compiled and ready to run.
+All the layers between those two points are the individual steps that you need to perform to get your computer in the right state to run your code.
+Each step defines the changes between it and the next layer,
+with each of these steps being written down in a file called a _Dockerfile_.
+Moreover, once all of these layers have been built on one computer, they can be shared with other people,
+meaning that you can always share your exact setup with anyone else who needs to run and review the code.
+
+When these layers are bundled together, we call that an image.
+Finally, to run the image, Docker transforms it into a container,
+and runs that container as if it were running inside a virtual machine[^2].
+
+[^1]:
+    This is a bit of a simplification.
+    The canonical base image ("scratch") is a zero-byte empty layer,
+    _but_, if you were able to explore inside it,
+    you'd find that there is still enough of an operating system for things like files to exist, and to run certain programs.
+    This is because Docker images aren't separate virtual machines --
+    the operating system that you can see is actually the operating system of the computer that's running Docker.
+    This is a concept called _containerisation_ or _OS-level Virtualisation_, and how it works is very much beyond the scope of this blog post!
+
+[^2]:
+    The differences between layers, images, and containers are not always obvious, and I had to look them up a lot while writing this post.
+    Most of the time, it's possible to think of layers and images as being the same thing, with containers being the way that you run the final layer.
+    However, this isn't technically accurate, and can cause some confusion when exploring container IDs, image IDs, and layer IDs.
+    If you want to explore this more, I recommend reading Sofija Simic's post [here](https://phoenixnap.com/kb/docker-image-vs-container),
+    followed by Nigel Brown's post [here](https://windsock.io/explaining-docker-image-ids/).
+
+    Please remember that none of the above information is necessary to truly use and understand Docker --
+    the main reason that I ran into these questions was when trying to get a completely solid understanding of what different IDs referred to while writing this post.
+ Most of the time, these specifics are completely transparent to the user. + +# Setting Up Docker + +Setting up Docker will look different between different operating systems. +This is to cover certain cross-platform issues. +Basically, as a general rule, in any given operating system, it's only possible to run containers that _also_ use that same operating system. +(Linux on Linux, Windows on Windows, etc.)[^3] +Obviously this is very impractical, given that most pre-built and base layers available for Docker are built for Linux. +As a result, for Windows and MacOS, Docker provides a tool called Docker Desktop, +which includes a virtual machine to basically paper over the differences between Linux and the host operating system[^4]. +It also provides a number of other tools for more advanced Docker usage that we won't go into now. + +For Linux, you will need to install "Docker Engine" -- +this is essentially just the core part of Docker that runs containers. + +The installation instructions for Mac, Windows, and Linux are available at the [Get Docker](https://docs.docker.com/get-docker/) page -- +if you want to follow along with the rest of these commands, feel free to complete those installation instructions, and then come back here. + +[^3]: + Why? + As I mentioned in the previous footnote, + containerisation isn't about creating new virtual machines -- + it's about running a mostly-sandboxed version of an operating system inside the parent operating system + (this is the _containerisation_ concept). + Because it's still running inside the same operating system as before, you can't switch between Linux and Windows. + +[^4]: Note that you can also use Windows Subsystem for Linux (WSL) instead of a "true" virtual machine. + +# Running Our First Docker Container + +The first step with any new programming language is the "Hello World" program -- +what does "Hello World" look like on Docker? + +```console +$ docker run hello-world +Unable to find image 'hello-world:latest' locally +latest: Pulling from library/hello-world +0e03bdcc26d7: Pull complete +Digest: sha256:7f0a9f93b4aa3022c3a4c147a449bf11e0941a1fd0bf4a8e6c9408b2600777c5 +Status: Downloaded newer image for hello-world:latest + +Hello from Docker! +This message shows that your installation appears to be working correctly. + +-text snipped for convenience- +``` + +The first thing we get when we run this docker command is a series of messages about what Docker is doing to run the `hello-world` container. + +1. First, Docker tries (and fails) to search the computer that it's running on for an already cached copy of a container called `hello-world:latest`. + The `:latest` part is called the tag, and roughly corresponds to the version of the relevant software that is installed on this container. + When no tag is specified, Docker defaults to "latest", which is usually the most recent build of a container. +2. Because it can't find the image, it "pulls" the image from an external repository -- in this case, [Docker Hub](https://hub.docker.com/search?q=&type=image). + The `hello-world` container is actually part of the "standard library" of official Docker images, which is where the `library/` part comes from. + Normally, if we were to host our own images on Docker Hub, we'd need to include a user or organisation namespace (e.g. `helmholtz/...`). +3. The line beginning with a set of random numbers and digits means that Docker is downloading a layer. + (The numbers and digits are an identifier for the file being downloaded.) 
On slower computers, you might see a loading bar appear here
+   while the actual download takes place.
+4. The next two lines ("Digest" and "Status") are simply updates to say that everything has been downloaded and that Docker is ready to run the image.
+   The digest is a unique identifier for this exact image which will never be updated,
+   which can be useful if you want to be completely certain that you'll never accidentally update something.
+5. Finally, a message is printed (this is the "Hello from Docker!" section).
+   This explains a bit about what has just happened, and confirms that everything was successful.
+
+# Running Our Second Docker Container
+
+The "Hello World" operation runs, but it doesn't actually do anything very useful --
+let's try running something more interesting.
+Part of our original motivation for this exercise was managing the chaos of different ways of installing Python and its dependencies,
+so let's see if we can get a container up and running with Python.
+
+The first step is generally to find a Python base image.
+Thankfully, as part of the set of officially maintained images, Docker provides some Python images for us to use.
+This includes images for different versions of Python.
+Whereas last time we used the default `latest` tag, this time we can try explicitly using the 3.8.5 tag to set the Python version.
+
+However, if we try running this, we'll run into a bit of an issue:
+
+```console
+$ docker run python:3.8.5
+Unable to find image 'python:3.8.5' locally
+3.8.5: Pulling from library/python
+d6ff36c9ec48: Pull complete
+c958d65b3090: Pull complete
+edaf0a6b092f: Pull complete
+80931cf68816: Pull complete
+7dc5581457b1: Pull complete
+87013dc371d5: Pull complete
+dbb5b2d86fe3: Pull complete
+4cb6f1e38c2d: Pull complete
+c2df8846f270: Pull complete
+Digest: sha256:bc765f71aaa90648de6cfa356ec201d50549031a244f48f8f477f386517c5d1b
+Status: Downloaded newer image for python:3.8.5
+$
+```
+
+If you run this, you'll immediately see that there are a lot more layers that need to be downloaded and extracted --
+this makes sense, as Python is a much more complicated piece of software than a program that just prints a "Hello World" message!
+You'll also see that instead of `latest`, the tag is `3.8.5`, so we can be sure which version we are using.
+
+However, when we ran this image, the docker command immediately exited, and we're back to where we started.
+We've downloaded _something_ -- but what does that something actually do?
+
+By default, when Docker runs a container, it just prints the output of that container --
+it doesn't send any user input into that container.
+However, the default Python command is a REPL -- it requires some sort of input to do something with.
+To allow us to send terminal input in and out, we can use the `-it` flags, like this:
+
+```console?prompt=$,>>>
+$ docker run -it python:3.8.5
+Python 3.8.5 (default, Sep 1 2020, 18:44:24)
+[GCC 8.3.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>>
+```
+
+That looks better!
+Feel free to play around and convince yourself that this is a working, standard Python installation.
+Pressing Ctrl+D will exit the terminal and close the container.
+It's worth noting that the second time we ran this command, there was no information about pulling layers or downloading images.
+This is because Docker caches this sort of information locally.
+
+# Running Our Second Docker Container (Again!)
+
+All Docker containers have a command that runs as the main process in that container.
+With the "Hello World" container, that command was a small binary that prints out a welcome message.
+With Python, the command was the standard `python` executable.
+What if we want to run a different command in the same container?
+For example, say we have a Python container, and we're using the Python interpreter.
+Is there a way that we can open a shell on that container so that we can run commands like `pip` to install dependencies?
+
+The first thing we need to do is deal with a problem that we're about to run into.
+When the main process in a container exits
+(the "Hello World" command has printed all it needs to print, or the Python interpreter has been exited)
+the whole container is closed.
+This is mostly useful
+(when the main process exits, we probably don't need the container any more)
+but it does mean that we need to think a bit about how we're going to interact with the running container.
+
+Firstly, let's create a new container, but give it a special name (here `my-python-container`).
+
+```console?prompt=$
+$ docker run --name my-python-container -it python:3.8.5
+Python 3.8.5 (default, Sep 1 2020, 18:44:24)
+[GCC 8.3.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>>
+```
+
+Now, opening a second terminal (and _not_ closing the Python process in the first terminal),
+we can use the `docker exec` command to run a second command inside the same container, as long as we know the name.
+In this case, we can use `bash` as the second command, again with the `-it` flags so that we get an interactive shell,
+and from there we can `pip install` whatever we want.
+
+```console?prompt=$,#
+$ docker exec -it my-python-container bash
+root@f30676215731:/# pip install numpy
+```
+
+Pressing Ctrl+D in this second terminal will close bash and bring us out of this new container.
+
+We could also have directly run `docker exec my-python-container pip install numpy` --
+in this case, because we only wanted to run one command inside the container, it would have had the same effect.
+However, opening up a bash terminal inside the container is a very useful ability,
+because it's then possible to root around inside the container and examine what's going on --
+often helpful for debugging!
+
+# Next: Part 2 -- A Dockerfile Walkthrough
+
+In this post, I explained a bit about how Docker works, and how to use Docker to run Python
+(and many other tools!)
+in an isolated environment on your computer.
+All the images that we used in this post were created by others and hosted on Docker Hub.
+
+In the next post, I'm going to explain how to create your own image, containing your own application code,
+by going line-by-line through an example Dockerfile.
+By creating an image in this way, we can clearly define the instructions needed to set up, install, and run our code,
+making our development process much more reproducible.
+
+View part two [here]({% post_url 2020/09/2020-09-25-getting-started-with-docker-2 %}).
+
+<!-- doing spacing with html is fun... -->
+<br />
+
+<div class="alert alert-success">
+  <h2 id="contact-us"><i class="fas fa-info-circle"></i> Get In Touch</h2>
+  <p>
+    HIFIS offers free-of-charge workshops and consulting to research groups within the Helmholtz umbrella.
+    If you work for a Helmholtz-affiliated institution, and think that this would be useful to you, send us an e-mail at
+    <strong><a href="mailto:{{site.contact_mail}}">{{site.contact_mail}}</a></strong>,
+    or fill in our
+    <strong><a href="{% link services/consulting.html %}#consultation-request-form">consultation request form</a></strong>.
+ </p> +</div> + +# Footnotes diff --git a/_posts/2020/09/2020-09-25-getting-started-with-docker-2.md b/_posts/2020/09/2020-09-25-getting-started-with-docker-2.md new file mode 100644 index 000000000..11ced35e2 --- /dev/null +++ b/_posts/2020/09/2020-09-25-getting-started-with-docker-2.md @@ -0,0 +1,322 @@ +--- +title: "Docker for Science (Part 2)" +date: 2020-09-25 +authors: + - frere +layout: blogpost +title_image: docker-cover.png +categories: + - tutorial +tags: + - consulting + - docker +excerpt: > + Previously, we learned about Docker, and how to run other people's Docker containers. + In this post, we will explore building our own images to package up our projects. +--- + +{:.summary} +> This post is part of a short blog series on using Docker for scientific applications. +> My aim is to explain the motivation behind Docker, +> show you how it works, +> and offer an insight into the different ways that you might want to use it in different research contexts. +> +> Quick links: +> +> - [Part 1 (Getting Started with Docker)]({% post_url 2020/09/2020-09-23-getting-started-with-docker-1 %}) +> - Part 2 (A Dockerfile Walkthrough) ← You are here! +> - [Part 3 (Using Docker in Practical Situations)]({% post_url 2020/09/2020-09-29-getting-started-with-docker-3 %}) + +# An Example Dockerfile + +Let's get straight to business: +Here's what an example Dockerfile for a simple Python project might look like. +(The comments are added to make it easier to reference later in this post.) + +```docker +# (1) +FROM python:3.8.5 + +# (2) +WORKDIR /opt/my-project + +# (3) +COPY . /opt/my-project + +# (4) +RUN pip install -r requirements.txt + +# (5) +ENTRYPOINT [ "python3", "main.py" ] +``` + +# Building Our Example Project + +First let's figure out how to turn this Dockerfile into a container that we can run. +The first step is to get the code -- +you can find it in [this repository](https://gitlab.com/hifis/templates/sample-docker-project) so you can clone it and follow along. + +The first step to getting this ready to run is `docker build`. +To build an image, you need a Dockerfile, a name for the image, and a context. +The Dockerfile is what tells Docker how to build the image, +the name is what Docker will use to reference this image later (e.g. `python` or `hello-world`), +and the context is the set of files from your file system that Docker will have access to when it tries to build the project. + +Usually the context is the project directory (usually also the directory where the build command is run from). +Likewise, by convention, a Dockerfile is generally called `Dockerfile` (with no extension), +and lives in the project's root directory. +If this isn't the case, there are additional flags to pass to `docker build` that specify where it is located. +The name is given with the `-t` flag, also specifying any tags that you want to provide (as always, these default to `:latest`). +The `-t` flag can be provided multiple times, so you can tag one build with multiple tags, +for example if your current build should belong to both the `latest` tag, and a fixed tag for this release version. + +Having cloned the example repository, you can run this build process like this: + +```console?prompt=$,# +$ # builds the file at ./Dockerfile, with the current working directory as the context, +$ # with the name `my-analyser`. +$ docker build -t my-analyser . 
+Sending build context to Docker daemon 20.48kB
+Step 1/5 : FROM python:3.8.5
+3.8.5: Pulling from library/python
+d6ff36c9ec48: Pull complete
+c958d65b3090: Pull complete
+edaf0a6b092f: Pull complete
+80931cf68816: Pull complete
+7dc5581457b1: Pull complete
+87013dc371d5: Pull complete
+dbb5b2d86fe3: Pull complete
+4cb6f1e38c2d: Pull complete
+0b3d7b2fc317: Pull complete
+Digest: sha256:4c62d8c5ef331e485143c7a664fd6deeea4595ac17008ef5c10cc470d259e39f
+Status: Downloaded newer image for python:3.8.5
+ ---> 62aa40094bb1
+Step 2/5 : WORKDIR /opt/my-project
+Removing intermediate container 3e718c528a63
+ ---> f6845bcf9e20
+Step 3/5 : COPY . /opt/my-project
+ ---> 8977a9a29d1c
+Step 4/5 : RUN pip install -r requirements.txt
+ ---> Running in 8da06d6427d0
+Collecting numpy==1.19.1
+  Downloading numpy-1.19.1-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
+Collecting click==7.1.2
+  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
+Installing collected packages: numpy, click
+Successfully installed click-7.1.2 numpy-1.19.1
+Removing intermediate container 8da06d6427d0
+ ---> ba22084bd57e
+Step 5/5 : ENTRYPOINT [ "python3", "main.py" ]
+ ---> Running in d1c9dc9bc09f
+Removing intermediate container d1c9dc9bc09f
+ ---> d12d76ae371b
+Successfully built d12d76ae371b
+Successfully tagged my-analyser:latest
+```
+
+There are a few things to notice here.
+Firstly, Docker sends the build context (that's the `.` part) to the Docker daemon.
+We'll discuss the role of the Docker daemon a bit in the next post, but for now, the daemon is the process that actually does the work here.
+After that, we start going through the steps defined in the Dockerfile
+(you'll notice the five steps each match up to the five commands).
+We'll go through what each command is actually doing in a moment,
+although it might be interesting to get an idea of what each line is doing before reading onwards.
+
+Before we explore the individual commands, however, we should figure out how to actually run this compiled image.
+The Python script that we're running is a fairly simple one --
+it has two commands, one to tell us how many items of data we've got, and another to give us the average values from that data.
+We can run it like this:
+
+```console?prompt=$,#
+$ docker run my-analyser
+Usage: main.py [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --help  Show this message and exit.
+
+Commands:
+  analyse-data
+  count-datapoints
+$ docker run my-analyser count-datapoints
+My Custom Application
+datapoint count = 100
+$ docker run my-analyser analyse-data
+My Custom Application
+height = 1.707529904338
+weight = 76.956408654431
+```
+
+This is very similar to the `hello-world` container that we ran,
+except without any need to download anything (because the image has already been built on our system).
+We'll look at transferring the image to other computers in the next post,
+but, in principle, this is all we need to do to get a completely self-sufficient container containing all the code
+that we need to run our project.
+
+For now, let's go through the Dockerfile step-by-step and clarify what each command given there is doing.
+
+# Step-by-step Through The Dockerfile
+
+The first thing **(1)** a Dockerfile needs is a parent image.
+In our case, we're using one of the pre-built Python images.
+This is an official image provided by Docker that starts with a basic Debian Linux installation,
+and installs Python on top of it.
+We can also specify the exact version that we want (here we use 3.8.5).
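+
+How strictly you pin this version is worth a deliberate choice.
+As a quick sketch (all three of these tags really exist for the official `python` image on Docker Hub):
+
+```docker
+# fully pinned: builds stay reproducible, but you must bump the tag yourself
+FROM python:3.8.5
+
+# minor-version pin: automatically follows 3.8.x patch releases
+# FROM python:3.8
+
+# unpinned: always the newest Python -- convenient, but builds can change under you
+# FROM python:latest
+```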
+
+There are a large number of these pre-built official images available,
+for tools such as
+[Python](https://hub.docker.com/_/python),
+[R](https://hub.docker.com/_/r-base),
+and [Julia](https://hub.docker.com/_/julia).
+There are also unofficial images that often bring together a variety of scientific computing tools for convenience.
+For example, the Jupyter Notebooks team have a [wide selection](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html) of different images with support for different setups.
+Alternatively, most Linux distributions, including Ubuntu and Debian[^1], are available as parent images.
+You may need to do more work to get these set up
+(for example, you'll need to manually install Python first)
+but you also have more flexibility to get things set up exactly how you want.
+
+Once we've got our base image, we want to make this image our own.
+Each of the commands in this file adds a new layer on top of the previous one.
+The first command we use **(2)** is fairly simple -- it just sets the current working directory.
+It's a bit like running `cd` to get to the directory you want to start working in.
+Here, we set it to `/opt/my-project`.
+It doesn't really matter what we use here,
+but I recommend `/opt/<project-name>` as a reasonable default.
+
+The next step **(3)** is to add our own code to the image.
+The image that we're building will be (mostly)[^2] isolated from the computer that we run it from,
+so if we want our code to be built into this project, we need to explicitly put it there.
+The `COPY` command is the way to do that.
+It creates a new layer that contains files from our local system (`.`) in the location in the image that we specify (`/opt/my-project`).
+
+At this point, we have a Python project inside a Docker image.
+However, our project probably has some third-party dependencies that will also need to be installed.
+As I pointed out before, the Docker container that we're aiming for is isolated from the computer that we will run it from,
+which also means that any dependencies that we need must also be installed inside the container.
+The `RUN` **(4)** command allows us to run arbitrary commands inside the container.
+After running the command, Docker then creates a new layer with the changes that were made by the command that we ran.
+
+Here, we run the `pip` command to install all of our dependencies[^3].
+We load the dependencies from a file called `requirements.txt` --
+if you're not so used to this system, this is a way of defining dependencies in a reproducible way,
+so that any future user can look through a project and see exactly what they will need to run it.
+It's important to emphasize that Docker doesn't need to replace `requirements.txt`, CMake, or other dependency management tools.
+Rather, Docker can work together with these and other tools to help provide additional reproducibility guarantees.
+
+The final part of our `Dockerfile` is the `ENTRYPOINT` command **(5)**.
+
+Part of the idea of Docker is that each Docker container does one thing, and it does it well.
+(You might recognise the UNIX philosophy here.)
+As a result, a Docker container should generally contain one application,
+and only the dependencies that that application needs to run.
+The `ENTRYPOINT` command, along with the `CMD` command, tells Docker which application should run.
+
+The difference between `ENTRYPOINT` and `CMD` is a bit subtle, but it roughly comes down to how you use the `docker run` command.
+When we ran it in the previous post, we generally used the default commands set by the containers --
+for `hello-world`, the default command was the executable that printed out the welcome message,
+while in `python`, the default command was the Python REPL.
+However, it's possible to override this command from the `docker run` command.
+For example, we can run the Python container to jump straight into a bash shell, skipping the Python process completely:
+
+```console?prompt=$,#
+$ docker run -it python:3.8.5 bash # note the addition of 'bash' here to specify a different command to run
+root@f30676215731:/#
+```
+
+This ability to replace the default command comes from using `CMD`.
+In the Python Dockerfile, there is a line that looks like `CMD python`, which essentially tells Docker
+"if nobody has a better plan, just run the Python executable".
+
+On the other hand, the arguments to `ENTRYPOINT` will just be put before whatever this command ends up being.
+(It is possible to override this as well, but it's not as common.)
+For example, consider the following Dockerfile:
+
+```docker
+FROM ubuntu:20.04
+
+# using `echo` allows us to "debug" what arguments get
+# passed to the ENTRYPOINT command
+ENTRYPOINT [ "echo" ]
+
+# this command can be overridden
+CMD [ "Hello, World" ]
+```
+
+When we run this container, we get the following options:
+
+```console?prompt=$,#
+$ docker run echotest # should print the default CMD value
+Hello, World
+$ docker run echotest override arguments # should print the overridden arguments
+override arguments
+$ docker run -it --entrypoint bash echotest # overrides the entrypoint
+```
+
+As a rule, I would recommend using `ENTRYPOINT` when building a container for a custom application,
+and `CMD` when you're building a container that you expect to be a base layer,
+or an environment in which you expect people to run a lot of other commands.
+In our case, using `ENTRYPOINT` allows us to add subcommands to the `main.py` script that can be run easily from the command line,
+as demonstrated in the opening examples.
+If we'd used `CMD` instead of `ENTRYPOINT`,
+then running `docker run my-analyser count-datapoints` would have just tried to run the `count-datapoints` command in the system,
+which doesn't exist, and would have caused an error.
+
+# Next: Part 3 -- Practical Applications in Science
+
+In this second of three parts, we've looked at an example project with an example Dockerfile.
+We saw how to build and run this Dockerfile,
+and we explored some of the most important commands needed to set up the Dockerfile for a project.
+
+In the final part, I want to explore some of the different ways that someone might use Docker as part of research.
+For example,
+how to distribute Docker containers to other places,
+how to run Docker containers on HPC systems,
+building Docker via Continuous Integration,
+and other places where you might see Docker being used.
+
+View part three [here]({% post_url 2020/09/2020-09-29-getting-started-with-docker-3 %}).
+
+<!-- doing spacing with html is fun... -->
+<br />
+
+<div class="alert alert-success">
+  <h2 id="contact-us"><i class="fas fa-info-circle"></i> Get In Touch</h2>
+  <p>
+    HIFIS offers free-of-charge workshops and consulting to research groups within the Helmholtz umbrella.
+    You can read more about what we offer on our
+    <strong><a href="{% link services/index.md %}">services page</a></strong>.
+    If you work for a Helmholtz-affiliated institution, and think that something like this would be useful to you, send us an e-mail at
+    <strong><a href="mailto:{{site.contact_mail}}">{{site.contact_mail}}</a></strong>,
+    or fill in our
+    <strong><a href="{% link services/consulting.html %}#consultation-request-form">consultation request form</a></strong>.
+  </p>
+</div>
+
+# Footnotes
+
+[^1]:
+    If you look deeper into Docker, you might notice that a distribution called "Alpine Linux" crops up a lot.
+    This is an alternative distribution that is specifically designed to be as light as possible.
+    This _can_ save a lot of space in Docker images, _but_ it also comes with some additional complexities.
+    I recommend starting with a Debian-based distribution, particularly for Python-based projects,
+    and then switching to Alpine Linux later if you find that your Docker images are getting too large to handle.
+
+[^2]:
+    "Mostly" is an important caveat here!
+    To usefully run a Docker container, we need to send some input in and get some sort of output out --
+    this is mostly handled with command-line arguments and the console output of whatever runs inside Docker.
+    However, for some applications (less so scientific ones),
+    we will also want to access a service running inside the container, e.g. a webserver.
+    Alternatively, we may want to access files inside the container while running it,
+    or even allow the container to access files from the "parent" computer that's running it.
+    These things can all be enabled using different arguments to the `docker run` command.
+
+    I'll talk a little bit more about some specifics here in the final part of this series,
+    where I'll also mention tools like Singularity (that you're more likely to run into on HPC systems),
+    and explain some of the limitations of these tools a bit more clearly.
+
+[^3]:
+    If you have a lot of different Python projects,
+    you might (rightly!) ask why I haven't used something like `virtualenv` to isolate the Python environment.
+    The answer is that, in this case, it's not really necessary.
+    The Docker image that we build will have isolation built-in --
+    and not only for Python, but for all our other dependencies too.
diff --git a/_posts/2020/09/2020-09-29-getting-started-with-docker-3.md b/_posts/2020/09/2020-09-29-getting-started-with-docker-3.md
new file mode 100644
index 000000000..8e8832d15
--- /dev/null
+++ b/_posts/2020/09/2020-09-29-getting-started-with-docker-3.md
@@ -0,0 +1,349 @@
+---
+title: "Docker for Science (Part 3)"
+date: 2020-09-29
+authors:
+  - frere
+layout: blogpost
+title_image: docker-cover.png
+categories:
+  - tutorial
+tags:
+  - consulting
+  - docker
+excerpt: >
+  The final part in the "Getting Started with Docker" series:
+  Having explored how to use Docker in a general sense,
+  we will look at how Docker can be used practically in your day-to-day scientific work.
+---
+
+{:.summary}
+> This post is part of a short blog series on using Docker for scientific applications.
+> My aim is to explain the motivation behind Docker,
+> show you how it works,
+> and offer an insight into the different ways that you might want to use it in different research contexts.
+>
+> Quick links:
+>
+> - [Part 1 (Getting Started with Docker)]({% post_url 2020/09/2020-09-23-getting-started-with-docker-1 %})
+> - [Part 2 (A Dockerfile Walkthrough)]({% post_url 2020/09/2020-09-25-getting-started-with-docker-2 %})
+> - Part 3 (Using Docker in Practical Situations) ← You are here!
+
+After working through the previous two parts, the next big question is: What now?
+We've downloaded other people's Docker images, and we've created our own Docker image --
+how do we get other people to download, and do something useful with, our Docker image?
+Likewise, how do we use our Docker image in a wider set of circumstances, for example on HPC systems, or a shared team VM?
+
+This post looks more specifically at scientific applications,
+so we'll focus mainly on GitLab and Singularity for these purposes,
+as these are the most commonly-used tools in scientific computing.
+
+# Sharing Docker Images
+
+The first problem is how to move Docker images from one computer to another,
+so that they can be shared between members of a team, or even between one team and another.
+
+Over the last two posts, I mentioned Docker Hub, the main Docker container registry.
+This is both the source of all official Docker images (`hello-world`, `python`, etc),
+as well as the host of a number of unofficial, third-party images (`jupyter/scipy-notebook`, `rocker/shiny`)[^1].
+If you're working with truly open-source code, and want to share that with the wider Docker ecosystem,
+you can create an account and upload images here in the same way that you might host Python packages on PyPI.
+
+However, in practice, a lot of scientific programming is hosted internally via your institution's own Git host,
+and most scientific applications are fairly specific,
+and probably not of huge use outside the purpose that they were developed for.
+
+For this purpose, a number of git-hosting tools (such as GitLab and GitHub) also include per-project Docker registries.
+This means that you can build and save your Docker images in the same place that you keep your code.
+
+For the purposes of this post, I'll assume the use of GitLab, because it is one of the most common options in Helmholtz institutions[^2].
+When enabled by your administrator, GitLab projects include a private container registry for each project.
+You can set it up for your project by going to _Settings > General > Visibility, project features, permissions_.
+This will enable a "Packages and Registries > Container Registry" option in the project sidebar, which will take you to an empty page,
+because you probably don't have any images stored yet.
+
+How do you store an image here?
+Let's start off by doing it manually, and then do it "properly" -- by which I mean get it to happen automatically.
+If you want to follow along, create a new private repository that you can use as a sandbox,
+and push the code from the previous post to play around with.
+
+[^1]:
+    Notice that all third-party images have two parts --
+    a group/maintainer name (e.g. `jupyter`),
+    and a specific image name (e.g. `scipy-notebook`).
+    This is the main way that you can tell the difference between official and third-party images.
+
+[^2]:
+    Unfortunately, the second-most common code hosting option at Helmholtz, Bitbucket, doesn't include a container registry.
+    You can check with your local administrators if they have a tool like Artifactory or JFrog available.
+    Alternatively, part of the evolution of the HIFIS project is to provide code hosting infrastructure across the whole Helmholtz community,
+    which will include access to package and container registries,
+    so please keep an eye out for more HIFIS news on this blog!
+
+## Saving Images -- The Manual Process
+
+In the top-right corner of this Container Registry page, there is a button that says "CLI Commands".
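+Clicking it shows the handful of commands we need.
+As a rough sketch (the registry URL and the `user123/docker-test` path are placeholders --
+GitLab shows the exact values for your instance and project):
+
+```console?prompt=$
+$ docker login registry.hzdr.de
+$ docker build -t registry.hzdr.de/user123/docker-test .
+$ docker push registry.hzdr.de/user123/docker-test
+```
+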
+These commands walk us through the main steps of getting the image that we generated earlier into this registry.
+The first command is `docker login`, followed by a URL for the local registry.
+Copying this into your terminal, and pressing enter, will ask for the username and password of your GitLab account.
+
+Note that the registry login cannot use your GitLab SSH key -- SSH keys only cover Git operations.
+Instead of your password, you can also use an access token (more on those below),
+which has the advantage that it can be limited in scope and revoked later.
+You can find more details [here](https://docs.gitlab.com/ee/user/packages/container_registry/),
+or in the documentation for your local GitLab instance.
+
+Once we've logged in, we can move to the next step.
+The suggestion given by GitLab is a command to build the image,
+but we already built our image while we were learning about Dockerfiles in the previous post.
+However, the project name used by GitLab is different to the one we used then --
+why?
+
+Well, in GitLab at least, the name of your image is a combination of the project name, and the group or user who owns that project.
+For example, if your username is "user123", and you create a project called `docker-test` inside your personal area in GitLab,
+your image will be called `user123/docker-test`.
+In addition, Docker requires that if you use a registry that isn't the main Docker registry, you specify that registry as part of the image name.
+So, in this case, you'll actually need to name your image `<registry-url>/user123/docker-test`,
+where `<registry-url>` is whatever you used to log in in the previous step.
+
+This isn't a problem at all -- we can just run the build again, and because of Docker's clever caching mechanism,
+we shouldn't even need to wait for the build to happen again.
+We can just run the command that GitLab gave us in our project directory, and we get the renamed tag for free.
+
+The final step is to push the image --
+for this, we simply use the `docker push` command, giving it the image name that we just used.
+When this is done, we should be able to refresh the container registry page, and see our image sitting there,
+with a little note that says "1 Tag".
+Clicking on the image will show that tag -- it should be the `latest` tag that Docker always defaults to if none is specified.
+To upload a different tag, just specify it at the end of the image name --
+for example: `registry.hzdr.de/user123/docker-test:my-tag`.
+
+Hopefully, it's clear that the manual process here isn't too complicated --
+we log in, we do the build as we did in the previous post, and we add a push command as well.
+The most complicated part is the name, but we can get this from GitLab.
+However, three manual commands may well be three more commands than we actually need --
+how does the automatic process compare, and is it simpler?
+
+## Saving Images -- The Automatic Process
+
+In GitLab, we can use *CI Pipelines* to make things happen automatically when we update our code.
+Often, this will be building our project,
+running a linter or typechecker (e.g. mypy),
+or running any automatic tests associated with the project.
+GitLab makes it fairly easy to use these pipelines to build and upload images to the container registry,
+and the HIFIS team have created some templates that make this straightforward.
+
+To use pipelines, a project needs a file called `.gitlab-ci.yml`, which defines a series of *Jobs* that need to be executed.
+In the base project directory, create this file, and type the following into it:
+
+```yaml
+include:
+  # include the HIFIS Docker template, so that we can extend the predefined jobs there
+  - "https://gitlab.com/hifis/templates/gitlab-ci/-/raw/master/templates/docker.yml"
+
+stages:
+  - build # => build the dockerfile
+  - release # => upload images to the repository
+
+docker-build:
+  extends: .docker-build
+  stage: build
+
+# this will update the `latest` tag every time the master branch changes
+release-latest:
+  extends: .docker-tag-latest
+  stage: release
+  needs:
+    - job: docker-build
+      artifacts: true
+```
+
+This creates a pipeline with two jobs, one which builds the Docker image, and one which uploads it to the registry.
+If you push this to the master branch and click on the _CI/CD > Pipelines_ tab in your GitLab project,
+you should already be able to see the pipeline being executed.
+
+The documentation for this template is available [here](https://gitlab.com/hifis/templates/gitlab-ci/-/blob/master/docs/docker.md).
+
+## Sharing Images
+
+Having an image in a registry is one thing, but sharing it with other people is another.
+Private GitLab projects will also have private registries,
+which means that anyone else who wants to access the registry will need to log in to GitLab via Docker
+(as we did in the manual saving process)
+and have sufficient privileges in the team.
+
+However, there is another route.
+GitLab also provides access tokens that can be given to people to allow them to pull images from the registry,
+but not to make other changes.
+They don't even need to have a GitLab account!
+
+In a project's settings, under _Settings > Access Tokens_, there is a page where you can create tokens to share with other people.
+These tokens are randomly-generated passwords that are linked to a particular project, and that specify exactly what a person is able to access.
+For the purposes of sharing a Docker image, the `read_registry` permission is enough --
+this will allow the bearer of the token to access the registry, but not push new images there, or access other project features.
+
+To create an access token, give the token a name to describe what it's being used for,
+select the relevant permissions that you want to grant[^3],
+and optionally give an expiry date, if you know that the token will only be needed until a certain time.
+In response, GitLab will provide a string of letters, digits, and other special characters,
+which can be copied and sent to the people who need to use it.
+
+To use this token, use the `docker login` command with your normal GitLab username, and the token provided.
+For more information, see the documentation [here](https://docs.gitlab.com/ee/user/packages/container_registry/#authenticating-to-the-gitlab-container-registry).
+
+[^3]:
+    Selecting which permissions to grant is an interesting question of security design that we shouldn't go into too much here,
+    but a general guideline is "everything needed to do the job required, and not a step more".
+    That is, give only the permissions that are actually needed right now, not permissions that might be useful at some point.
+
+    This probably doesn't matter so much in the scientific world, where open research is increasingly important,
+    but it's a good principle when dealing with computers in general.
+    Consider a badly-written tool (they do exist... 😉) that is designed to clean up images that aren't needed any more.
+    One mistake in the filter for deciding which images aren't needed any more,
+    and this sort of tool could rampage through all the registries that it is connected to, deleting everything it can see.
+    (This sort of thing happens more often than you would think -- see
+    [this bug](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issues/123) and
+    [this bug](https://github.com/ValveSoftware/steam-for-linux/issues/3671) and
+    [this bug](https://itsfoss.com/accidentally-deletes-company-wrong-command/) and
+    [this bug](https://www.wired.com/2001/11/glitch-in-itunes-deletes-drives/) --
+    all just from one small `rm` command!)
+    By limiting the access scope to read-only, we can limit how much these sorts of problems affect us.
+    At least until we decide to run this particular (thankfully fictional) clean-up tool ourselves,
+    and make the same mistake...
+
+# Docker for HPC (High Performance Computing)
+
+Once you've got a Docker image that you can build and run on your computer, it makes sense to look for more useful places to run this image.
+
+In research software, it's common to run programs on an HPC, or High Performance Computing system.
+This is a shared cluster of high-performance servers, often equipped with GPUs, managed centrally by a research institute,
+where users can run their programs for longer periods of time (several hours, or even days) without having to keep their own computers running.
+Generally, the user will log on, schedule a job, and then be notified when their job has finished running so they can view the results.
+
+Unfortunately, for very good reasons, HPC administrators are often very reluctant to install Docker on their systems.
+One of the side-effects of the way that Docker works
+is that it is generally possible for a Docker container running on a server to gain administrator access on that parent server,
+essentially "breaking out" of the container.
+This makes the administrator's job much more difficult in terms of locking down each user's processes and isolating them from each other.
+As a result, it's generally not a good idea to run Docker in this way.
+
+However, perhaps surprisingly, Docker isn't the only way to run Docker images.
+There are a number of other containerisation tools,
+and one tool in particular is both designed to run on HPC systems,
+_and_ able to interoperate with Docker, meaning you can usually just run your Docker image like normal.
+
+This tool is called [_Singularity_](https://sylabs.io/guides/3.6/user-guide/introduction.html).
+It is actually a complete containerisation tool in its own right,
+with its own format for defining containers,
+and its own way of running containers[^4].
+More importantly, it knows how to convert other container formats (including Docker) into its own `.sif` format.
+In addition, it runs as the current user --
+it doesn't require any magical higher privileges like Docker.
+(This is a trade-off, but for the purposes of scientific applications, it's usually a reasonable one to make.)
+
+If you want to install Singularity and try it out yourself,
+you will need a Linux machine and a Go compiler, along with a few other dependencies.
+You can find the full instructions [here](https://sylabs.io/guides/3.6/admin-guide/installation.html).
+Running Singularity on an HPC system will also depend on how exactly that HPC system has been set up,
+but it will generally involve requesting a job, and running the `singularity` command as part of that job,
+with the desired resources.
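+
+As a minimal sketch (assuming Singularity 3.x, and our example image from the previous post pushed to a registry as in the section above):
+
+```console?prompt=$
+$ # convert the Docker image into Singularity's .sif format
+$ singularity pull docker://registry.hzdr.de/user123/docker-test
+$ # run it -- the ENTRYPOINT we defined in the Dockerfile is preserved
+$ singularity run docker-test_latest.sif count-datapoints
+```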
+
+One of the key questions when using a private registry such as GitLab (see above) is how to log in to that registry.
+Interactively, Singularity provides a `--docker-login` flag when pulling containers.
+For non-interactive use, it is also possible to pass the registry credentials via environment variables in certain circumstances.
+
+[^4]:
+    It has its own container system?
+    And it's more suited to scientific applications?
+    Why are these blog posts about Docker then -- why not just go straight to this clearly more convenient tool?
+
+    Two reasons:
+    Firstly, Docker is way more popular than Singularity, or indeed any other similar tool.
+    This means more documentation,
+    more starter images to base our own changes on,
+    more people to find and fix bugs in the software,
+    and more support in third-party tools like GitLab.
+    Secondly, Singularity only runs on Linux,
+    and the installation process involves cloning the latest source code,
+    installing a compiler for the Go programming language,
+    and compiling the whole project ourselves.
+
+    Given that Singularity can run Docker images,
+    we can use Docker in the knowledge that we can also get the advantages of Singularity later on.
+
+# Docker in The Wild
+
+So far, we've generally assumed that the Docker containers being created are wrapping up whole programs for use on the command line.
+However, there are also situations where you might want to send a whole environment to other people,
+so that they have access to a variety of useful tools.
+
+If you've used GitLab CI (or other similar systems), this is how it works.
+When GitLab runs a job in a pipeline, it creates a fresh Docker container for that job.
+That way, the environment is (mostly) freshly-created for each job, which means that individual jobs are isolated.
+It also means that the environment can be anything that the user wants or needs.
+
+By default, this will probably be some sort of simple Linux environment,
+like a recent Ubuntu release, or something similar.
+However, if a CI job needs specific tools, it may well be simpler to find a Docker image that already has those tools installed
+than to go through the process of reinstalling those tools every time the job runs.
+For example, for a CI job that builds a LaTeX document, it may be easiest to use a pre-built installation such as
+[`aergus/latex`](https://hub.docker.com/r/aergus/latex).
+
+In fact, in GitLab, it's even possible to use the registries from other projects to access custom images,
+and use those custom images in jobs in other projects.
+You can even use jobs to create images that are then used in other jobs, if that's something that you really need.
+
+# Conclusion
+
+Here, as they say, endeth the lesson.
+
+Over the course of these three blog posts,
+we've talked about the purpose of Docker,
+and how it can be used to package applications and their dependencies up in a convenient way;
+we've got started with Docker, and learned how to run Docker containers on our system;
+we've walked through how to create our own Docker containers using a Dockerfile;
+and finally, in this post,
+we've talked about some of the ways that we can use Docker practically for scientific software development.
+
+Docker can often be a hammer when all you need is a screwdriver --
+very forceful, and it'll probably get the job done,
+but sometimes a more precise tool is ideal.
+The motivating example for this blog series was the complexity of Python project development,
+where trying to remember which packages are installed, and which packages are needed by a particular project,
+can cause a lot of issues when sharing that project with others.
+For this case alone, Docker can be useful, but you may want to consider a package manager such as [Poetry](https://python-poetry.org/),
+which can manage dependencies and virtual Python environments in a much simpler way.
+
+However, when different tools, languages, and package management needs come together,
+using Docker can often be a good way to make sure that the system really is well-defined,
+for example by ensuring that the right system packages are always installed,
+as well as the right Python packages,
+or the right R or Julia software.
+
+If you feel like dependency management for your project is becoming too complex,
+and you're not sure what packages need to exist, or how to build it on any computer other than your own,
+then hopefully this approach of building a Docker container step-by-step can help.
+However, if you would like more support for your project,
+HIFIS offers a consulting service, which is free-of-charge, and available for any Helmholtz-affiliated groups and projects.
+Consultants like myself can come and discuss the issues that you are facing,
+and explore ways of solving them in the way that is most appropriate to your team.
+
+For more details about this, see the "Get In Touch" box below.
+
+<!-- doing spacing with html is fun... -->
+<br />
+
+<div class="alert alert-success">
+  <h2 id="contact-us"><i class="fas fa-info-circle"></i> Get In Touch</h2>
+  <p>
+    HIFIS offers free-of-charge workshops and consulting to research groups within the Helmholtz umbrella.
+    You can read more about what we offer on our
+    <strong><a href="{% link services/index.md %}">services page</a></strong>.
+    If you work for a Helmholtz-affiliated institution, and think that something like this would be useful to you, send us an e-mail at
+    <strong><a href="mailto:{{site.contact_mail}}">{{site.contact_mail}}</a></strong>,
+    or fill in our
+    <strong><a href="{% link services/consulting.html %}#consultation-request-form">consultation request form</a></strong>.
+  </p>
+</div>
+
+# Footnotes
diff --git a/_posts/2020/10/2020-10-15-survey-technology.md b/_posts/2020/10/2020-10-15-survey-technology.md
new file mode 100644
index 000000000..8cdda38c3
--- /dev/null
+++ b/_posts/2020/10/2020-10-15-survey-technology.md
@@ -0,0 +1,205 @@
+---
+title: "HIFIS Survey 2020: A Technology Perspective"
+date: 2020-11-27
+authors:
+  - huste
+  - hueser
+layout: blogpost
+title_image: default
+categories:
+  - report
+tags:
+  - survey
+  - technology
+excerpt: >
+  The HIFIS Software survey gathered information from Helmholtz
+  research groups about their development practice. This post shows some
+  insights from a technology perspective and tries to draw some conclusions
+  for the future direction of HIFIS Software technology services.
+---
+
+At the beginning of 2020, the HIFIS Software team initiated a software survey
+targeting employees of the whole Helmholtz Association; 467 participants
+could be considered for the analysis.
+The figure below depicts how strongly the different Helmholtz research fields
+are represented in this survey.
+
+{:.treat-as-figure}
+
+
+With the results of the survey, we want to understand how we as HIFIS Software
+Services can best support your everyday life as a research software developer.
+In this blog post, we will examine the results from a technology perspective:
+on the one hand, we will give an overview of the status quo of the
+participants' software engineering process; on the other hand, we will try to
+identify specific measures for HIFIS services.
+
+## Version Control
+
+One of the basic requirements for developing sustainable and high-quality
+research software is the usage of a version control system (VCS).
+On the market, there are multiple competitors: distributed version control
+systems like Git or Mercurial, and centralized version control systems like
+SVN.
+In accordance with the trends shown in the analysis done by Stack Overflow, we
+expected Git to be the most popular tool within Helmholtz.
+
+{:.treat-as-figure}
+
+Trend of Stack Overflow questions per month. Created via [Stack Overflow Trends](https://insights.stackoverflow.com/trends)
+on 2020-10-15.
+
+The participants of the survey answered the multiple-choice question
+about which VCSs they use as shown in the figure below.
+
+{:.treat-as-figure}
+
+
+A similar diagram to the one above has already been evaluated in a related
+[blog post on results from the survey analysis]({% post_url 2020/11/2020-11-07-survey-results-language-vcs %}).
+Here, we would only like to draw conclusions
+from a technological point of view.
+Only roughly 10% of the participants claim that they do not use VCSs
+while developing their research software.
+These results indicate that the awareness is high among the participants
+that the usage of version control systems is an important aspect of
+sustainable software development.
+
+To unravel that a bit more, we identified a trend in the figure below:
+the use of VCSs increases the more widely research software developers share
+their source code, in terms of categories like within their research group,
+research organization, research field or even the general public.
+Hence, there might be a relationship between the broadness of code
+sharing and the usage of VCSs.
+If this trend holds true, it illustrates that version control
+systems are indeed mandatory tools for collaborating with other
+developers.
+
+{:.treat-as-figure}
+
+
+The responses to the survey are then grouped into the six Helmholtz research
+fields:
+
+* Aeronautics, Space and Transport
+* Energy
+* Earth and Environment
+* Health
+* Matter
+* Key Technologies
+
+{:.treat-as-figure}
+
+
+In the research field _Aeronautics, Space and Transport_, SVN seems to be
+more widespread compared to other research fields, but the portion
+of developers who do not use version control is also lowest among the
+participants of this research field.
+On the one hand, given the data introduced earlier about the number of VCS
+questions asked on Stack Overflow over time, this most probably indicates
+that there is a significant number of comparatively old repositories
+that use SVN, and that this research field might have a longer tradition of
+using VCSs.
+On the other hand, it shows that the use of VCSs in this research
+field today is more prevalent compared to other Helmholtz research fields.
+
+From the data it is also possible to compare the usage of version control
+systems with the team size participants usually develop software in.
+The result is shown in the figure below:
+
+{:.treat-as-figure}
+
+
+It is clearly visible that the share of participants who claim to not use any
+kind of version control decreases with increasing team size.
+This insight is very valuable.
+This illustration suggests a relationship between team size and the use of VCSs.
+One reason for the increasing use of VCSs with growing team size might be that VCSs
+make collaboration more comfortable and that researchers are aware of this fact.
+Whether the use of VCSs has actually already become a de-facto standard in
+research software will be further investigated (e.g. in our next survey).
+
+On the other hand, among the participants who claim to develop software mostly
+on their own, 20% state that they do not use version control at all.
+This is something we as HIFIS Software Services would like to see change in
+the future.
+For us, it is important to make people aware that using version control is a
+mandatory requirement for software development projects of any scale.
+This requires us to make the entry hurdle to using version control systems as
+low as possible.
+This means that every software developer in Helmholtz must have
+access to a suitable and easy-to-use infrastructure to enable this basic
+requirement.
+Therefore, HIFIS Software Services will offer a GitLab instance that is
+usable by every employee of the Helmholtz Association free of charge.
+
+## Software Development Platforms
+
+Using a version control system can be considered the entry point to a world of
+platforms that build even more functionality around this basic requirement.
+Even though you can typically use a version control system completely locally
+as well, it really starts paying off when combining version control with online
+platforms like GitLab, GitHub or Bitbucket.
+This not only opens up your project for collaboration, but also gives
+you access to a whole ecosystem of other extremely useful tools like issue
+tracking, merge requests, CI/CD or code reviews.
+This is why we were also eager to know which software development platforms
+the participants use in their everyday work.
+
+{:.treat-as-figure}
+
+
+The results show that among the participants the most widely used platforms
+are GitHub.com and self-hosted GitLab instances, followed by GitLab.com.
+About 54% of the participants claim to use GitHub.com, 49% use self-hosted
+GitLab instances and about 25% of the participants specify to use GitLab.com.
+About 13% claim to not use any of the platforms.
+This value is in a similar range to the share of participants who specified to not use
+version control systems.
+
+## Continuous Integration
+
+Continuous Integration (CI) refers to the practice of merging code
+changes into a shared mainline several times a day.
+A typical workflow would incorporate the automatic building of the software,
+the automatic execution of unit tests and finally, the automatic deployment of
+artifacts, e.g. the documentation or compiled binaries.
+The last step is also referred to as Continuous Deployment (CD).
+On the market, there are multiple tools that support this kind of software
+development process.
+Some of the tools available at the time of this survey were GitLab CI, Jenkins,
+Travis CI or CircleCI.
+
+The results of the survey show a pretty diverse situation for the usage of CI
+services by the participants.
+
+{:.treat-as-figure}
+
+
+On the one hand, a portion of 53% of the participants claim to not use CI
+services at all.
+Among the participants who declared to use CI services, the most commonly used
+technologies were GitLab CI (29%), Jenkins (16%) and Travis CI (13%).
+Due to the fact that many Helmholtz centers host their own GitLab instances,
+which also allows the use of GitLab CI, we expected GitLab CI to be the most
+popular tool among the participants of the survey.
+Jenkins is a tool that can be self-hosted as well and is therefore also popular
+and available in different centers.
+Due to the popularity of GitHub, especially for Open Source projects,
+it is not surprising that Travis CI is also widely chosen according
+to the survey responses.
+At the time of creating the survey, GitHub Actions was not yet widely available
+on the market.
+This explains why this service does not show up in the list of chosen tools.
+
+We as HIFIS Software Services would like to see a rise in the overall usage
+of CI/CD in the daily software development process.
+It offers the chance to automate repetitive tasks and introduces automated
+quality checks for code changes before they get merged into the mainline.
+Therefore, we want to ensure that every Helmholtz researcher, regardless of
+their affiliation, has seamless access to general purpose resources for CI/CD.
+This is why the provided GitLab instance will be equipped with scalable
+resources for CI/CD.
+With this offer, in combination with proper education, training and
+consultation, we hope to see a rise in the general usage of automation
+technologies in research software engineering.
-- 
GitLab