From 2986007ad6dcbd7c8ade51ec7aeaefc4d5cc48ba Mon Sep 17 00:00:00 2001
From: "Uwe Jandt (DESY)" <uwe.jandt@desy.de>
Date: Tue, 8 Dec 2020 13:39:37 +0100
Subject: [PATCH] two SCC survey posts: corrected dates, coding background 2

---
 .../2020-11-27-survey-results-language-vcs.md | 154 +++++++++++++
 .../2020/11/2020-11-27-survey-technology.md   | 205 ++++++++++++++++++
 2 files changed, 359 insertions(+)
 create mode 100644 _posts/2020/11/2020-11-27-survey-results-language-vcs.md
 create mode 100644 _posts/2020/11/2020-11-27-survey-technology.md

diff --git a/_posts/2020/11/2020-11-27-survey-results-language-vcs.md b/_posts/2020/11/2020-11-27-survey-results-language-vcs.md
new file mode 100644
index 000000000..2820e725f
--- /dev/null
+++ b/_posts/2020/11/2020-11-27-survey-results-language-vcs.md
@@ -0,0 +1,154 @@
+---
+layout: blogpost
+title: "HIFIS Survey 2020: Programming, CI and VCS"
+date: 2020-11-27
+authors:
+  - erxleben
+title_image: coding_background.jpg
+categories:
+  - report
+---
+
+## Introduction
+At the beginning of 2020, the HIFIS team conducted a survey among Helmholtz
+scientists with the goals of learning more about the current practices
+concerning research software development and identifying future challenges.
+
+This blog post will present a glimpse into the survey's results and our take
+on the gathered data.
+Specifically, we will take a look at the distribution of programming languages
+across the different research fields as well as the utilization of
+_Version Control Systems_ (VCS) in the same context.
+Finally, a short insight into the prevalence of various
+_Continuous Integration_ (CI) systems will be given to round out this blog
+post.
+
+## Programming Languages
+
+We asked the survey participants which programming languages they regularly
+used for writing research software.
+The following heatmap displays the relative usage of the most prominent programming languages for each research field.
+
+{:.treat-as-figure}
+
+
+All presented numbers are the relative usage of a given language in a given
+field.
+They might not always add up to exactly 1.00 per field or per language due to
+multiple factors:
+
+* Some participants did not answer both questions.
+  These answers are not represented in the plot.
+* Languages that did not have at least a _5%_ share in at least one field were
+  omitted to focus on the most prominent ones and make the graphic easier to
+  read.
+
+### What Can We Learn?
+
+The first thing that catches the eye is that Python seems to be very dominant
+in every research field.
+We have to take this impression with a grain of salt, since the survey did
+not distinguish between the outdated, but generally popular, Python 2 and
+the current Python 3.
+The popularity of the language amongst researchers is not very surprising:
+It is well suited for quickly creating small-scale scripts and comes with
+an extensive choice of libraries for many use cases.
+
+Consequently, our education and training efforts will continue to provide
+offers regarding programming in Python and create appropriate courses and
+materials to further the knowledge and best practices in this language amongst
+scientists and research software developers.
+
+For consultations, we expect the team to receive requests regarding the
+porting of older Python 2 applications to Python 3, as well as support
+requests for dealing with the variety of virtual environments and package
+management for this language.
+
+A second frequently selected language was C++, which is a popular choice in
+high-performance computing and larger applications.
+
+This indicates a potential demand for supporting this language in the future as
+well, especially in the context of training and consulting.
+
+Another notable finding is the strong presence of the statistics
+language R in the _Health_ and _Earth and Environment_ research fields,
+which suggests an opportunity to tailor and advertise education and
+consulting offers more towards these areas.
+
+## Version Control Systems
+
+Similar to the question above, a second question was analyzed concerning the
+usage of _Version Control Systems_ (VCS) amongst the participants related to
+specific fields of research.
+
+{:.treat-as-figure}
+
+
+The strong prevalence of Git is apparent at first glance.
+As a runner-up, there are still some projects based on SVN for
+version control, which, together with a few mentions of CVS, might be an
+indicator of older, longer-lived projects.
+The number of projects not using any version control at all is comparatively
+low, which points toward the usage of VCS being an established step in setting
+up projects across all research fields.
+
+From an education perspective, it appears right to continue to
+focus on basic and advanced Git courses and promote version control as one of
+the standard practices in every scientist's toolbox.
+It can be expected that the consulting team might face requests for help with
+migrating projects from SVN or CVS to Git in the future.
+
+## Continuous Integration
+
+As a third question, we wanted to know which _Continuous Integration_ (CI)
+services the participants use to automate tasks surrounding their projects.
+This, again, was a multiple-choice question, and the following plot shows the
+relative distribution of the given answers:
+
+{:.treat-as-figure}
+
+
+One very prominent outcome is that over half of the participants claimed
+not to use any CI at all.
+Several possible reasons for this finding come to mind:
+* The question was not clear enough, and participants who actually use CI were
+  not aware of that fact.
+* Participants are not aware that CI exists.
+* Participants do not see any potential benefit of CI for their projects.
+* Participants do not know how to set up and use CI.
+
+Given that practically any project can benefit from employing
+_Continuous Integration_ services by automating at least the mundane management
+tasks like license checking, documentation generation, style checks, etc., all
+four given reasons can be attributed to a lack of awareness and education.
+
+Further, the plot reveals that the currently used CI solutions are (in
+descending order of percentage) _GitLab CI_, which holds over a quarter of all
+shares, _Jenkins_ and _Travis CI_, with all other services being barely
+represented.
+
+Building on the insights from this analysis, three actions clearly stand out to
+improve CI usage across all projects:
+* The education team will have to expand their portfolio and offer more
+  courses centered around CI usage.
+* The popularity of _GitLab CI_ will likely increase the demand to migrate
+  other projects to this system. It will fall to the consulting branch to be
+  prepared to deal with such requests.
+* The technology team has already begun to offer pre-made recipes for CI
+  pipelines and has an incentive to grow the collection of ready-to-use solutions
+  for popular scenarios.
+
+Further insights on the usage of Continuous Integration platforms can be
+gained from another
+[blog post]({% post_url 2020/11/2020-11-27-survey-technology %})
+discussing the survey analysis from a technology perspective.
+
+## Conclusion
+
+Thanks to the participants of the HIFIS survey in 2020, it was possible to gain
+a first glimpse into the status quo of research software engineering within the
+Helmholtz centers. With this data, the needs of the scientists could be assessed
+from a bird's-eye perspective, and it is possible to determine concrete steps to
+offer better support for the scientists at Helmholtz.
+
+
diff --git a/_posts/2020/11/2020-11-27-survey-technology.md b/_posts/2020/11/2020-11-27-survey-technology.md
new file mode 100644
index 000000000..38c6ab374
--- /dev/null
+++ b/_posts/2020/11/2020-11-27-survey-technology.md
@@ -0,0 +1,205 @@
+---
+title: "HIFIS Survey 2020: A Technology Perspective"
+date: 2020-11-27
+authors:
+  - huste
+  - hueser
+layout: blogpost
+title_image: coding_background.jpg
+categories:
+  - report
+tags:
+  - survey
+  - technology
+excerpt: >
+  The HIFIS Software survey gathered information from Helmholtz
+  research groups about their development practice. This post shows some
+  insights from a technology perspective and tries to draw some conclusions
+  for the future direction of HIFIS Software technology services.
+---
+
+At the beginning of 2020, the HIFIS Software team initiated a software survey
+targeting employees of the whole Helmholtz Association; 467 participants
+could be considered for the analysis.
+The figure below depicts how strongly the different Helmholtz research fields
+are represented in this survey.
+
+{:.treat-as-figure}
+
+
+With the results of the survey, we want to understand how we as HIFIS Software
+Services can best support your everyday life as a research software developer.
+In this blog post we will examine the results from a technology perspective
+and will on the one hand give an overview of the status quo of the software
+engineering process of the participants, and on the other hand try to identify
+specific measures.
+
+## Version Control
+
+One of the basic requirements for developing sustainable and high-quality
+research software is the usage of a version control system (VCS).
+Multiple competitors exist on the market: distributed version control
+systems like Git or Mercurial and centralized version control systems like
+SVN.
+In accordance with the trends shown in analyses by Stack Overflow, we
+expected Git to be the most popular tool within Helmholtz.
+
+{:.treat-as-figure}
+
+Trend of Stack Overflow questions per month. Created via [Stack Overflow Trends](https://insights.stackoverflow.com/trends)
+on 2020-10-15.
+
+The participants of the survey answered the multiple-choice question
+about which VCSs they use as shown in the figure below.
+
+{:.treat-as-figure}
+
+
+A diagram similar to the one above has already been evaluated in a related
+[blog post on results from the survey analysis]({% post_url 2020/11/2020-11-27-survey-results-language-vcs %}).
+Here, based on these descriptions, we would only like to draw conclusions
+from a technological point of view.
+Only roughly 10% of the participants claim that they do not use VCSs
+while developing their research software.
+These results indicate that awareness is high among the participants
+that the usage of version control systems is an important aspect in
+sustainable software development.
+
+To unravel this a bit more, the figure below shows a trend:
+the use of VCSs increases the more widely research software developers share
+their source code, with categories ranging from within their research group,
+research organization and research field to even the general public.
+Hence, there might be a relationship between the breadth of code
+sharing and the usage of VCSs.
+If this trend holds true, then it illustrates that version control
+systems are indeed mandatory tools for collaborating with other
+developers.
+
+{:.treat-as-figure}
+
+
+The responses to the survey are then grouped into the six Helmholtz research
+fields:
+
+* Aeronautics, Space and Transport
+* Energy
+* Earth and Environment
+* Health
+* Matter
+* Key Technologies
+
+{:.treat-as-figure}
+
+
+In the research field _Aeronautics, Space and Transport_, SVN seems to be
+more widespread compared to other research fields, but the share
+of developers who do not use version control is also lowest among the
+participants of this research field.
+On the one hand, given the data introduced earlier about the number of VCS
+questions asked on Stack Overflow over time, this most probably
+indicates that there is a significant number of comparably older repositories
+that use SVN and that this research field might have a longer tradition of
+using VCSs.
+On the other hand, this shows that the use of VCSs in this research
+field today is more prevalent compared to other Helmholtz research fields.
+
+From the data, it is also possible to compare the usage of version control
+systems with the team size participants usually develop software in.
+The result is shown in the figure below:
+
+{:.treat-as-figure}
+
+
+It is clearly visible that the number of participants who claim not to use any
+kind of version control decreases with increasing team size.
+This insight is actually very valuable.
+The illustration suggests a relationship between team size and the use of VCSs.
+One reason for increasing use of VCSs with growing team size might be that VCSs
+make collaboration more comfortable and that researchers are aware of this fact.
+Whether the use of VCSs has actually already become a de facto standard in
+research software will be further investigated (e.g. in our next survey).
+
+On the other hand, of the participants who claim to develop software mostly
+on their own, 20% state that they do not use version control at all.
+This is something we as HIFIS Software Services would like to see change in
+the future.
+For us, it is important to make people aware that using version control is a
+mandatory requirement for software development projects of any scale.
+This requires us to make the entry hurdle to using version control systems as
+low as possible.
+This means that every software developer in Helmholtz must have
+access to a suitable and easy-to-use infrastructure to meet this basic
+requirement.
+Therefore, HIFIS Software Services will offer a GitLab instance that is
+usable by every employee of the Helmholtz Association free of charge.
+
+## Software Development Platforms
+
+Using version control systems can be considered the entry point to a world of
+platforms that build even more around this basic requirement.
+Even though you can typically use a version control system completely locally,
+it really starts paying off when combining version control with online
+platforms such as GitLab, GitHub or Bitbucket.
+This not only opens up your project for collaboration but also gives
+you access to a whole ecosystem of other extremely useful tools like issue
+tracking, merge requests, CI/CD or code reviews.
+This is why we were also eager to know which software development platforms
+the participants use in their everyday life.
+
+{:.treat-as-figure}
+
+
+The results show that among the participants the most widely used platforms
+are GitHub.com and self-hosted GitLab instances, followed by GitLab.com.
+Specifically, about 54% of the participants claim to use GitHub.com, 49% use
+self-hosted GitLab instances and about 25% state that they use GitLab.com.
+About 13% claim not to use any of the platforms.
+This value is in a similar range as the share of participants who do not use
+version control systems.
+
+## Continuous Integration
+
+Continuous Integration (CI) refers to the practice of merging code
+changes into a shared mainline several times a day.
+A typical workflow would incorporate the automatic building of the software,
+the automatic execution of unit tests and, finally, the automatic deployment of
+artifacts, e.g. the documentation or compiled binaries.
+The last step is also referred to as Continuous Deployment (CD).
+Multiple tools exist on the market that support this kind of software
+development process.
+Some of the tools available at the time of this survey were GitLab CI, Jenkins,
+Travis CI and CircleCI.
+
+The results of the survey show a pretty diverse situation for the usage of CI
+services by the participants.
+
+{:.treat-as-figure}
+
+
+A share of 53% of the participants claim not to use CI
+services at all.
+Among the participants who stated that they use CI services, the most commonly
+used technologies were GitLab CI (29%), Jenkins (16%) and Travis CI (13%).
+Because many Helmholtz centers host their own GitLab instances,
+which also allow the use of GitLab CI, we expected GitLab CI to be the most
+popular tool among the participants of the survey.
+Jenkins is also a tool that can be self-hosted and is thus also popular and
+available in different centers.
+Due to the popularity of GitHub, especially for Open Source projects,
+it is not surprising that Travis CI is also widely chosen according
+to the survey responses.
+At the time of creating the survey, GitHub Actions was not yet widely available
+on the market.
+This explains why this service does not show up in the list of chosen tools.
+
+We as HIFIS Software Services would like to see a rise in the overall usage
+of CI/CD in the daily software development process.
+It offers the chance to automate repetitive tasks and introduces automated
+quality checks for code changes before they get merged into the mainline.
+Therefore, we want to ensure that every Helmholtz researcher, regardless of
+their affiliation, has seamless access to general-purpose resources for CI/CD.
+This is why the provided GitLab instance will be equipped with scalable
+resources for CI/CD.
+With this offer, in combination with proper education, training and
+consultation, we hope to see a rise in the general usage of automation
+technologies in research software engineering.
-- 
GitLab