At the beginning of 2020, the HIFIS team conducted a survey among Helmholtz
scientists with the goals of learning more about the current practices
concerning research software development and identifying future challenges.
This blog post will present a glimpse into the survey's results and our take
on the gathered data.
Specifically, we will take a look at the distribution of programming languages
across the different research fields as well as the utilization of
_Version Control Systems_ (VCS) in the same context.
Finally, a brief look at the prevalence of various
_Continuous Integration_ (CI) systems rounds out this blog
post.
## Programming Languages
We asked the survey participants which programming languages they regularly
used for writing research software.
The following heatmap displays the relative usage of the most prominent programming languages for each research field:
{:.treat-as-figure}

All presented numbers are the relative usage of a given language in a given
field.
They might not always add up to exactly 1.00 per field or per language due to
multiple factors:
* Some participants did not answer both questions.
These answers are not represented in the plot.
* Languages that did not reach at least a _5%_ share in at least one field were
omitted to focus on the most prominent ones and make the graphic easier to
read.
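
To make the effect of these two factors concrete, here is a minimal sketch in Python. It is not the analysis code behind the heatmap: the sample answers, the function name `relative_usage`, and the normalization (share of language mentions within a field) are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical survey answers: (research field, languages used regularly).
# None marks an unanswered question. These values are invented.
answers = [
    ("Health", ["Python", "R"]),
    ("Health", ["R"]),
    ("Matter", ["Python", "C++"]),
    ("Matter", ["Python", "Fortran"]),
    (None, ["Python"]),   # field not answered: dropped
    ("Matter", None),     # languages not answered: dropped
]


def relative_usage(answers, threshold=0.05):
    """Return {field: {language: share}} for the prominent languages."""
    counts = defaultdict(Counter)
    for field, languages in answers:
        # Skip participants who did not answer both questions.
        if field is None or not languages:
            continue
        counts[field].update(languages)

    # Share of each language among all language mentions in a field.
    usage = {}
    for field, counter in counts.items():
        total = sum(counter.values())
        usage[field] = {lang: n / total for lang, n in counter.items()}

    # Keep only languages reaching the threshold in at least one field.
    keep = {
        lang
        for shares in usage.values()
        for lang, share in shares.items()
        if share >= threshold
    }
    return {
        field: {lang: s for lang, s in shares.items() if lang in keep}
        for field, shares in usage.items()
    }


# A deliberately high threshold shows how omitted languages leave a gap.
for field, shares in relative_usage(answers, threshold=0.3).items():
    print(field, shares, "sum:", round(sum(shares.values()), 2))
```

With the threshold raised to 0.3 for this tiny sample, C++ and Fortran are filtered out, so the remaining shares of the hypothetical _Matter_ field no longer add up to 1.00.
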
### What Can We Learn?
The first thing that catches the eye is that Python seems to be very dominant
in every research field.
We have to take this appearance with a slight grain of salt since the survey did
not distinguish between the outdated, but generally popular, Python 2 and
the current Python 3.
The popularity of the language amongst researchers is not very surprising:
It is well suited for quickly creating small-scale scripts and offers
an extensive choice of libraries for many use cases.
Consequently, our education and training efforts will continue to include
Python courses and materials that further the knowledge of the language and
its best practices amongst scientists and research software developers.
For consulting, we expect the team to receive requests for
porting older Python 2 applications to Python 3, as well as support
requests for dealing with the variety of virtual environment and package
management tools for this language.
Another frequently selected language was C++, a popular choice in
high-performance computing and larger applications.
This indicates a potential demand for supporting this language in the future as
well, particularly in training and consulting.
Also notable is the strong presence of the statistics
language R in the _Health_ and _Earth and Environment_ research fields,
which suggests an opportunity to tailor and advertise education and consulting
offers more towards these areas.
## Version Control Systems
As with the question above, we analyzed a second question about the
usage of _Version Control Systems_ (VCS) amongst the participants in relation
to their fields of research.
{:.treat-as-figure}

The strong prevalence of Git is apparent at first glance.
SVN is the runner-up: some projects still rely on it for
version control, which, together with a few mentions of CVS, might be an
indicator of older, long-lived projects.
The number of projects not using any version control at all is comparatively
low, which points toward the usage of VCS being an established step in setting
up projects across all research fields.
From an education perspective, it appears right to keep
focusing on basic and advanced Git courses and to promote version control as
one of the standard practices in every scientist's toolbox.
It can be expected that the consulting team might face requests for help with
migrating projects from SVN or CVS to Git in the future.
## Continuous Integration
As a third question we wanted to know which _Continuous Integration_ (CI)
services the participants use to automate tasks surrounding their projects.
This, again, was a multiple-choice question, and the following plot shows the
relative distribution of the given answers:
{:.treat-as-figure}

One very prominent outcome is that over half of the participants stated that
they do not use any CI at all.
Several possible reasons for this finding come to mind:
* The question was not clear enough and participants who actually use CI were
not aware of that fact.
* Participants are not aware that CI exists.
* Participants do not see any potential benefit of CI for their projects.
* Participants do not know how to set up and use CI.
Given that practically any project can benefit from employing
_Continuous Integration_ services by automating at least mundane management
tasks such as license checking, documentation generation, and style checks,
all four reasons can be attributed to a lack of awareness and education.
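
As an illustration of how small such an automated task can be, the following Python sketch checks source files for a license notice. It is only an example of the kind of script a CI job could run, not a tool provided by HIFIS: the `SPDX-License-Identifier:` marker, the function names, and the `*.py` file pattern are assumptions, and a real project may follow different conventions.

```python
from pathlib import Path

# Assumed marker; projects may use a different license notice.
LICENSE_MARKER = "SPDX-License-Identifier:"


def has_license_header(text, max_lines=5):
    """Return True if the marker appears in the first `max_lines` lines."""
    return any(LICENSE_MARKER in line for line in text.splitlines()[:max_lines])


def files_missing_license(root, pattern="*.py"):
    """List files below `root` matching `pattern` that lack the marker."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if not has_license_header(path.read_text(encoding="utf-8"))
    )


def main(root="."):
    """Report offending files; a non-zero result signals failure."""
    missing = files_missing_license(root)
    for path in missing:
        print(f"missing license header: {path}")
    return 1 if missing else 0
```

A CI job could call `main()` and pass its return value on as the process exit code, so that the pipeline fails as soon as a file without a license notice is committed.
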
Further, the plot reveals that the CI solutions currently in use are, in
descending order of percentage, _GitLab CI_, which holds over a quarter of all
shares, _Jenkins_, and _Travis CI_, with all other services being barely
represented.
Building on the insights from this analysis, three actions clearly stand out to
improve CI usage across all projects:
* The education team will have to expand its portfolio and offer more
courses centered around CI usage.
* The popularity of _GitLab CI_ will likely increase the demand to migrate
other projects to this system. It will fall to the consulting branch to be
prepared to deal with such requests.
* The technology team has already begun to offer pre-made recipes for CI
pipelines and has an incentive to grow the collection of ready-to-use solutions
for popular scenarios.
Further insights on the usage of Continuous Integration platforms can be