Commit 23062298 authored by Huste, Tobias

Merge branch '137-tech-analysis-virt' into 'master'

Resolve "Analysis | Software | Technology: Section with plots regarding Q018 - Usage of Virtualisation Tools"

Closes #137

See merge request !122
parents 218117b6 542d568b
Pipeline #112705 passed
@@ -298,3 +298,89 @@ services.
Not least, we are investing in compute resources and hardware that is able
to execute these Continuous Integration tasks and offer them to all
\textit{Helmholtz} at no cost and without any administrative barriers.
\paragraph{Virtualisation Tools} \label{para:technology-virt}
As soon as software is expected to run in different environments,
ranging from local computers, local virtual machines and local containers
to remote servers, development teams start thinking about
how to guarantee consistent behaviour.
The usual way is to provide a single point of truth, realised through a single
environment which can be set up on all computers and servers using
virtualisation tools.
Another good reason to use these tools is reproducibility,
a topic that is particularly important for research software.
These environments are configured via declarative text files, called recipes,
that can be put under version control like any other code and can be shared
amongst all contributors.
This paradigm is called \textit{Infrastructure as Code}.
Each contributor can thus create these environments independently and in a
reproducible manner, without having to devise an environment of their own.
Some of the more prominent tools are \textit{Vagrant}, \textit{Docker},
\textit{Singularity}, and \textit{Podman}.
We asked the participants whether they used the mentioned virtualisation tools
in the last twelve months (see figure \ref{fig:q018-virt-per-options}).
The application of these techniques and tools has only recently started to
emerge within the field of software engineering.
Not so long ago these tasks were attributed solely to the operations team and
not the development team, but nowadays these boundaries are becoming
increasingly blurred.
Development teams and even separate DevOps teams (a portmanteau of
``Development'' and ``Operations'') have started to dedicate themselves to
this topic in recent years.
Because of their novelty, we assumed that these tools and techniques are
not yet as well known as other technologies.
This was backed up by the responses of the participants.
Except for \textit{Docker}, which is used by 30 \% of the respondents, the
virtualisation tools mentioned are largely unknown, and where they are known
they seem to be rather irrelevant to the respondents.
\begin{figure}
\centering
\includegraphics[width=.96\textheight,angle=90]{fig/Q018-virt-per-options}
\caption{Usage of Virtualisation tools}
\label{fig:q018-virt-per-options}
\end{figure}
The aggregated picture is similar.
Only 31 \% of the respondents used at least one of these virtualisation
tools, while 69 \% used none of them
(see figure \ref{fig:q018-virt-overall}).
\begin{figure}
\centering
\includegraphics[width=.75\textwidth]{fig/Q018-virt-overall}
\caption{Overall usage of Virtualisation tools}
\label{fig:q018-virt-overall}
\end{figure}
Again, the distinction between developers and non-developers is important in
order to get a more informative impression of how the usage of these
technologies is distributed.
It is not surprising that only 12 \% of the non-developers used
virtualisation tools, but even among the developers only 44 \% said that they
made use of these tools (see figure \ref{fig:q018-virt-swdevs}).
\begin{figure}
\centering
\includegraphics[width=.75\textwidth]{fig/Q018-virt-swdevs}
\caption{Usage of Virtualisation tools by developers and non-developers}
\label{fig:q018-virt-swdevs}
\end{figure}
We are also interested in how usage is distributed when we
distinguish between individual and team developers
(see figure \ref{fig:q018-virt-team}).
These tools seem to be more important for developers who are part of a team:
among the individual developers 35 \% are using virtualisation tools,
compared with 59 \% of the team developers.
This reflects the need within development teams to provide
environments in which software can be run consistently and reproducibly
by all team members whilst offering more degrees of freedom.
\begin{figure}
\centering
\includegraphics[width=.75\textwidth]{fig/Q018-virt-team}
\caption{Usage of Virtualisation tools by individual and team developers}
\label{fig:q018-virt-team}
\end{figure}
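As a rough illustration of how an aggregated share such as the 31 \% above can be derived from per-tool answers, the following sketch counts a participant as a user if at least one tool was answered with "Yes". It is written in dependency-free Python; the sample answers and the helper name `share_using_any_tool` are invented for illustration and are neither the survey data nor part of the analysis framework.

```python
from typing import Dict, List, Optional

# Hypothetical answers of five participants; None marks an unanswered item.
answers: List[Dict[str, Optional[str]]] = [
    {"Vagrant": "Don't know it", "Docker": "Yes",
     "Singularity": "Don't know it", "Podman": "Don't know it"},
    {"Vagrant": "Not relevant", "Docker": "Not relevant",
     "Singularity": "Not relevant", "Podman": "Not relevant"},
    {"Vagrant": None, "Docker": None, "Singularity": None, "Podman": None},
    {"Vagrant": "Don't know it", "Docker": "Don't know it",
     "Singularity": "Don't know it", "Podman": "Don't know it"},
    {"Vagrant": "Yes", "Docker": "Yes",
     "Singularity": "Not relevant", "Podman": "Don't know it"},
]


def share_using_any_tool(rows: List[Dict[str, Optional[str]]]) -> float:
    """Return the share of answering participants with at least one "Yes"."""
    # Keep only participants who answered at least one item of the question.
    answered = [row for row in rows
                if any(value is not None for value in row.values())]
    if not answered:
        raise ValueError("No participant answered the question.")
    # A participant counts as a user if any tool was answered with "Yes".
    users = sum(1 for row in answered
                if any(value == "Yes" for value in row.values()))
    return users / len(answered)


share = share_using_any_tool(answers)
print(f"{share:.0%} used at least one tool")
```

Participants who skipped the whole question are excluded from the denominator here, which mirrors the idea of reporting shares relative to those who actually answered.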
# hifis-surveyval
# Framework to help developing analysis scripts for the HIFIS Software survey.
#
# SPDX-FileCopyrightText: 2021 HIFIS Software <support@hifis.net>
#
# SPDX-License-Identifier: GPL-3.0-or-later
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""Analyse technology questions about Virtualisation tools."""
from pathlib import Path
from typing import Dict, List

from pandas import DataFrame, Series

from hifis_surveyval.core import util
from hifis_surveyval.data_container import DataContainer
from hifis_surveyval.hifis_surveyval import HIFISSurveyval
from hifis_surveyval.models.question import Question
from hifis_surveyval.models.question_collection import QuestionCollection


def run(hifis_surveyval: HIFISSurveyval, data: DataContainer):
    """Analyses technology question about Virtualisation tools."""
    print("======== Beginning of Script: " + Path(__file__).stem + " ========")

    # Get QuestionCollections and Questions analysed in this script.
    virt: QuestionCollection = data.collection_for_id("Q018")
    developers: Question = data.question_for_id("V001/_")
    team: Question = data.question_for_id("V003/_")

    # Get data as DataFrames and Series for these Collections and Questions
    data_virt_raw: DataFrame = virt.as_data_frame()
    data_developers_raw: Series = developers.as_series()
    data_team_raw: Series = team.as_series()

    # Define a mapping to recode boolean to string values.
    bool_recode_dict: Dict[bool, str] = {True: "Yes", False: "No"}

    # Define an ordering of the AnswerOptions.
    options_order_list: List[str] = ["Yes",
                                     "Don't know it",
                                     "Not relevant",
                                     "Don't know how",
                                     "Doesn't fit my needs",
                                     "Not available"]

    # Define an ordering of the recoded AnswerOptions.
    options_yesno_order_list: List[str] = ["Yes", "No"]

    # Define a mapping to replace the full ID by the label.
    questions_mapping_dict: Dict[str, str] = {
        question.full_id: question.label for question in virt.questions}

    title: str = virt.text("en-GB")
    print(title)
    questions: List[str] = [question.text("en-GB")
                            for question in virt.questions]
    print(questions)
    title_aggregated: str = "Did you use containerisation tools or services " \
                            "in the last twelve months?"

    # Create a DataFrame that contains only participants who answered the
    # question, i.e. drop rows that consist of None values only.
    data_virt_answered_only: DataFrame = \
        data_virt_raw.dropna(axis=0, how="all")

    # Rename columns in DataFrame from the full ID to the label.
    data_virt_renamed: DataFrame = data_virt_answered_only.rename(
        mapper=questions_mapping_dict, axis='columns')

    # Define a Series of boolean values which marks the rows that do not
    # contain "Yes" for any of the tools.
    virt_all_no: Series = \
        (data_virt_renamed["Vagrant"] != "Yes") & \
        (data_virt_renamed["Docker"] != "Yes") & \
        (data_virt_renamed["Singularity"] != "Yes") & \
        (data_virt_renamed["Podman"] != "Yes")

    # Invert it to a Series of boolean values which marks the rows that
    # contain "Yes" for at least one of the tools.
    virt_not_all_no = ~virt_all_no

    ###########################################################################
    # Frequencies of answers given regarding usage of Virtualisation tools
    ###########################################################################
    data_virt_freq_abs: DataFrame = util.dataframe_value_counts(
        data_virt_renamed, relative_values=False)
    data_virt_freq_abs = data_virt_freq_abs.reindex(options_order_list)

    # Divide all absolute values by the number of participants who answered
    # to get relative values of the multiple choice question.
    data_virt_freq_rel: DataFrame = \
        data_virt_freq_abs.div(len(data_virt_answered_only))

    hifis_surveyval.plotter.plot_bar_chart(
        data_frame=data_virt_freq_rel,
        plot_file_name="Q018-virt-per-options",
        plot_title=title,
        plot_title_fontsize=16,
        x_axis_label="Answer options",
        y_axis_label="Usage Virtualisation tools (relative counts)",
        x_label_rotation=0,
        round_value_labels_to_decimals=2,
        figure_size=(13, 5))

    ###########################################################################
    # Frequencies regarding overall usage of Virtualisation tools
    ###########################################################################
    data_virt_all_yesno_freq_rel: DataFrame = util.dataframe_value_counts(
        DataFrame(virt_not_all_no), relative_values=True)
    data_virt_all_yesno_freq_rel = \
        data_virt_all_yesno_freq_rel.rename(bool_recode_dict, axis=0)
    data_virt_all_yesno_freq_rel = \
        data_virt_all_yesno_freq_rel.rename(
            {0: "Virtualisation tools"}, axis=1)
    data_virt_all_yesno_freq_rel = \
        data_virt_all_yesno_freq_rel.T[options_yesno_order_list]

    hifis_surveyval.plotter.plot_bar_chart(
        data_frame=data_virt_all_yesno_freq_rel,
        plot_file_name="Q018-virt-overall",
        plot_title=title_aggregated,
        plot_title_fontsize=16,
        x_axis_label="",
        y_axis_label="Usage Virtualisation tools (relative counts)",
        x_label_rotation=0,
        round_value_labels_to_decimals=2,
        figure_size=(6, 5))

    ###########################################################################
    # Frequencies usage of Virtualisation tools based on answers by developers
    ###########################################################################
    data_virt_per_developers: DataFrame = \
        util.filter_and_group_series(virt_not_all_no,
                                     data_developers_raw.dropna())
    data_virt_all_yesno_per_developers_freq_rel: DataFrame = \
        util.dataframe_value_counts(data_virt_per_developers,
                                    relative_values=True)
    data_virt_all_yesno_per_developers_freq_rel = \
        data_virt_all_yesno_per_developers_freq_rel.rename(
            {"Yes": "Software-Developers", "No": "Non-Software-Developers"},
            axis=1)
    data_virt_all_yesno_per_developers_freq_rel = \
        data_virt_all_yesno_per_developers_freq_rel.rename(
            bool_recode_dict, axis=0)
    data_virt_all_yesno_per_developers_freq_rel = \
        data_virt_all_yesno_per_developers_freq_rel.T[options_yesno_order_list]

    hifis_surveyval.plotter.plot_bar_chart(
        data_frame=data_virt_all_yesno_per_developers_freq_rel,
        plot_file_name="Q018-virt-swdevs",
        plot_title=title_aggregated,
        plot_title_fontsize=16,
        x_axis_label="",
        y_axis_label="Usage Virtualisation tools (relative counts)",
        x_label_rotation=0,
        round_value_labels_to_decimals=2,
        figure_size=(6, 5))

    ###########################################################################
    # Frequencies usage of Virtualisation tools depending on team size
    ###########################################################################
    # Join usage with the developer answers and keep developers only.
    data_virt_plus_developers = \
        DataFrame(virt_not_all_no).join(DataFrame(data_developers_raw),
                                        on="id",
                                        how="inner")
    data_virt_only_developers = \
        data_virt_plus_developers[data_virt_plus_developers["V001/_"] == "Yes"]
    data_virt_per_team_size: DataFrame = \
        util.filter_and_group_series(data_virt_only_developers[0],
                                     data_team_raw.dropna())
    data_virt_all_yesno_per_team_size_freq_rel: DataFrame = \
        util.dataframe_value_counts(data_virt_per_team_size,
                                    relative_values=True)
    data_virt_all_yesno_per_team_size_freq_rel = \
        data_virt_all_yesno_per_team_size_freq_rel.rename(
            {"Yes": "Team-Developer", "No": "Individual-Developer"},
            axis=1)
    data_virt_all_yesno_per_team_size_freq_rel = \
        data_virt_all_yesno_per_team_size_freq_rel.rename(
            bool_recode_dict, axis=0)
    data_virt_all_yesno_per_team_size_freq_rel = \
        data_virt_all_yesno_per_team_size_freq_rel[["Individual-Developer",
                                                    "Team-Developer"]]
    data_virt_all_yesno_per_team_size_freq_rel = \
        data_virt_all_yesno_per_team_size_freq_rel.T[options_yesno_order_list]

    hifis_surveyval.plotter.plot_bar_chart(
        data_frame=data_virt_all_yesno_per_team_size_freq_rel,
        plot_file_name="Q018-virt-team",
        plot_title=title_aggregated,
        plot_title_fontsize=16,
        x_axis_label="Team Size",
        y_axis_label="Usage Virtualisation tools (relative counts)",
        x_label_rotation=0,
        round_value_labels_to_decimals=2,
        figure_size=(6, 5))

    ###########################################################################
    print("=========== End of Script: " + Path(__file__).stem + " ===========")
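The grouped plots in this script rely on `util.filter_and_group_series` followed by `util.dataframe_value_counts(..., relative_values=True)`. The sketch below is not that implementation. It only illustrates, in dependency-free Python with invented data, the kind of per-group relative frequency such a step is expected to yield. The function name `relative_usage_per_group` and all sample values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def relative_usage_per_group(
    uses_tool: Dict[str, bool],
    group: Dict[str, Optional[str]],
) -> Dict[str, Dict[str, float]]:
    """Compute the share of "Yes"/"No" tool usage within each group."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for participant_id, used in uses_tool.items():
        label = group.get(participant_id)
        if label is None:
            # Skip participants without an answer to the grouping question.
            continue
        counts[label]["Yes" if used else "No"] += 1
    # Normalise the counts of each group so that its shares sum to one.
    return {
        label: {
            answer: counter[answer] / sum(counter.values())
            for answer in ("Yes", "No")
        }
        for label, counter in counts.items()
    }


# Invented example: participant ID -> used at least one tool.
uses_tool = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}
# Invented example: participant ID -> answer to "Do you develop software?".
is_developer = {"p1": "Yes", "p2": "Yes", "p3": "Yes", "p4": "No", "p5": None}

shares = relative_usage_per_group(uses_tool, is_developer)
print(shares)
```

Normalising within each group, rather than over all participants, is what makes shares such as those of developers and non-developers directly comparable even when the groups differ in size.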