Merge branch '137-tech-analysis-virt' into 'master'

Resolve "Analysis | Software | Technology: Section with plots regarding Q018 - Usage of Virtualisation Tools"

Closes #137

See merge request !122

Commit 23062298 authored 3 years ago by Huste, Tobias
--- a/report/sec/hifis-software-technology.tex
+++ b/report/sec/hifis-software-technology.tex
@@ -298,3 +298,89 @@ services.
 Not least, we are investing in compute resources and hardware that is able
 to execute these Continuous Integration tasks and offer them to all
 \textit{Helmholtz} at no cost and without any administrative barriers.
+
+\paragraph{Virtualisation Tools} \label{para:technology-virt}
+
+As soon as software is developed and expected to run in different
+environments, ranging from local computers, virtual machines, and
+containers to remote servers, development teams start thinking about
+how to guarantee consistent behaviour.
+The usual way is to provide a single source of truth, realised through a single
+environment which can be set up on all computers and servers using
+virtualisation tools.
+Another good reason to use these tools is reproducibility,
+a topic that is particularly important for research software.
+These environments are configured via declarative text files, called recipes,
+that can be put under version control like any other code and can be shared
+amongst all contributors.
+This paradigm is called \textit{Infrastructure as Code}.
+Each contributor can thus create these environments independently and in a
+reproducible manner without having to devise an environment of their own.
+
+Some of the more prominent tools are \textit{Vagrant}, \textit{Docker},
+\textit{Singularity}, and \textit{Podman}.
+We asked the participants whether they used the mentioned virtualisation tools
+in the last twelve months (see figure \ref{fig:q018-virt-per-options}).
+The application of these techniques and tools has only recently started to
+emerge within the field of software engineering.
+Not so long ago these tasks were attributed solely to the operations team and
+not the development team, but nowadays these borders are becoming more and more
+blurred.
+Development teams and even separate DevOps teams (a portmanteau of
+``Development'' and ``Operations'') have started to dedicate themselves to this
+topic in recent years.
+Because of their novelty, our assumption was that these tools and techniques are
+not yet as well known as other technologies.
+This was backed up by the responses of the participants.
+Except for \textit{Docker}, which is used by 30 \% of the respondents, the
+mentioned virtualisation tools are largely unknown, and where they are known
+they seem to be rather irrelevant to the respondents.
+
+\begin{figure}
+	\centering
+	\includegraphics[width=.96\textheight,angle=90]{fig/Q018-virt-per-options}
+	\caption{Usage of Virtualisation tools}
+	\label{fig:q018-virt-per-options}
+\end{figure}
+
+The overall picture is almost the same.
+Only 31 \% used virtualisation tools, while 69 \% did not
+(see figure \ref{fig:q018-virt-overall}).
+
+\begin{figure}
+	\centering
+	\includegraphics[width=.75\textwidth]{fig/Q018-virt-overall}
+	\caption{Overall usage of Virtualisation tools}
+	\label{fig:q018-virt-overall}
+\end{figure}
+
+Again, the distinction between developers and non-developers is important in
+order to get a more informative impression of how the usage of these
+technologies is distributed.
+While 12 \% of the non-developers used virtualisation tools, which is not
+a surprising outcome, only 44 \% of the developers said that they
+made use of these tools (see figure \ref{fig:q018-virt-swdevs}).
+
+\begin{figure}
+	\centering
+	\includegraphics[width=.75\textwidth]{fig/Q018-virt-swdevs}
+	\caption{Usage of Virtualisation tools by developers and non-developers}
+	\label{fig:q018-virt-swdevs}
+\end{figure}
+
+We are also interested in how usage is distributed when we
+distinguish between individual and team developers
+(see figure \ref{fig:q018-virt-team}).
+These tools seem to be more important for developers who are part of a team:
+35 \% of the individual developers and 59 \% of the team developers are using
+virtualisation tools.
+This reflects the need within development teams to provide
+environments in which software can be run consistently and reproducibly
+by all team members whilst offering more degrees of freedom.
+
+\begin{figure}
+	\centering
+	\includegraphics[width=.75\textwidth]{fig/Q018-virt-team}
+	\caption{Usage of Virtualisation tools by individual and team developers}
+	\label{fig:q018-virt-team}
+\end{figure}
--- a/scripts/technology_usage_virt.py
+++ b/scripts/technology_usage_virt.py
+# hifis-surveyval
+# Framework to help developing analysis scripts for the HIFIS Software survey.
+#
+# SPDX-FileCopyrightText: 2021 HIFIS Software <support@hifis.net>
+#
+# SPDX-License-Identifier: GPL-3.0-or-later
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+"""Analyse technology questions about Virtualisation tools."""
+
+from pathlib import Path
+from typing import Dict, List
+
+from hifis_surveyval.core import util
+from pandas import DataFrame, Series
+
+from hifis_surveyval.data_container import DataContainer
+from hifis_surveyval.hifis_surveyval import HIFISSurveyval
+from hifis_surveyval.models.question import Question
+from hifis_surveyval.models.question_collection import QuestionCollection
+
+
+def run(hifis_surveyval: HIFISSurveyval, data: DataContainer):
+    """Analyse the technology question about Virtualisation tools."""
+    print("======== Beginning of Script: " + Path(__file__).stem + " ========")
+
+    # Get QuestionCollections and Questions analysed in this script.
+    virt: QuestionCollection = data.collection_for_id("Q018")
+    developers: Question = data.question_for_id("V001/_")
+    team: Question = data.question_for_id("V003/_")
+
+    # Get data as DataFrames and Series for these Collections and Questions
+    data_virt_raw: DataFrame = virt.as_data_frame()
+    data_developers_raw: Series = developers.as_series()
+    data_team_raw: Series = team.as_series()
+
+    # Define a mapping to recode boolean to string values.
+    bool_recode_dict: Dict[bool, str] = {True: "Yes", False: "No"}
+
+    # Define an ordering of the AnswerOptions.
+    options_order_list: List[str] = ["Yes",
+                                     "Don't know it",
+                                     "Not relevant",
+                                     "Don't know how",
+                                     "Doesn't fit my needs",
+                                     "Not available"]
+
+    # Define an ordering of the recoded AnswerOptions.
+    options_yesno_order_list: List[str] = ["Yes", "No"]
+
+    # Define a mapping to replace the full ID by the label.
+    questions_mapping_dict: Dict[str, str] = {question.full_id: question.label
+                                              for question in virt.questions}
+
+    title: str = virt.text("en-GB")
+    print(title)
+    questions: List[str] = [question.text("en-GB")
+                            for question in virt.questions]
+    print(questions)
+
+    title_aggregated: str = "Did you use virtualisation tools " \
+                            "in the last twelve months?"
+
+    # Keep only the participants who answered the question, i.e. drop rows
+    # that contain only None values.
+    data_virt_answered_only: DataFrame = data_virt_raw.dropna(axis=0, how="all")
+    # Rename columns in DataFrame from the full ID to the label.
+    data_virt_renamed: DataFrame = data_virt_answered_only.rename(
+        mapper=questions_mapping_dict, axis='columns')
+
+    # Define a boolean Series that is True for each row in which none of the
+    # tools was answered with "Yes".
+    virt_all_no = \
+        (data_virt_renamed["Vagrant"] != "Yes") & \
+        (data_virt_renamed["Docker"] != "Yes") & \
+        (data_virt_renamed["Singularity"] != "Yes") & \
+        (data_virt_renamed["Podman"] != "Yes")
+
+    # Invert it to a boolean Series that is True for each row in which at
+    # least one tool was answered with "Yes".
+    virt_not_all_no = ~virt_all_no
+
+    ###########################################################################
+    # Frequencies of answers given regarding usage of Virtualisation tools
+    ###########################################################################
+
+    data_virt_freq_abs: DataFrame = util.dataframe_value_counts(
+        data_virt_renamed, relative_values=False)
+
+    data_virt_freq_abs = data_virt_freq_abs.reindex(options_order_list)
+
+    # Divide all absolute values by the number of answers given to get relative
+    # values of multiple choice question.
+    data_virt_freq_rel: DataFrame = \
+        data_virt_freq_abs.div(len(data_virt_answered_only))
+
+    hifis_surveyval.plotter.plot_bar_chart(
+        data_frame=data_virt_freq_rel,
+        plot_file_name="Q018-virt-per-options",
+        plot_title=title,
+        plot_title_fontsize=16,
+        x_axis_label="Answer options",
+        y_axis_label="Usage Virtualisation tools (relative counts)",
+        x_label_rotation=0,
+        round_value_labels_to_decimals=2,
+        figure_size=(13, 5))
+
+    ###########################################################################
+    # Frequencies regarding overall usage of Virtualisation tools
+    ###########################################################################
+
+    data_virt_all_yesno_freq_rel: DataFrame = util.dataframe_value_counts(
+        DataFrame(virt_not_all_no), relative_values=True)
+    data_virt_all_yesno_freq_rel = \
+        data_virt_all_yesno_freq_rel.rename(bool_recode_dict, axis=0)
+    data_virt_all_yesno_freq_rel = \
+        data_virt_all_yesno_freq_rel.rename(
+            {0: "Virtualisation tools"}, axis=1)
+    data_virt_all_yesno_freq_rel = \
+        data_virt_all_yesno_freq_rel.T[options_yesno_order_list]
+
+    hifis_surveyval.plotter.plot_bar_chart(
+        data_frame=data_virt_all_yesno_freq_rel,
+        plot_file_name="Q018-virt-overall",
+        plot_title=title_aggregated,
+        plot_title_fontsize=16,
+        x_axis_label="",
+        y_axis_label="Usage Virtualisation tools (relative counts)",
+        x_label_rotation=0,
+        round_value_labels_to_decimals=2,
+        figure_size=(6, 5))
+
+    ###########################################################################
+    # Usage frequencies of Virtualisation tools based on answers by developers
+    ###########################################################################
+
+    data_virt_per_developers: DataFrame = \
+        util.filter_and_group_series(virt_not_all_no,
+                                     data_developers_raw.dropna())
+
+    data_virt_all_yesno_per_developers_freq_rel: DataFrame = \
+        util.dataframe_value_counts(data_virt_per_developers,
+                                    relative_values=True)
+    data_virt_all_yesno_per_developers_freq_rel = \
+        data_virt_all_yesno_per_developers_freq_rel.rename(
+            {"Yes": "Software-Developers", "No": "Non-Software-Developers"},
+            axis=1)
+    data_virt_all_yesno_per_developers_freq_rel = \
+        data_virt_all_yesno_per_developers_freq_rel.rename(
+            bool_recode_dict, axis=0)
+    data_virt_all_yesno_per_developers_freq_rel = \
+        data_virt_all_yesno_per_developers_freq_rel.T[options_yesno_order_list]
+
+    hifis_surveyval.plotter.plot_bar_chart(
+        data_frame=data_virt_all_yesno_per_developers_freq_rel,
+        plot_file_name="Q018-virt-swdevs",
+        plot_title=title_aggregated,
+        plot_title_fontsize=16,
+        x_axis_label="",
+        y_axis_label="Usage Virtualisation tools (relative counts)",
+        x_label_rotation=0,
+        round_value_labels_to_decimals=2,
+        figure_size=(6, 5))
+
+    ###########################################################################
+    # Usage frequencies of Virtualisation tools depending on team size
+    ###########################################################################
+
+    data_virt_plus_developers = \
+        DataFrame(virt_not_all_no).join(DataFrame(data_developers_raw),
+                                        on="id",
+                                        how="inner")
+    data_virt_only_developers = \
+        data_virt_plus_developers[data_virt_plus_developers["V001/_"] == "Yes"]
+
+    data_virt_per_team_size: DataFrame = \
+        util.filter_and_group_series(data_virt_only_developers[0],
+                                     data_team_raw.dropna())
+
+    data_virt_all_yesno_per_team_size_freq_rel: DataFrame = \
+        util.dataframe_value_counts(data_virt_per_team_size,
+                                    relative_values=True)
+    data_virt_all_yesno_per_team_size_freq_rel = \
+        data_virt_all_yesno_per_team_size_freq_rel.rename(
+            {"Yes": "Team-Developer", "No": "Individual-Developer"},
+            axis=1)
+    data_virt_all_yesno_per_team_size_freq_rel = \
+        data_virt_all_yesno_per_team_size_freq_rel.rename(
+            bool_recode_dict, axis=0)
+    data_virt_all_yesno_per_team_size_freq_rel = \
+        data_virt_all_yesno_per_team_size_freq_rel[["Individual-Developer",
+                                                   "Team-Developer"]]
+    data_virt_all_yesno_per_team_size_freq_rel = \
+        data_virt_all_yesno_per_team_size_freq_rel.T[options_yesno_order_list]
+
+    hifis_surveyval.plotter.plot_bar_chart(
+        data_frame=data_virt_all_yesno_per_team_size_freq_rel,
+        plot_file_name="Q018-virt-team",
+        plot_title=title_aggregated,
+        plot_title_fontsize=16,
+        x_axis_label="Team Size",
+        y_axis_label="Usage Virtualisation tools (relative counts)",
+        x_label_rotation=0,
+        round_value_labels_to_decimals=2,
+        figure_size=(6, 5))
+
+    ###########################################################################
+
+    print("=========== End of Script: " + Path(__file__).stem + " ===========")
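
The aggregation behind the overall figure can be checked without the framework. The following is a minimal sketch in plain Python and is not part of this merge request: the function name `share_using_any_tool` and the toy answers are made up for illustration. It assumes that a missing answer is represented as `None` and mirrors only two steps of the script: dropping participants who skipped the whole question, and flagging those who answered "Yes" for at least one tool.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one dict per participant, None = no answer.
Answers = Dict[str, Optional[str]]

TOOLS = ["Vagrant", "Docker", "Singularity", "Podman"]


def share_using_any_tool(rows: List[Answers]) -> float:
    """Return the share of answering participants with at least one "Yes"."""
    # Comparable to dropna(axis=0, how="all"): skip participants who gave
    # no answer for any of the tools.
    answered = [
        row for row in rows
        if any(row.get(tool) is not None for tool in TOOLS)
    ]
    if not answered:
        raise ValueError("no participant answered the question")
    # Comparable to the inverted "all No" mask: keep participants who
    # answered "Yes" for at least one tool.
    users = [
        row for row in answered
        if any(row.get(tool) == "Yes" for tool in TOOLS)
    ]
    return len(users) / len(answered)


# Made-up toy data, not survey results.
toy_rows: List[Answers] = [
    {"Vagrant": "Not relevant", "Docker": "Yes",
     "Singularity": "Don't know it", "Podman": "Don't know it"},
    {"Vagrant": "Don't know it", "Docker": "Don't know it",
     "Singularity": "Don't know it", "Podman": "Don't know it"},
    {"Vagrant": None, "Docker": None, "Singularity": None, "Podman": None},
    {"Vagrant": None, "Docker": "Not relevant",
     "Singularity": None, "Podman": None},
]

share = share_using_any_tool(toy_rows)
print(f"Share using at least one tool: {share:.2f}")
```

In this toy data the third participant is excluded because every answer is missing, so one of the three remaining participants counts as a user. A participant who answered only some of the tools stays in the denominator, which matches how the script treats partially answered rows.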