ColumnNames - organise how we deal with column names in dataframes

Description

To process time series data we need to know which columns hold which information. We currently use pandera for data validation and offer options to convert raw data into our "standard" format. This still makes sense, but we also discussed how to make it more adaptable. We could create an Enum that stores the column names in a single place and is referred to throughout the code. This keeps things organised, and it makes it possible to add ways of customising column names in the future for better usability.

Expected Behavior

An enum object which stores the column names. This is then used to refer to data columns throughout the code base.

Current Behavior

Column names are hard-coded wherever the code refers to data columns.

Proposed Solution

As suggested by Fredo:

from enum import Enum, auto


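class ColumnInfo:
    """Single place that maps each logical column to its current label."""

    class Name(Enum):
        DATE_TIME = auto()
        EPI_NEUTRON_COUNT = auto()
        PRESSURE = auto()
        RELATIVE_HUMIDITY = auto()
        NEUTRON_COUNT = auto()

        def __str__(self):
            # Fall back to the lower-cased member name when no label is
            # registered, so unmapped members do not raise KeyError.
            return ColumnInfo._representation.get(self, self.name.lower())

    # Current label for each column. Labels for the remaining members
    # still need to be agreed and added here.
    _representation: dict["ColumnInfo.Name", str] = {
        Name.DATE_TIME: "date_time",
        Name.EPI_NEUTRON_COUNT: "epithermal_neutrons",
    }

    @classmethod
    def relabel(cls, column_name: Name, new_label: str):
        """Change the label used for column_name everywhere in the code."""
        cls._representation[column_name] = new_label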
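
A minimal sketch of how the proposal might be used once it is in place. To avoid dependencies, a plain dict of lists stands in for the dataframe. The `mean_of` helper, the data values, and the raw label "N_epi" are all hypothetical and only there for illustration. The fallback in `__str__` to the lower-cased member name is also an assumption of this sketch, made so that members without a registered label still resolve.

```python
from enum import Enum, auto


class ColumnInfo:
    class Name(Enum):
        DATE_TIME = auto()
        EPI_NEUTRON_COUNT = auto()
        PRESSURE = auto()

        def __str__(self):
            # Assumption: fall back to the lower-cased member name
            # when no label has been registered.
            return ColumnInfo._representation.get(self, self.name.lower())

    _representation: dict["ColumnInfo.Name", str] = {
        Name.DATE_TIME: "date_time",
        Name.EPI_NEUTRON_COUNT: "epithermal_neutrons",
    }

    @classmethod
    def relabel(cls, column_name: Name, new_label: str):
        cls._representation[column_name] = new_label


def mean_of(data: dict[str, list[float]], column: ColumnInfo.Name) -> float:
    """Hypothetical helper: finds a column by enum member, not by literal."""
    values = data[str(column)]
    return sum(values) / len(values)


# Stand-in for a dataframe in the standard format (made-up values).
data = {"date_time": [0.0, 1.0], "epithermal_neutrons": [100.0, 110.0]}
standard_mean = mean_of(data, ColumnInfo.Name.EPI_NEUTRON_COUNT)

# A raw file that calls the same column "N_epi" only needs a relabel;
# mean_of itself is unchanged.
raw = {"N_epi": [200.0, 220.0]}
ColumnInfo.relabel(ColumnInfo.Name.EPI_NEUTRON_COUNT, "N_epi")
raw_mean = mean_of(raw, ColumnInfo.Name.EPI_NEUTRON_COUNT)
```

With a pandas dataframe the lookup would be `df[str(ColumnInfo.Name.EPI_NEUTRON_COUNT)]`. Note that `relabel` changes class-level state, so the new label applies everywhere in the running process from that point on.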

Acceptance Criteria

A ColumnName enum whose members are set to our standard column names
A method to update the label that a ColumnName member maps to
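
The two criteria could be checked with a couple of unit tests, sketched below under the assumption that the final class keeps the shape of the `ColumnInfo` proposal above. The test class name, the `run_tests` helper, and the "timestamp" label are hypothetical. Because `relabel` mutates shared class-level state, the sketch snapshots the labels before each test and restores them afterwards.

```python
import unittest
from enum import Enum, auto


class ColumnInfo:
    class Name(Enum):
        DATE_TIME = auto()
        EPI_NEUTRON_COUNT = auto()

        def __str__(self):
            return ColumnInfo._representation[self]

    _representation: dict["ColumnInfo.Name", str] = {
        Name.DATE_TIME: "date_time",
        Name.EPI_NEUTRON_COUNT: "epithermal_neutrons",
    }

    @classmethod
    def relabel(cls, column_name: Name, new_label: str):
        cls._representation[column_name] = new_label


class TestColumnInfo(unittest.TestCase):
    def setUp(self):
        # Snapshot the labels so each test starts from the defaults.
        self._saved = dict(ColumnInfo._representation)

    def tearDown(self):
        # Restore the snapshot, undoing any relabel done by the test.
        ColumnInfo._representation.clear()
        ColumnInfo._representation.update(self._saved)

    def test_defaults_are_standard_names(self):
        self.assertEqual(str(ColumnInfo.Name.DATE_TIME), "date_time")
        self.assertEqual(
            str(ColumnInfo.Name.EPI_NEUTRON_COUNT), "epithermal_neutrons"
        )

    def test_relabel_changes_label(self):
        ColumnInfo.relabel(ColumnInfo.Name.DATE_TIME, "timestamp")
        self.assertEqual(str(ColumnInfo.Name.DATE_TIME), "timestamp")


def run_tests() -> unittest.TestResult:
    """Hypothetical entry point that runs the two checks above."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestColumnInfo)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```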

Edited Jul 04, 2024 by Daniel Power