ColumnNames - organise how we deal with column names in dataframe
Description
To process time series data we need to know which columns hold which information. We use pandera for data validation and offer options to convert raw data into our "standard" format. Whilst this approach still makes sense, we also discussed how to make it more adaptable. We could create an Enum that stores the column names in a single place and can be referenced throughout the code. This keeps things organised, and it would also make it possible to let users customise column names in the future.
Expected Behavior
An Enum that stores the column names in one place and is used to refer to data columns throughout the codebase.
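A minimal sketch of the expected behaviour, assuming a plain `ColumnName` enum whose values are the standard column labels. Only `date_time` and `epithermal_neutrons` come from this issue; the `has_required_columns` helper is hypothetical and stands in for any code that currently hard-codes column names.

```python
from enum import Enum


class ColumnName(Enum):
    """Single source of truth for the standard column labels."""

    DATE_TIME = "date_time"
    EPI_NEUTRON_COUNT = "epithermal_neutrons"


def has_required_columns(columns: list[str]) -> bool:
    """Hypothetical helper: check that a table header contains the columns
    we need, without writing any label string by hand."""
    required = {ColumnName.DATE_TIME.value, ColumnName.EPI_NEUTRON_COUNT.value}
    return required.issubset(columns)


print(has_required_columns(["date_time", "epithermal_neutrons", "pressure"]))  # True
print(has_required_columns(["date_time"]))  # False
```

Renaming a column then means changing one value in the enum rather than every string literal. Enum member values cannot be reassigned once the class is created, however, which is why the proposed solution keeps the labels in a separate mapping that can be updated at runtime.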
Current Behavior
Column names are hard-coded as strings throughout the code.
Proposed Solution
As suggested by Fredo:
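```python
from enum import Enum, auto


class ColumnInfo:
    class Name(Enum):
        """Identifiers for the columns our code knows about."""

        DATE_TIME = auto()
        EPI_NEUTRON_COUNT = auto()
        PRESSURE = auto()
        RELATIVE_HUMIDITY = auto()
        NEUTRON_COUNT = auto()

        def __str__(self):
            # Look up the current label; fall back to the lower-case member
            # name for members that have no label yet (assumed default).
            return ColumnInfo._representation.get(self, self.name.lower())

    # Mapping from column identifier to the label used in the dataframe.
    _representation: dict["ColumnInfo.Name", str] = {
        Name.DATE_TIME: "date_time",
        Name.EPI_NEUTRON_COUNT: "epithermal_neutrons",
    }

    @classmethod
    def relabel(cls, column_name: Name, new_label: str) -> None:
        """Change the label used for `column_name` everywhere."""
        cls._representation[column_name] = new_label
```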
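A usage sketch of the proposal. The class is repeated so the snippet runs on its own; the fallback to the lower-case member name in `__str__` is an assumption, since only two labels are defined, and `"epithermal_counts"` is a purely illustrative custom label.

```python
from enum import Enum, auto


class ColumnInfo:
    class Name(Enum):
        DATE_TIME = auto()
        EPI_NEUTRON_COUNT = auto()
        PRESSURE = auto()
        RELATIVE_HUMIDITY = auto()
        NEUTRON_COUNT = auto()

        def __str__(self):
            # Assumed fallback: lower-case member name when no label is set.
            return ColumnInfo._representation.get(self, self.name.lower())

    _representation: dict["ColumnInfo.Name", str] = {
        Name.DATE_TIME: "date_time",
        Name.EPI_NEUTRON_COUNT: "epithermal_neutrons",
    }

    @classmethod
    def relabel(cls, column_name: Name, new_label: str) -> None:
        cls._representation[column_name] = new_label


epi = ColumnInfo.Name.EPI_NEUTRON_COUNT
print(str(epi))  # epithermal_neutrons

# A user whose raw data uses a different header relabels once...
ColumnInfo.relabel(epi, "epithermal_counts")  # illustrative label only
# ...and every later str() lookup returns the new name.
print(str(epi))  # epithermal_counts

print(str(ColumnInfo.Name.PRESSURE))  # pressure (fallback)
```

Because `_representation` is a class attribute, `relabel` changes the label for the whole process rather than for a single dataframe; tests that relabel columns may need to restore the original mapping afterwards.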
Acceptance Criteria
- A ColumnName enum set to our standard column names
- A method to update the labels used by the ColumnName enum