FormatDataIngest - Quick fix for path issue
Description
Just after approving the merge (!20, merged) I realised there is a potential issue:
    self._data_location = validate_and_convert_file_path(
        file_path=data_location,
        base=self._path_to_yaml.parent,  # <-- here
    )
This would always prepend the YAML's parent folder to the path. That is OK 90% of the time, but what if, for example, I have the YAML on my laptop and my data is in the cloud? I would want to give the cloud path to the data as it is, and not have the YAML folder prepended to it.
I've added a check for whether the path is relative or absolute. If it's an absolute path, it is used directly. If it's relative, it is resolved against the YAML's parent folder, so the YAML is expected to be near the actual data.
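To illustrate the distinction, here is a minimal sketch using `pathlib`. The paths are made up for the example, and `PurePosixPath` is used only so that the results are the same on any operating system.

```python
from pathlib import PurePosixPath

# Hypothetical locations, for illustration only.
yaml_parent = PurePosixPath("/home/me/project/config")

relative_data = PurePosixPath("data/site_a.csv")
absolute_data = PurePosixPath("/mnt/cloud/data/site_a.csv")

print(relative_data.is_absolute())  # False
print(absolute_data.is_absolute())  # True

# A relative path is resolved against the YAML's parent folder.
print(yaml_parent / relative_data)  # /home/me/project/config/data/site_a.csv

# Caveat: a URI-style string is not "absolute" as far as pathlib is
# concerned, so an is_absolute() check would still treat it as relative.
print(PurePosixPath("s3://bucket/data").is_absolute())  # False
```

The last line matters if "in the cloud" means a URI rather than a mounted filesystem path. In that case the check would probably need extending, which is outside the scope of this quick fix.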
Expected Behavior
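Relative path awareness: an absolute data_location is used as given, and a relative data_location is resolved against the YAML's parent folder.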
Proposed Solution
    from pathlib import Path
    from typing import Optional, Union

    # core_logger is assumed to be defined elsewhere in the module.


    def validate_and_convert_file_path(
        file_path: Union[str, Path, None],
        base: Union[str, Path] = "",
    ) -> Optional[Path]:
        """
        Used when initialising the object. Converts file_path to a
        pathlib.Path object. An absolute path is returned unchanged; a
        relative path is joined onto base. Other types will cause an
        error.

        Parameters
        ----------
        file_path : Union[str, Path, None]
            The data_location attribute from initialisation.
        base : Union[str, Path]
            Folder that relative paths are resolved against, e.g. the
            parent folder of the YAML file.

        Returns
        -------
        Optional[pathlib.Path]
            The data_location as a pathlib.Path object, or None if
            file_path is None.

        Raises
        ------
        ValueError
            Error if file_path is not a string or pathlib.Path.
        """
        if file_path is None:
            return None

        if not isinstance(file_path, (str, Path)):
            message = (
                "data_location must be of type str or pathlib.Path. \n"
                f"{type(file_path).__name__} provided, "
                "please change this."
            )
            core_logger.error(message)
            raise ValueError(message)

        # Path(...) accepts both str and Path, so convert once up front.
        file_path = Path(file_path)
        if file_path.is_absolute():
            # Absolute paths are used directly; base is ignored.
            return file_path
        # Relative paths are resolved against base.
        return Path(base) / file_path
Acceptance Criteria
- Add awareness of path type (absolute or relative)
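
One way to pin this criterion down is with a few checks like the ones below. This is only a sketch: `resolve_path` is a hypothetical, stripped-down stand-in for `validate_and_convert_file_path` (no logging or type validation) so that the snippet runs on its own, and the paths are invented.

```python
from pathlib import Path
from typing import Union


def resolve_path(file_path: Union[str, Path], base: Union[str, Path] = "") -> Path:
    """Hypothetical stand-in: keep absolute paths, join relative ones to base."""
    file_path = Path(file_path)
    if file_path.is_absolute():
        return file_path
    return Path(base) / file_path


# Invented locations; Path.cwd() is used so both are absolute on any OS.
yaml_parent = Path.cwd() / "config"
absolute_data = Path.cwd() / "elsewhere" / "data.csv"

# Relative path: resolved against the YAML's parent folder.
assert resolve_path("data.csv", base=yaml_parent) == yaml_parent / "data.csv"

# Absolute path (as Path or str): returned unchanged, base is ignored.
assert resolve_path(absolute_data, base=yaml_parent) == absolute_data
assert resolve_path(str(absolute_data), base=yaml_parent) == absolute_data
```

In the real test suite the same assertions could be run against `validate_and_convert_file_path` itself, along with a check that an unsupported type raises `ValueError`.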