Unified way to store metadata
As we add more metadata to the Dataset, it's becoming apparent that we could make things cleaner by unifying the way we store metadata for rows and columns. This should all happen under the hood and not affect how we use the Dataset. Here are examples of metadata that we store:
Row metadata:
- split (train, val, test)
- whether or not to include in data matrix
Column metadata:
- feature or label
- original (raw) or preprocessed
- semantic data type
- whether or not to include in data matrix
- missing value indicators
It might make sense to create separate metadata dataframes for rows and columns, one indexed by the Dataset's rows and one indexed by its column names.
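
To make the idea concrete, here is a rough sketch of what the two metadata dataframes could look like using pandas. Everything in it is an assumption for illustration: the `Dataset` class shown is a stand-in rather than our actual implementation, and the attribute names (`row_meta`, `col_meta`), column names (`split`, `include`, `role`, `stage`, `semantic_type`, `has_missing`), and the `data_matrix` method are hypothetical.

```python
import pandas as pd


class Dataset:
    """Hypothetical stand-in for the real Dataset class."""

    def __init__(self, data: pd.DataFrame):
        self.data = data

        # One metadata row per data row (assumed column names).
        self.row_meta = pd.DataFrame(
            {"split": "train", "include": True},
            index=data.index,
        )

        # One metadata row per data column (assumed column names).
        # The semantic type here is a crude placeholder guess.
        semantic = [
            "numeric" if pd.api.types.is_numeric_dtype(data[c]) else "categorical"
            for c in data.columns
        ]
        self.col_meta = pd.DataFrame(
            {
                "role": "feature",        # feature or label
                "stage": "raw",           # raw or preprocessed
                "semantic_type": semantic,
                "include": True,          # include in data matrix?
                "has_missing": data.isna().any(),
            },
            index=data.columns,
        )

    def data_matrix(self, split=None) -> pd.DataFrame:
        """Return included feature columns for included rows."""
        rows = self.row_meta["include"]
        if split is not None:
            rows = rows & (self.row_meta["split"] == split)
        cols = self.col_meta["include"] & (self.col_meta["role"] == "feature")
        return self.data.loc[rows, cols[cols].index]


df = pd.DataFrame(
    {
        "age": [31, 45, None, 22],
        "city": ["A", "B", "A", "C"],
        "target": [0, 1, 0, 1],
    }
)
ds = Dataset(df)
ds.col_meta.loc["target", "role"] = "label"   # mark the label column
ds.row_meta.loc[3, "split"] = "test"          # move one row to the test split
ds.row_meta.loc[2, "include"] = False         # exclude one row entirely

X_train = ds.data_matrix(split="train")
```

With this layout, every new kind of metadata becomes one more column in `row_meta` or `col_meta`, and selecting the data matrix reduces to boolean filtering on those two frames, so the public interface of the Dataset would not need to change.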