Collecting Attributes From Dask Dataframe Providers
Solution 1:
There are a few potential questions here:
- Q: How do I load data from many files in a custom format into a single dask dataframe?
A: You might check out dask.delayed to load the data and dask.dataframe.from_delayed to convert several dask Delayed objects into a single dask dataframe. Or, as you're probably doing now, you can use dask.dataframe.from_pandas and dask.dataframe.concat. See this example notebook on using dask.delayed from custom objects/functions.
- Q: How do I store arbitrary metadata onto a dask.dataframe?
A: This is not supported. Generally I recommend using a different data structure to store your metadata if possible. If there are a number of use cases for this, then we should consider adding it to dask dataframe; if that is the case, please raise an issue. Generally, though, it'd be good to see better support for this in Pandas before dask.dataframe considers supporting it.
- Q: I use multi-indexes heavily in Pandas, how can I integrate this workflow into dask.dataframe?
A: Unfortunately dask.dataframe does not currently support multi-indexes. These would clearly be helpful.