Randomly Mask/Set NaN X% of Data Points in Huge xarray.DataArray
I have a huge (~ 2 billion data points) xarray.DataArray. I would like to randomly delete (either mask or replace by np.nan) a given percentage of the data, where the probability f
Solution 1:
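The suggestion by user545424 (indexing with a boolean mask drawn from np.random.rand) is an excellent start. To avoid running into memory issues, you can put it in a small user-defined function and map it over the DataArray with apply_ufunc. This only saves memory if the DataArray is backed by dask (for example, opened with chunks= or converted with .chunk()). Because the function has to run block by block, no dimensions should be declared as core dimensions, and the result stays lazy until it is computed or written to disk.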
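import numpy as np
import xarray as xr

# Example data; .chunk() (requires dask) splits it into blocks along x
testdata = xr.DataArray(np.empty((100, 1000, 1000)), dims=['x', 'y', 'z']).chunk({'x': 10})

def set_random_fraction_to_nan(data, fraction=0.8):
    # Each element becomes NaN with probability `fraction`; returns a new array
    mask = np.random.rand(*data.shape) < fraction
    return np.where(mask, np.nan, data)

# Set ~80% of the data randomly to NaN. No core dimensions are declared,
# so with dask='parallelized' the function is applied to each chunk separately.
testdata = xr.apply_ufunc(set_random_fraction_to_nan, testdata,
                          kwargs={'fraction': 0.8},
                          dask='parallelized', output_dtypes=[testdata.dtype])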
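The per-block logic can be checked on its own with plain NumPy before wrapping it in apply_ufunc. The sketch below is a variant of the answer's function, not the answer itself. It assumes NumPy's Generator API (np.random.default_rng) in place of np.random.rand, and the rng parameter and the mask_random_fraction helper are hypothetical additions. It also covers the "mask instead of NaN" option from the question via a NumPy masked array. Note that the deleted share is only approximately the requested fraction, because each element is dropped independently.

```python
import numpy as np

def set_random_fraction_to_nan(data, fraction=0.8, rng=None):
    """Return a copy of `data` in which each element is NaN with probability `fraction`."""
    rng = np.random.default_rng() if rng is None else rng  # hypothetical rng parameter
    # Uniform draws in [0, 1); draws below `fraction` mark elements to delete
    mask = rng.random(data.shape) < fraction
    return np.where(mask, np.nan, data)

def mask_random_fraction(data, fraction=0.8, rng=None):
    """Hypothetical helper: same idea, but return a masked array instead of inserting NaN."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(data.shape) < fraction
    return np.ma.masked_array(data, mask=mask)

rng = np.random.default_rng(0)          # seeded only to make the check repeatable
block = np.ones((100, 100, 100))        # small stand-in for one dask chunk
with_nan = set_random_fraction_to_nan(block, 0.8, rng)
masked = mask_random_fraction(block, 0.8, rng)

print(np.isnan(with_nan).mean())        # close to 0.8, not exactly 0.8
print(masked.mask.mean())               # share of masked elements, also close to 0.8
```

Returning a new array with np.where, rather than assigning into `data` in place, avoids mutating blocks that dask may still hold references to.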
For more explanation on wrapping custom functions to work with xarray, see the xarray documentation for apply_ufunc.