
Randomly Mask/Set NaN X% of Data Points in Huge xarray.DataArray

I have a huge (~ 2 billion data points) xarray.DataArray. I would like to randomly delete (either mask or replace by np.nan) a given percentage of the data, where the probability f

Solution 1:

The suggestion by user545424 is an excellent start. To avoid running into memory issues, you can wrap it in a small user-defined function and map that function over the DataArray with xr.apply_ufunc.

import xarray as xr
import numpy as np

testdata = xr.DataArray(np.empty((100,1000,1000)), dims=['x','y','z'])

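def set_random_fraction_to_nan(data):
    # Draw one uniform random number per element; values below 0.8 select ~80% of the points.
    # Note that this modifies `data` in place.
    data[np.random.rand(*data.shape) < .8] = np.nan
    return data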

# Set 80% of data randomly to nan
testdata = xr.apply_ufunc(
    set_random_fraction_to_nan,
    testdata,
    input_core_dims=[['x', 'y', 'z']],
    output_core_dims=[['x', 'y', 'z']],
    dask='parallelized',
)
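As a variation on the function above, here is a minimal sketch that takes the fraction as a parameter, uses a seedable generator, and returns a new array instead of modifying its input. The parameter names `fraction` and `seed` and the small test array `small` are illustrative assumptions, not part of the original answer. Because the operation is purely element-wise, this sketch leaves out the core dimensions entirely.

```python
import numpy as np
import xarray as xr


def set_random_fraction_to_nan(data, fraction=0.8, seed=None):
    """Return a copy of `data` with roughly `fraction` of its values set to NaN.

    `fraction` and `seed` are hypothetical parameters added for illustration.
    """
    rng = np.random.default_rng(seed)
    # Each element is selected independently with probability `fraction`.
    drop = rng.random(data.shape) < fraction
    # np.where builds a new array, so the input is left untouched.
    return np.where(drop, np.nan, data)


# Small stand-in array so the sketch runs quickly.
small = xr.DataArray(np.ones((10, 100, 100)), dims=["x", "y", "z"])

masked = xr.apply_ufunc(
    set_random_fraction_to_nan,
    small,
    kwargs={"fraction": 0.8, "seed": 0},
)

# Share of NaN values actually produced; it should be close to `fraction`.
nan_share = float(masked.isnull().mean())
```

Each point is dropped independently, so the realized share of NaN values is only approximately equal to `fraction`. If this were applied chunk by chunk to a dask-backed array, a fixed `seed` would presumably give every chunk of the same shape the same mask pattern, so leaving `seed=None` is likely the safer choice there.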

For some more explanation on wrapping custom functions to work with xarray, see here.
