Skip to content Skip to sidebar Skip to footer

2d Convolution In Python With Missing Data

I know there is scipy.signal.convolve2d function to handle 2 dimension convolution for 2d numpy array, and there is numpy.ma module to handle missing data, but these two methods do

Solution 1:

I dont think replacing with 0s is the correct way of doing this, you are nudging the covolved values toward 0. These missings should literally be treated as "missings". Because they represent the missing pieces of information and there is no justification to just assume they could be 0s, and they shouldn't be involved in any calcuation at all.

I tried setting missing values to numpy.nan and then convolve, the result suggest that any overlap between the kernel and any missing gives an nan in the result, even if the overlap is with an 0 in the kernel, so you get an enlarged hole of missings in the result. Depending on your application, this could be the desired result.

But in some cases, you don't want to discard so much information just for 1 single missing (maybe <= 50% of missing is still tolerable). In such cases, I've found another module astropy that has a better implementation: numpy.nans are ignored (or replaced with interpolated values?).

So using astropy, you would do the following:

from astropy.convolution import convolve
inarray=numpy.where(inarray.mask,numpy.nan,inarray) # masking still doesn't work, has to set to numpy.nan
result=convolve(inarray,kernel)

But still, you don't have control over how much of missing is tolerable. To achieve that, I've created a function that uses the scipy.ndimage.convolve() for the initial convolution, but manually re-compute values whenever missings (numpy.nan) are involved:

defconvolve2d(slab,kernel,max_missing=0.5,verbose=True):
    '''2D convolution with missings ignored

    <slab>: 2d array. Input array to convolve. Can have numpy.nan or masked values.
    <kernel>: 2d array, convolution kernel, must have sizes as odd numbers.
    <max_missing>: float in (0,1), max percentage of missing in each convolution
                   window is tolerated before a missing is placed in the result.

    Return <result>: 2d array, convolution result. Missings are represented as
                     numpy.nans if they are in <slab>, or masked if they are masked
                     in <slab>.

    '''from scipy.ndimage import convolve as sciconvolve

    assert numpy.ndim(slab)==2, "<slab> needs to be 2D."assert numpy.ndim(kernel)==2, "<kernel> needs to be 2D."assert kernel.shape[0]%2==1and kernel.shape[1]%2==1, "<kernel> shape needs to be an odd number."assert max_missing > 0and max_missing < 1, "<max_missing> needs to be a float in (0,1)."#--------------Get mask for missings--------------ifnothasattr(slab,'mask') and numpy.any(numpy.isnan(slab))==False:
        has_missing=False
        slab2=slab.copy()

    elifnothasattr(slab,'mask') and numpy.any(numpy.isnan(slab)):
        has_missing=True
        slabmask=numpy.where(numpy.isnan(slab),1,0)
        slab2=slab.copy()
        missing_as='nan'elif (slab.mask.size==1and slab.mask==False) or numpy.any(slab.mask)==False:
        has_missing=False
        slab2=slab.copy()

    elifnot (slab.mask.size==1and slab.mask==False) and numpy.any(slab.mask):
        has_missing=True
        slabmask=numpy.where(slab.mask,1,0)
        slab2=numpy.where(slabmask==1,numpy.nan,slab)
        missing_as='mask'else:
        has_missing=False
        slab2=slab.copy()

    #--------------------No missing--------------------ifnot has_missing:
        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
    else:
        H,W=slab.shape
        hh=int((kernel.shape[0]-1)/2)  # half height
        hw=int((kernel.shape[1]-1)/2)  # half width
        min_valid=(1-max_missing)*kernel.shape[0]*kernel.shape[1]

        # dont forget to flip the kernel
        kernel_flip=kernel[::-1,::-1]

        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
        slab2=numpy.where(slabmask==1,0,slab2)

        #------------------Get nan holes------------------
        miss_idx=zip(*numpy.where(slabmask==1))

        if missing_as=='mask':
            mask=numpy.zeros([H,W])

        for yii,xii in miss_idx:

            #-------Recompute at each new nan in result-------
            hole_ys=range(max(0,yii-hh),min(H,yii+hh+1))
            hole_xs=range(max(0,xii-hw),min(W,xii+hw+1))

            for hi in hole_ys:
                for hj in hole_xs:
                    hi1=max(0,hi-hh)
                    hi2=min(H,hi+hh+1)
                    hj1=max(0,hj-hw)
                    hj2=min(W,hj+hw+1)

                    slab_window=slab2[hi1:hi2,hj1:hj2]
                    mask_window=slabmask[hi1:hi2,hj1:hj2]
                    kernel_ij=kernel_flip[max(0,hh-hi):min(hh*2+1,hh+H-hi), 
                                     max(0,hw-hj):min(hw*2+1,hw+W-hj)]
                    kernel_ij=numpy.where(mask_window==1,0,kernel_ij)

                    #----Fill with missing if not enough valid data----
                    ksum=numpy.sum(kernel_ij)
                    if ksum<min_valid:
                        if missing_as=='nan':
                            result[hi,hj]=numpy.nan
                        elif missing_as=='mask':
                            result[hi,hj]=0.
                            mask[hi,hj]=Trueelse:
                        result[hi,hj]=numpy.sum(slab_window*kernel_ij)

        if missing_as=='mask':
            result=numpy.ma.array(result)
            result.mask=mask

    return result

Below is a figure demonstrating the output. On the left is a 30x30 random map with 3 numpy.nan holes with sizes of:

  1. 1x1
  2. 3x3
  3. 5x5

enter image description here

On the right is the convolved output, by a 5x5 kernel (with all 1s), and a tolerance level of 50% (max_missing=0.5).

So the first 2 smaller holes are filled using nearby values, and in the last one, because the number of missings > 0.5x5x5 = 12.5, numpy.nans are placed to represent missing information.

Solution 2:

I found a hack. instead of nan use imaginary number (where it is nan change it to 1i) run the convolution and set that wherever the imaginary value is above a threshold it is nan. whenever it is bellow just take the real value. here is a code snippet:

frames_complex = np.zeros_like(frames_, dtype=np.complex64)
frames_complex[np.isnan(frames_)] = np.array((1j))
frames_complex[np.bitwise_not(np.isnan(frames_))] =                         
frames_[np.bitwise_not(np.isnan(frames_))]
convolution = signal.convolve(frames_complex, gaussian_window, 'valid')
convolution[np.imag(convolution)>0.2] = np.nan
convolution = convolution.astype(np.float32)

Post a Comment for "2d Convolution In Python With Missing Data"