Numpy array loss of dimension when masking

Question

I want to select certain elements of an array and perform a weighted average calculation based on the values. However, applying a filter condition destroys the original structure of the array: arr, which had shape (2, 2, 3, 2), is turned into a 1-dimensional array. This is of no use to me, because these elements should not all be combined with each other later on (only subarrays of them). How can I avoid this flattening?

>>> arr = np.asarray([ [[[1, 11], [2, 22], [3, 33]], [[4, 44], [5, 55], [6, 66]]], [ [[7, 77], [8, 88], [9, 99]], [[0, 32], [1, 33], [2, 34] ]] ])
>>> arr
array([[[[ 1, 11],
         [ 2, 22],
         [ 3, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 1, 33],
         [ 2, 34]]]])
>>> arr.shape
(2, 2, 3, 2)
>>> arr[arr>3]
array([11, 22, 33,  4, 44,  5, 55,  6, 66,  7, 77,  8, 88,  9, 99, 32, 33,
       34])
>>> arr[arr>3].shape
(18,)

Elaborate on the calculation that you need to do with these values. How would you use the arr structure? — hpaulj, Mar 14, 2015 at 18:05

Alex · Accepted Answer · 2015-03-14 07:22:47Z

Check out numpy.where:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

To keep the same dimensionality you are going to need a fill value. In the example below I use 0, but you could also use np.nan (note that np.nan is a float, so the result of an integer array would be converted to float):

np.where(arr>3, arr, 0)

returns

array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])
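
If the goal is an average over only the selected elements, the np.nan fill can be combined with np.nanmean. The following is a minimal sketch rather than part of the original answer; the choice of axes (1, 2) is only an assumption for illustration:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# np.nan is a float, so the integer input is upcast to float64.
filled = np.where(arr > 3, arr, np.nan)
print(filled.dtype)   # float64
print(filled.shape)   # (2, 2, 3, 2), same as arr

# Average over axes 1 and 2 while skipping the NaN placeholders.
means = np.nanmean(filled, axis=(1, 2))
print(means)
```

With a 0 fill instead, the zeros would be counted as real values in the mean, so np.nan is usually the safer placeholder when averaging.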

ali_m · Answer · 2015-03-14 19:44:36Z

You might consider using an np.ma.masked_array to represent the subset of elements that satisfy your condition:

import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

masked_arr = np.ma.masked_less(arr, 3)

print(masked_arr)
# [[[[-- 11]
#    [-- 22]
#    [3 33]]

#   [[4 44]
#    [5 55]
#    [6 66]]]


#  [[[7 77]
#    [8 88]
#    [9 99]]

#   [[-- 32]
#    [-- 33]
#    [-- 34]]]]

As you can see, the masked array retains its original dimensions. You can access the underlying data and the mask via the .data and .mask attributes respectively. Most NumPy functions ignore the masked values, e.g.:

# mean of whole array
print(arr.mean())
# 26.75

# mean of non-masked elements only
print(masked_arr.mean())
# 33.4736842105
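
As a rough illustration of those attributes and of reducing along a single axis, here is a small sketch that reuses np.ma.masked_less(arr, 3) from above (the variable names axis_means and plain are mine):

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

masked_arr = np.ma.masked_less(arr, 3)

print(masked_arr.mask.shape)  # same shape as arr: (2, 2, 3, 2)
print(masked_arr.count())     # number of non-masked elements: 19

# Reduce along axis 2 only; the other axes are kept.
axis_means = masked_arr.mean(axis=2)
print(axis_means.shape)       # (2, 2, 2)

# Swap the masked entries for a fill value when a plain ndarray is needed.
plain = masked_arr.filled(0)
```

Note that masked_less(arr, 3) leaves the value 3 itself unmasked, which is why 19 elements remain rather than the 18 selected by arr > 3.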

The result of an element-wise operation on a masked array and a non-masked array will also preserve the values of the mask:

masked_sum = masked_arr + np.random.randn(*arr.shape)

print(masked_sum)
# [[[[-- 11.359989067421582]
#    [-- 23.249092437269162]
#    [3.326111354088174 32.679132708120726]]

#   [[4.289134334263137 43.38559221094378]
#    [6.028063054523145 53.5043991898567]
#    [7.44695154979811 65.56890530368757]]]


#  [[[8.45692625294376 77.36860675985407]
#    [5.915835159196378 87.28574554110307]
#    [8.251106168209688 98.7621940026713]]

#   [[-- 33.24398289945855]
#    [-- 33.411941757624284]
#    [-- 34.964817895873715]]]]

The sum is only computed over the non-masked values of masked_arr - you can see this by looking at masked_sum.data:

print(masked_sum.data)
# [[[[  1.          11.35998907]
#    [  2.          23.24909244]
#    [  3.32611135  32.67913271]]

#   [[  4.28913433  43.38559221]
#    [  6.02806305  53.50439919]
#    [  7.44695155  65.5689053 ]]]


#  [[[  8.45692625  77.36860676]
#    [  5.91583516  87.28574554]
#    [  8.25110617  98.762194  ]]

#   [[  0.          33.2439829 ]
#    [  1.          33.41194176]
#    [  2.          34.9648179 ]]]]
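
To tie this back to the weighted average in the question, a minimal sketch might look like the following. It assumes the average is taken along axis 2 with made-up weights, and it uses np.ma.masked_where(arr <= 3, arr) so that the mask matches the question's arr > 3 condition exactly:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Mask everything that fails the question's condition (arr > 3).
masked = np.ma.masked_where(arr <= 3, arr)

# Hypothetical weights for the three positions along axis 2.
weights = np.array([0.5, 0.3, 0.2])

# Weighted average along axis 2; masked entries and their weights are left out.
avg = np.ma.average(masked, axis=2, weights=weights)
print(avg.shape)   # (2, 2, 2)
print(avg)
```

The result keeps the remaining axes, so each subarray is averaged separately instead of everything being pooled into one flat vector.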

I was tossing up between yours and np.where. I went with it because it suits the purpose in a single line of code. It seemed like the best fit. All were good answers... — orange, Mar 15, 2015 at 0:28

hpaulj · Answer · 2015-03-14 07:55:49Z

Look at arr>3:

In [71]: arr>3
Out[71]: 
array([[[[False,  True],
         [False,  True],
         [False,  True]],

        [[ True,  True],
         [ True,  True],
         [ True,  True]]],


       [[[ True,  True],
         [ True,  True],
         [ True,  True]],

        [[False,  True],
         [False,  True],
         [False,  True]]]], dtype=bool)

arr[arr>3] selects those elements where the mask is True. What kind of structure or shape do you want that selection to have? Flat is the only thing that makes sense, isn't it? arr itself is not changed.
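
One way to see why the selection cannot keep a rectangular shape is to count how many elements pass the test in each slice. This is just a quick sketch; counting along axis 2 is an arbitrary choice:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3

# Number of selected elements in each length-3 run along axis 2.
counts = mask.sum(axis=2)
print(counts.tolist())   # [[[0, 3], [3, 3]], [[3, 3], [0, 3]]]

# The runs have unequal lengths, so only a flat result is well defined.
print(arr[mask].shape)   # (18,)
```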

You could zero out the terms that don't fit the mask,

In [84]: arr1=arr.copy()
In [85]: arr1[arr<=3]=0
In [86]: arr1
Out[86]: 
array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])

Now you could do weighted sums or averages over various dimensions.
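
A caveat with the zero fill: the zeros still count as values in a plain mean, so the weights have to be zeroed in the same places. Here is a sketch of one way to do that, assuming hypothetical weights along axis 2:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3
arr1 = np.where(mask, arr, 0)

# Hypothetical weights along axis 2, shaped to broadcast against arr.
weights = np.array([0.5, 0.3, 0.2]).reshape(1, 1, 3, 1)
w = weights * mask   # zero weight wherever the element was filtered out

num = (arr1 * w).sum(axis=2)
den = w.sum(axis=2)

# Leave NaN where a slice had no selected elements (den == 0).
avg = np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)
print(avg)
```

I avoided np.average(arr1, axis=2, weights=...) here because it raises ZeroDivisionError when the weights along a slice sum to zero, which happens for the slices with no selected elements.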

np.nonzero (or np.where) might also be useful, giving you the indices of the selected terms:

In [88]: np.nonzero(arr>3)
Out[88]: 
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]),
 array([0, 1, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 2]),
 array([1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]))
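
Those index arrays let you keep track of where each flat value came from. As a sketch (grouping by the first axis is only an example, and the names coords and sums are mine):

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3
idx = np.nonzero(mask)       # tuple of 4 index arrays, one per axis
values = arr[idx]            # the same 18 values as arr[mask]
coords = np.argwhere(mask)   # shape (18, 4): one index row per value

# Sum the selected values grouped by their index along the first axis.
sums = np.bincount(idx[0], weights=values, minlength=arr.shape[0])
print(sums)
```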

kmario23 · Answer · 2019-05-03 14:36:48Z

If, on the other hand, you want the values below the threshold (3 in your example) to be replaced by the threshold itself, you can use numpy.clip() or ndarray.clip():

In [27]: np.clip(arr, 3, np.max(arr))
Out[27]: 
array([[[[ 3, 11],
         [ 3, 22],
         [ 3, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 3, 32],
         [ 3, 33],
         [ 3, 34]]]])
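
As far as I know, the upper bound does not have to be computed: passing None leaves that side unbounded. A small sketch of equivalent spellings:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

a = np.clip(arr, 3, None)   # None means no upper bound
b = arr.clip(min=3)         # method form, lower bound only
c = np.maximum(arr, 3)      # element-wise maximum with the threshold

print(a.shape)   # (2, 2, 3, 2)
print(a.min())   # 3
```

Keep in mind that the clipped values still take part in any later average, so this only fits if replacing small values with 3 is really what you want.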

jimp · Answer · 2020-11-09 21:25:25Z

What you need to do is first reshape the array to two dimensions and then apply the mask, like so:

flat = np.reshape(data, (-1, data.shape[-1]))  # collapse the leading axes, keep the last one
masked_data = flat[flat[:, 0] < 3]             # keep rows whose first column is < 3
