I have a pandas DataFrame and want to delete 90% of the rows that satisfy a condition.

The condition is very simple: if the value in column "Parameter1" is greater than a threshold, the row should be deleted.
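For reference, the condition itself is a boolean mask, and deleting every matching row means keeping only the rows where the mask is False. This is a minimal sketch: only the column name Parameter1 comes from the question, while the data and the threshold of 2.0 are made up for illustration.

```python
import pandas as pd

# Toy data; the column name is from the question, the values are made up
df = pd.DataFrame({'Parameter1': [0.2, 3.5, 1.0, 7.8, 4.1]})
threshold = 2.0  # hypothetical threshold

# Boolean mask: True for rows that satisfy the deletion condition
mask = df['Parameter1'] > threshold

# Deleting *all* matching rows keeps only the rows where the mask is False
kept = df[~mask]
print(kept)
```

The question asks for 90% of the matching rows rather than all of them, so this mask is only the starting point for the random selection.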

My question is how to delete 90% of those rows chosen at random, not the first 90% of them in order.

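Use boolean indexing to select the matching rows, sample(frac=.9) to pick 90% of them at random, and drop to delete those rows by index label: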

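import pandas as pd

df = pd.DataFrame({
    'A': [5] * 20 + [1] * 2,
    'B': list(range(22))
})

# Select the rows where A > 4, pick 90% of them at random, and drop them by index label.
# Which A == 5 rows survive is random, so the output below varies from run to run.
df = df.drop(df[df['A'] > 4].sample(frac=.9).index)
print(df)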
    A   B
11  5  11
15  5  15
20  1  20
21  1  21
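The same one-liner can be wrapped in a small reusable function. This is a sketch: the helper name drop_fraction_where is hypothetical, and passing random_state to sample (a real parameter of DataFrame.sample) makes the choice of rows reproducible. It assumes unique index labels, because drop removes every row carrying a dropped label.

```python
import pandas as pd

def drop_fraction_where(df, mask, frac, random_state=None):
    """Drop a random `frac` of the rows where `mask` is True; keep all other rows."""
    # Choose the rows to delete at random, from the matching subset only
    to_drop = df[mask].sample(frac=frac, random_state=random_state).index
    # Delete them by index label (assumes the index labels are unique)
    return df.drop(to_drop)

df = pd.DataFrame({
    'A': [5] * 20 + [1] * 2,
    'B': list(range(22))
})

out = drop_fraction_where(df, df['A'] > 4, frac=0.9, random_state=0)
print(out)
```

For the question's data the call would be drop_fraction_where(df, df['Parameter1'] > threshold, frac=0.9). sample(frac=0.9) rounds 0.9 times the number of matching rows to a whole count, so with 20 matching rows exactly 18 are deleted and the two A == 1 rows are always kept.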
    Thank you! That works perfectly!
    – cuga
    Dec 7, 2018 at 8:16
