An interview question: About Probability

Question

An interview question:

Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1. Write a function g(x) using f(x) that 1/2 times returns 0, 1/2 times returns 1.

My implementation is:

function g(x) {
    if (f(x) == 0) { // 1/4
        var s = f(x);
        if (s == 1) { // 3/4 * 1/4
            return s; // 3/16
        } else {
            return g(x);
        }
    } else { // 3/4
        var k = f(x);
        if (k == 0) { // 1/4 * 3/4
            return k; // 3/16
        } else {
            return g(x);
        }
    }
}

Am I right? What's your solution? (You can use any language.)

What's the argument x used for? It doesn't seem to have any use. — Joachim Sauer, Feb 25, 2011 at 8:55

Jim Lewis · Accepted Answer · 2011-02-19 17:33:35Z

61

If you call f(x) twice in a row, the following outcomes are possible (assuming that successive calls to f(x) are independent, identically distributed trials):

00 (probability 1/4 * 1/4)
01 (probability 1/4 * 3/4)  
10 (probability 3/4 * 1/4)  
11 (probability 3/4 * 3/4)

01 and 10 occur with equal probability. So iterate until you get one of those cases, then return 0 or 1 appropriately:

do
  a=f(x); b=f(x);
while (a == b);

return a;

It might be tempting to call f(x) only once per iteration and keep track of the two most recent values, but that won't work. Suppose the very first roll is 1, with probability 3/4. You'd loop until the first 0, then return 1 (with probability 3/4).
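A minimal Python sketch of both variants, for anyone who wants to check the claim empirically. The `f` here is a hypothetical stand-in built on `random.Random` (the question does not say how f is implemented), and `g_broken` is my reading of the "track the two most recent values" idea described above.

```python
import random

def f(rng):
    """Hypothetical stand-in for f: returns 0 with probability 1/4, else 1."""
    return 0 if rng.random() < 0.25 else 1

def g(rng):
    """Draw independent pairs from f until the two values differ; return the first."""
    while True:
        a, b = f(rng), f(rng)
        if a != b:
            return a

def g_broken(rng):
    """One call per iteration, comparing with the previous value (biased)."""
    prev = f(rng)
    while True:
        cur = f(rng)
        if cur != prev:
            return prev  # this is always the value of the very first roll
        prev = cur

if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    print(sum(g(rng) for _ in range(n)) / n)         # should land near 0.5
    print(sum(g_broken(rng) for _ in range(n)) / n)  # should land near 0.75
```

The pairwise version discards both values after a tie, which is what keeps successive attempts independent.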

edited Feb 19, 2011 at 17:33

answered Feb 19, 2011 at 17:01

Jim Lewis


Mmmh, interesting usage of Bayes theorem where somehow the loop introduces the normalization… – Ludovico Fischer, Dec 9, 2013 at 22:44
Thanks, this is the same answer as in the classic "make a fair coin from a biased coin" question: stackoverflow.com/questions/5429045/… – alex, Jun 19, 2021 at 1:39

Snowbear · Accepted Answer · 2011-02-19 17:51:36Z

8

The problem with your algorithm is that it repeats itself with high probability. My code:

function g(x) {
    var s = f(x) + f(x) + f(x);
    // s = 0, probability:  1/64
    // s = 1, probability:  9/64
    // s = 2, probability: 27/64
    // s = 3, probability: 27/64
    if (s == 2) return 0;
    if (s == 3) return 1;

    // Retry with probability 10/64; each attempt costs three calls to f(x),
    // only one more than the two-call approach.
    return g(x);
}

I've measured the average number of times f(x) is called per g(x) result, for your algorithm and for mine. Yours calls f(x) around 5.3 times per g(x) result; mine reduces this to around 3.5. The same holds for the other answers so far since, as you said, they are actually the same algorithm as yours.
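The measured averages can be cross-checked with a little arithmetic. This is a sketch under the usual assumption that calls to f are independent: each attempt is then a geometric trial, so the expected cost is (calls per attempt) / (probability the attempt succeeds). The helper name `expected_calls` is mine.

```python
from fractions import Fraction

def expected_calls(calls_per_attempt, p_success):
    """Expected number of f() calls when attempts succeed independently."""
    return calls_per_attempt / p_success

p0, p1 = Fraction(1, 4), Fraction(3, 4)

# Two-call scheme: an attempt succeeds on the outcomes 01 or 10.
pair = expected_calls(2, 2 * p0 * p1)

# Three-call scheme: an attempt succeeds when the sum is 2 or 3.
p_sum2 = 3 * p1 * p1 * p0   # 27/64
p_sum3 = p1 ** 3            # 27/64
triple = expected_calls(3, p_sum2 + p_sum3)

print(pair, float(pair))      # 16/3, about 5.33
print(triple, float(triple))  # 32/9, about 3.56
```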

P.S.: your definition doesn't mention 'random' at the moment, but probably it is assumed. See my other answer.

edited Feb 19, 2011 at 17:51

answered Feb 19, 2011 at 17:27

Snowbear


I upvoted this, then cancelled, because I thought I saw a mistake, but can't upvote again now. :-( Either way, +1! When you edit, I believe I can vote again. Perhaps explain the answer a bit more? :) – Steven Jeuris, Feb 19, 2011 at 17:46
@Steven, I've seen you playing with my reputation :) – Snowbear, Feb 19, 2011 at 17:52
You could handle another 6 of the 10 unhandled cases by differentiating, say, 0, 0, 1 from 1, 0, 0.... – Tony Delroy, Feb 21, 2011 at 7:48

btilly · Accepted Answer · 2011-02-19 18:51:01Z

Your solution is correct, if somewhat inefficient and with more duplicated logic. Here is a Python implementation of the same algorithm in a cleaner form.

def g ():
    while True:
        a = f()
        if a != f():
            return a

If f() is expensive you'd want to get more sophisticated with using the match/mismatch information to try to return with fewer calls to it. Here is the most efficient possible solution.

def g ():
    lower = 0.0
    upper = 1.0
    while True:
        if 0.5 < lower:
            return 1
        elif upper < 0.5:
            return 0
        else:
            middle = 0.25 * lower + 0.75 * upper
            if 0 == f():
                lower = middle
            else:
                upper = middle

This takes about 2.6 calls to f() on average.

The way that it works is this. We're trying to pick a random number from 0 to 1, but we stop as soon as we know whether the number is above or below 0.5 (that is, whether the result is 1 or 0). We start knowing that the number is in the interval (0, 1). 3/4 of the numbers are in the bottom 3/4 of the interval, and 1/4 are in the top 1/4 of the interval. We decide which based on a call to f(). This means that we are now in a smaller interval.

If we wash, rinse, and repeat enough times we can determine our finite number as precisely as possible, and will have an absolutely equal probability of winding up in any region of the original interval. In particular we have an even probability of winding up bigger than or less than 0.5.

If you wanted you could repeat the idea to generate an endless stream of bits one by one. This is, in fact, provably the most efficient way of generating such a stream, and is the source of the idea of entropy in information theory.
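To check the fairness and the call count empirically, here is a small harness around the same interval-narrowing loop. It is only a sketch: `make_f` and `measure` are names I made up, f is simulated with `random.Random`, and g takes f as a parameter so that calls can be counted.

```python
import random

def make_f(rng, counter):
    """Hypothetical f: 0 with probability 1/4, else 1; counts its own calls."""
    def f():
        counter[0] += 1
        return 0 if rng.random() < 0.25 else 1
    return f

def g(f):
    """Interval-narrowing scheme from the answer, with f passed in explicitly."""
    lower, upper = 0.0, 1.0
    while True:
        if 0.5 < lower:
            return 1
        if upper < 0.5:
            return 0
        middle = 0.25 * lower + 0.75 * upper
        if f() == 0:
            lower = middle   # top quarter, probability 1/4
        else:
            upper = middle   # bottom three quarters, probability 3/4

def measure(n=100_000, seed=0):
    """Return (fraction of 1s, average f() calls per g() result)."""
    counter = [0]
    f = make_f(random.Random(seed), counter)
    ones = sum(g(f) for _ in range(n))
    return ones / n, counter[0] / n

if __name__ == "__main__":
    print(measure())
```

The first number should sit near 0.5 and the second somewhere between 2 and 3 calls per result.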

Oops, you're right. Fixed. The way that it works is that if we didn't stop once we were in an interval, we'd be on our way to uniformly picking a number anywhere from 0 to 1. In actuality we stop as soon as we know which side of 0.5 the number will be on. I'll try to add an explanation. — btilly, Feb 19, 2011 at 18:40
The entropy of f() is lg(4)/4+lg(4/3)*3/4 ≈ 0.81. Thus it seems like about 1.23 calls to f() should be sufficient in expectation? — Thomas Ahle, Oct 14, 2016 at 10:44
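The figures in the comment above can be reproduced in a couple of lines. As far as I understand it, the bound is an average cost per output bit over a long stream, not a limit on any single call to g().

```python
import math

p0, p1 = 0.25, 0.75

# Shannon entropy of one call to f(), in bits.
entropy = -(p0 * math.log2(p0) + p1 * math.log2(p1))

print(entropy)      # roughly 0.81 bits per call
print(1 / entropy)  # roughly 1.23 calls per fair bit
```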

bdk · Accepted Answer · 2011-02-19 17:39:22Z

3

Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1

Taking this statement literally, f(x) if called four times will always return 0 once and 1 three times. This is different than saying f(x) is a probabilistic function and the 0 to 1 ratio will approach 1 to 3 (1/4 vs 3/4) over many iterations. If the first interpretation is valid, then the only valid function for f(x) that will meet the criteria regardless of where in the sequence you start from is the sequence 0111 repeating (or 1011 or 1101 or 1110, which are the same sequence from a different starting point). Given that constraint,

  g()= (f() == f())

should suffice.
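A quick way to convince yourself, sketched in Python under the literal reading above. The deterministic `make_f` is a hypothetical f that cycles through 0111 from a chosen starting offset.

```python
from itertools import cycle

def make_f(start):
    """Hypothetical deterministic f: the sequence 0111 repeating, rotated by start."""
    pattern = [0, 1, 1, 1]
    stream = cycle(pattern[start:] + pattern[:start])
    return lambda: next(stream)

def make_g(f):
    # Compare two consecutive values of f; int() turns the boolean into 0 or 1.
    return lambda: int(f() == f())

for start in range(4):
    g = make_g(make_f(start))
    print(start, [g() for _ in range(8)])
```

Every starting offset gives a strictly alternating output, so exactly half of the results are 1.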

answered Feb 19, 2011 at 17:39

bdk


I've seen variations of that question so many times that I already knew the answer, without even having to work out the probabilities. So, in the context of an interview question, I think "independent, identically distributed trials" is the correct assumption to make. Of course, some interviewers are just plain mean, so it's good to know another "trick question" variant to prepare for. – Jim Lewis, Feb 19, 2011 at 18:42

Snowbear · Accepted Answer · 2011-02-19 17:48:38Z

3

As already mentioned, your definition is not that good regarding probability. Usually it means that not only the overall frequency must be right but the distribution also. Otherwise you can simply write a g(x) that returns 1,0,1,0,1,0,1,0: it returns them 50/50, but the numbers won't be random.

Another cheating approach might be:

var invert = false;
function g(x) {
    invert = !invert;
    if (invert) return 1-f(x);
    return f(x);
}

This solution will be better than all others since it calls f(x) only one time. But the results will not be very random.
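To see why the results are "not very random", here is a sketch in Python. The simulated f and the helper names are assumptions of mine; the toggle logic follows the snippet above.

```python
import random

def make_g(f):
    """Toggle-based g from the answer: inverts every other result of f."""
    state = {"invert": False}
    def g():
        state["invert"] = not state["invert"]
        return 1 - f() if state["invert"] else f()
    return g

def sample(n, seed=0):
    rng = random.Random(seed)
    f = lambda: 0 if rng.random() < 0.25 else 1  # hypothetical f
    g = make_g(f)
    return [g() for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

if __name__ == "__main__":
    vals = sample(100_000)
    print(mean(vals))        # overall: near 0.5
    print(mean(vals[0::2]))  # inverted calls: near 0.25
    print(mean(vals[1::2]))  # plain calls: near 0.75
```

The overall frequency is balanced, but any individual call is still biased 1/4 vs 3/4, and a caller who knows the call parity can predict which way.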

answered Feb 19, 2011 at 17:48

Snowbear


I don't consider that to be cheating: you're giving the interviewer precisely what they asked for. Having said that, your function might end up returning (0) inverted to (1), (1) => (1), (1) inverted to (0), (1) => (1): three 1's and one 0. Why not just compute f() once (to say you've used it) and then just flip the result on each call to g()? – Jimmy, Feb 19, 2011 at 19:02

Tony Delroy · Accepted Answer · 2012-08-17 10:30:04Z

A refinement of the same approach used in btilly's answer, achieving an average ~1.85 calls to f() per g() result (further refinement documented below achieves ~1.75, btilly's ~2.6, Jim Lewis's accepted answer ~5.33). Code appears lower in the answer.

Basically, I generate random integers in the range 0 to 3 with even probability: the caller can then test bit 0 for the first 50/50 value, and bit 1 for a second. Reason: the f() probabilities of 1/4 and 3/4 map onto quarters much more cleanly than halves.

Description of algorithm

btilly explained the algorithm, but I'll do so in my own way too...

The algorithm basically generates a random real number x between 0 and 1, then returns a result depending on which "result bucket" that number falls in:

result bucket      result
         x < 0.25     0
 0.25 <= x < 0.5      1
 0.5  <= x < 0.75     2
 0.75 <= x            3

But, generating a random real number given only f() is difficult. We have to start with the knowledge that our x value should be in the range 0..1 - which we'll call our initial "possible x" space. We then hone in on an actual value for x:

each time we call f():
- if f() returns 0 (probability 1 in 4), we consider x to be in the lower quarter of the "possible x" space, and eliminate the upper three quarters from that space
- if f() returns 1 (probability 3 in 4), we consider x to be in the upper three-quarters of the "possible x" space, and eliminate the lower quarter from that space
- when the "possible x" space is completely contained by a single result bucket, that means we've narrowed x down to the point where we know which result value it should map to and have no need to get a more specific value for x.

It may or may not help to consider this diagram :-):

    "result bucket" cut-offs 0,.25,.5,.75,1

    0=========0.25=========0.5==========0.75=========1 "possible x" 0..1
    |           |           .             .          | f() chooses x < vs >= 0.25
    |  result 0 |------0.4375-------------+----------| "possible x" .25..1
    |           | result 1| .             .          | f() chooses x < vs >= 0.4375
    |           |         | .  ~0.58      .          | "possible x" .4375..1
    |           |         | .    |        .          | f() chooses < vs >= ~.58
    |           |         ||.    |    |   .          | 4 distinct "possible x" ranges

Code

int g() // return 0, 1, 2, or 3                                                 
{                                                                               
    if (f() == 0) return 0;                                                     
    if (f() == 0) return 1;                                                     
    double low = 0.25 + 0.25 * (1.0 - 0.25);                                    
    double high = 1.0;                                                          

    while (true)                                                                
    {                                                                           
        double cutoff = low + 0.25 * (high - low);                              
        if (f() == 0)                                                           
            high = cutoff;                                                      
        else                                                                    
            low = cutoff;                                                       

        if (high < 0.50) return 1;                                              
        if (low >= 0.75) return 3;                                              
        if (low >= 0.50 && high < 0.75) return 2;                               
    }                                                                           
}

If helpful, an intermediary to feed out 50/50 results one at a time:

int h()
{
    static int i;
    if (!i)
    {
        int x = g();
        i = x | 4;
        return x & 1;
    }
    else
    {
        int x = i & 2;
        i = 0;
        return x ? 1 : 0;
    }
}
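For anyone who wants to experiment without a C compiler, here is a rough Python port of g() with a measuring harness. It is a sketch: f is simulated with `random.Random`, g takes f as a parameter so that calls can be counted, and `measure` is a helper name I introduced.

```python
import random
from collections import Counter

def g(f):
    """Python port of the C g() above: returns 0, 1, 2 or 3."""
    if f() == 0:
        return 0
    if f() == 0:
        return 1
    low, high = 0.25 + 0.25 * (1.0 - 0.25), 1.0
    while True:
        cutoff = low + 0.25 * (high - low)
        if f() == 0:
            high = cutoff
        else:
            low = cutoff
        if high < 0.50:
            return 1
        if low >= 0.75:
            return 3
        if low >= 0.50 and high < 0.75:
            return 2

def measure(n=100_000, seed=0):
    """Return (bucket frequencies, average f() calls per g() call)."""
    rng = random.Random(seed)
    calls = [0]
    def f():  # hypothetical f: 0 with probability 1/4, else 1
        calls[0] += 1
        return 0 if rng.random() < 0.25 else 1
    counts = Counter(g(f) for _ in range(n))
    return {k: counts[k] / n for k in range(4)}, calls[0] / n

if __name__ == "__main__":
    print(measure())
```

Each of the four buckets should come out near 0.25. Note that the second number is f() calls per two-bit value, so halve it to compare with the per-bit averages quoted in this answer.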

NOTE: This can be further tweaked by having the algorithm switch from considering an f()==0 result to hone in on the lower quarter, to having it hone in on the upper quarter instead, based on which on average resolves to a result bucket more quickly. Superficially, this seemed useful on the third call to f() when an upper-quarter result would indicate an immediate result of 3, while a lower-quarter result still spans probability point 0.5 and hence results 1 and 2. When I tried it, the results were actually worse. A more complex tuning was needed to see actual benefits, and I ended up writing a brute-force comparison of lower vs upper cutoff for second through eleventh calls to f(). The best result I found was an average of ~1.75, resulting from the 1st, 2nd, 5th and 8th calls to f() seeking low (i.e. setting low = cutoff).

kolistivra · Accepted Answer · 2011-03-26 13:00:57Z

Here is a solution based on the central limit theorem, originally due to a friend of mine:

/*
Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1. Write a function g(x) using f(x) that 1/2 times returns 0, 1/2 times returns 1.
*/
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cstdio>
using namespace std;

int f() {
  if (rand() % 4 == 0) return 0;
  return 1;
}

int main() {
  srand(time(0));
  int cc = 0;
  for (int k = 0; k < 1000; k++) { // number of independent runs
    int c = 0;
    int limit = 10000; // the larger the limit, the closer each run gets to 50%
    for (int i=0; i<limit; ++i) c+= f();
    cc += c < limit*0.75 ? 0 : 1; // adds 0 with probability of roughly 50%
  }
  printf("%d\n",cc); // cc should be around 500
  return 0;
}
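One caveat worth checking: thresholding a binomial count at its mean gives a probability that is close to 1/2 but, as far as I can tell, not exactly 1/2. The sketch below computes the exact probability that one run of the inner loop contributes a 1, assuming f behaves as an independent 3/4 coin. The function name `prob_output_one` is mine.

```python
from fractions import Fraction
from math import comb

def prob_output_one(limit):
    """Exact P(c >= 0.75 * limit) when c ~ Binomial(limit, 3/4)."""
    threshold = 0.75 * limit
    # P(c = k) = comb(limit, k) * 3**k / 4**limit
    numerator = sum(comb(limit, k) * 3 ** k
                    for k in range(limit + 1) if k >= threshold)
    return Fraction(numerator, 4 ** limit)

if __name__ == "__main__":
    for limit in (4, 100, 10_000):
        print(limit, float(prob_output_one(limit)))
```

Larger limits pull the value toward 1/2, at the cost of that many calls to f() per output bit.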

loneboat · Accepted Answer · 2011-02-25 17:29:07Z

0

Since each return of f() represents a 3/4 chance of TRUE, with some algebra we can just properly balance the odds. What we want is another function x() which returns a balancing probability of TRUE, so that

function g() {    
    return f() && x();
}

returns true 50% of the time.

So let's find the probability of x (p(x)), given p(f) and our desired total probability (1/2):

p(f) * p(x) =  1/2
3/4  * p(x) =  1/2
       p(x) = (1/2) / (3/4)
       p(x) =  2/3

So x() should return TRUE with a probability of 2/3, since 2/3 * 3/4 = 6/12 = 1/2;

Thus the following should work for g():

function g() {
    return f() && (rand() < 2/3);
}

edited Feb 25, 2011 at 17:29

answered Feb 25, 2011 at 17:05

loneboat


I think it is implied that you can only use f(x) as the random generator, not any other rand(). – Eelvex, Feb 25, 2011 at 17:46

Rich · Accepted Answer · 2011-02-25 20:49:31Z

Assuming

P(f[x] == 0) = 1/4
P(f[x] == 1) = 3/4

and requiring a function g[x] with the following assumptions

P(g[x] == 0) = 1/2
P(g[x] == 1) = 1/2

I believe the following definition of g[x] is sufficient (Mathematica)

g[x_] := If[f[x] + f[x + 1] == 1, 1, 0]

or, alternatively in C

int g(int x)
{
    return f(x) + f(x+1) == 1
           ? 1
           : 0;
}

This is based on the idea that invocations of {f[x], f[x+1]} would produce the following outcomes

{
  {0, 0},
  {0, 1},
  {1, 0},
  {1, 1}
}

Summing each of the outcomes we have

{
  0,
  1,
  1,
  2
}

where a sum of 1 represents 1/2 of the possible sum outcomes, with any other sum making up the other 1/2.

Edit. As bdk says - {0,0} is less likely than {1,1} because

1/4 * 1/4 < 3/4 * 3/4

However, I am confused myself because given the following definition for f[x] (Mathematica)

f[x_] := Mod[x, 4] > 0 /. {False -> 0, True -> 1}

or alternatively in C

int f(int x)
{
    return (x % 4) > 0
           ? 1
           : 0;
}

then the results obtained from executing f[x] and g[x] seem to have the expected distribution.

Table[f[x], {x, 0, 20}]
{0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0}

Table[g[x], {x, 0, 20}]
{1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1}

I don't think this works. It assumes the four possible values for f(x)+f(x) are equally probable. In reality, {0,0} is much less probable than {1,1} — bdk, Feb 25, 2011 at 20:00
It does seem you've confused yourself :-). The functions aren't really f(x) and g(x)... they don't have inputs and are simply f() and g(). Thus, there's no f(x+1). And as for f() + f()... there's 1/16 chance of 0, 6/16 of 1, and 9/16 of 2. Your g() function "switches" on the test for 1, so will have a 6/16 vs 10/16 chance for the two results (which need to be equally likely). — Tony Delroy, Feb 26, 2011 at 19:08
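The 1/16, 6/16 and 9/16 figures in the comment above can be verified by enumerating the four outcomes, assuming the two calls are independent:

```python
from fractions import Fraction
from itertools import product

# Distribution of a single call to f().
p = {0: Fraction(1, 4), 1: Fraction(3, 4)}

# Accumulate the probability of each possible value of f() + f().
dist = {}
for a, b in product(p, repeat=2):
    dist[a + b] = dist.get(a + b, 0) + p[a] * p[b]

for total in sorted(dist):
    print(total, dist[total])
```

A sum of 1 therefore has probability 6/16 = 3/8 rather than 1/2, which is why testing for it alone does not give a fair g.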

dbasnett · Accepted Answer · 2011-02-26 15:14:07Z

This is much like the Monty Hall paradox.

In general.

Public Class Form1

    'the general case
    '
    'twiceThis = 2 is 1 in four chance of 0
    'twiceThis = 3 is 1 in six chance of 0
    '
    'twiceThis = x is 1 in 2x chance of 0

    Const twiceThis As Integer = 7
    Const numOf As Integer = twiceThis * 2

    Private Sub Button1_Click(ByVal sender As System.Object, _
                              ByVal e As System.EventArgs) Handles Button1.Click

        Const tries As Integer = 1000
        y = New List(Of Integer)

        Dim ct0 As Integer = 0
        Dim ct1 As Integer = 0
        Debug.WriteLine("")
        ''show all possible values of fx
        'For x As Integer = 1 To numOf
        '    Debug.WriteLine(fx)
        'Next

        'test that gx returns 50% 0's and 50% 1's
        Dim stpw As New Stopwatch
        stpw.Start()
        For x As Integer = 1 To tries
            Dim g_x As Integer = gx()
            'Debug.WriteLine(g_x.ToString) 'used to verify that gx returns 0 or 1 randomly
            If g_x = 0 Then ct0 += 1 Else ct1 += 1
        Next
        stpw.Stop()
        'the results
        Debug.WriteLine((ct0 / tries).ToString("p1"))
        Debug.WriteLine((ct1 / tries).ToString("p1"))
        Debug.WriteLine((stpw.ElapsedTicks / tries).ToString("n0"))

    End Sub

    Dim prng As New Random
    Dim y As New List(Of Integer)

    Private Function fx() As Integer

        '1 in numOf chance of zero being returned
        If y.Count = 0 Then
            'reload y
            y.Add(0) 'fx has only one zero value
            Do
                y.Add(1) 'the rest are ones
            Loop While y.Count < numOf
        End If
        'return a random value 
        Dim idx As Integer = prng.Next(y.Count)
        Dim rv As Integer = y(idx)
        y.RemoveAt(idx) 'remove the value selected
        Return rv

    End Function

    Private Function gx() As Integer

        'a function g(x) using f(x) that 50% of the time returns 0
        '                           that 50% of the time returns 1
        Dim rv As Integer = 0
        For x As Integer = 1 To twiceThis
            fx()
        Next
        For x As Integer = 1 To twiceThis
            rv += fx()
        Next
        If rv = twiceThis Then Return 1 Else Return 0

    End Function
End Class
