Clarified body and title, formatting fixes and format improvements

Source Link

edit approved Mar 10, 2020 at 10:52

Aryan

4.6k
4
23
45

Find Generate an integer that is not among four billion given ones

It is anI have been given this interview question:

My analysis:

We can do external sorting, thus we get toletting us know the range of the integers.

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers. There, there are 2^322³² = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution:

Solution: ifIf we use one bit representing one distinct integer, it is enough. we don't need need sort. Implementation:

Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

For all possible 16-bit prefixes, there are 2¹⁶ number of integers = 65536, we need 2¹⁶ * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

^{A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread **are** about that variation of the task, though. Unfortunately the comment that **introduced** it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.}

Find an integer not among four billion given ones

It is an interview question:

My analysis:

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

Generate an integer that is not among four billion given ones

I have been given this interview question:

My analysis:

We can do external sorting, thus letting us know the range of the integers.

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers, there are 2³² = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution:

If we use one bit representing one distinct integer, it is enough. we don't need sort.

Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

For all possible 16-bit prefixes, there are 2¹⁶ number of integers = 65536, we need 2¹⁶ * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

^{A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread **are** about that variation of the task, though. Unfortunately the comment that **introduced** it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.}

Improved Formatting

Source Link

edited Jun 29, 2019 at 15:30

phuclv

39.6k
15
170
491

It is an interview question:

My analysis:

The size of the file is 4×10⁹×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

Conclusion: We decrease memory through increasing file pass.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

It is an interview question:

My analysis:

The size of the file is 4×10⁹×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

Conclusion: We decrease memory through increasing file pass.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

It is an interview question:

My analysis:

The size of the file is 4×10⁹×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*10⁹ distinct integers.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

Conclusion: We decrease memory through increasing file pass.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

Improved formatting and tags

Source Link

edited Jun 16, 2019 at 18:14

Solomon Ucko

5.9k
3
25
46

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory. SolutionSolution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of
integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build
65536 buckets. For each bucket, we need 4 bytes holding all possibilities because
 the worst case is all the 4 billion integers belong to the same bucket.

step1: Build the counter of each bucket through the first pass through the file.
step2: Scan the buckets, find the first one who has less than 65536 hit.
step3: Build new buckets whose high 16-bit prefixes are we found in step2
through second pass of the file
step4: Scan the buckets built in step3, find the first bucket which doesnt
have a hit.

The code is very similar to above one.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory. Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of
integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build
65536 buckets. For each bucket, we need 4 bytes holding all possibilities because
 the worst case is all the 4 billion integers belong to the same bucket.

step1: Build the counter of each bucket through the first pass through the file.
step2: Scan the buckets, find the first one who has less than 65536 hit.
step3: Build new buckets whose high 16-bit prefixes are we found in step2
through second pass of the file
step4: Scan the buckets built in step3, find the first bucket which doesnt
have a hit.

The code is very similar to above one.

Case 1: we have 1 GB = 1 * 10⁹ * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 10⁶ * 8 bits = 80 million bits