Clarified body and title, formatting fixes and format improvements
Source Link

Find Generate an integer that is not among four billion given ones

It is anI have been given this interview question:

My analysis:

My analysis:

We can do external sorting, thus we get toletting us know the range of the integers. 

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers. There, there are 2^32232 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution:

Solution: ifIf we use one bit representing one distinct integer, it is enough. we don't need need sort. Implementation:

Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

Solution:

For all possible 16-bit prefixes, there are 216 number of integers = 65536, we need 216 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread **are** about that variation of the task, though. Unfortunately the comment that **introduced** it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.

Find an integer not among four billion given ones

It is an interview question:

My analysis:

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

Generate an integer that is not among four billion given ones

I have been given this interview question:

My analysis:

We can do external sorting, thus letting us know the range of the integers. 

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers, there are 232 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution:

If we use one bit representing one distinct integer, it is enough. we don't need sort.

Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution:

For all possible 16-bit prefixes, there are 216 number of integers = 65536, we need 216 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread **are** about that variation of the task, though. Unfortunately the comment that **introduced** it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.
Improved Formatting
Source Link
phuclv
  • 39.6k
  • 15
  • 170
  • 491

It is an interview question:

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.

My analysis:

The size of the file is 4×109×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Conclusion: We decrease memory through increasing file pass.


A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

It is an interview question:

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.

My analysis:

The size of the file is 4×109×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Conclusion: We decrease memory through increasing file pass.


A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

It is an interview question:

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.

My analysis:

The size of the file is 4×109×4 bytes = 16 GB.

We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding(after reading all answers):

Assuming we are talking about 32-bit integers. There are 2^32 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Conclusion: We decrease memory through increasing file pass.


A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.

Improved formatting and tags
Source Link
Solomon Ucko
  • 5.9k
  • 3
  • 25
  • 46

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory. SolutionSolution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of
integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build
65536 buckets. For each bucket, we need 4 bytes holding all possibilities because
 the worst case is all the 4 billion integers belong to the same bucket.

step1: Build the counter of each bucket through the first pass through the file.
step2: Scan the buckets, find the first one who has less than 65536 hit.
step3: Build new buckets whose high 16-bit prefixes are we found in step2
through second pass of the file
step4: Scan the buckets built in step3, find the first bucket which doesnt
have a hit.

The code is very similar to above one.

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory. Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of
integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build
65536 buckets. For each bucket, we need 4 bytes holding all possibilities because
 the worst case is all the 4 billion integers belong to the same bucket.

step1: Build the counter of each bucket through the first pass through the file.
step2: Scan the buckets, find the first one who has less than 65536 hit.
step3: Build new buckets whose high 16-bit prefixes are we found in step2
through second pass of the file
step4: Scan the buckets built in step3, find the first bucket which doesnt
have a hit.

The code is very similar to above one.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation:

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Rollback to Revision 17
Source Link
Loading
Question Protected by Anshul Goyal
added 23 characters in body
Source Link
Cole Tobin
  • 9.3k
  • 15
  • 50
  • 76
Loading
attempt to prevent more confusion in comments
Source Link
Loading
added 51 characters in body
Source Link
Peter Mortensen
  • 31k
  • 22
  • 108
  • 132
Loading
added 4 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
added 11 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
added 591 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
added language hint - didn&#39;t want to add c# tag as OP isn&#39;t about C# per se, only example code; Post Made Community Wiki
Source Link
Loading
Add missing superscript
Source Link
Phil Ross
  • 25.8k
  • 9
  • 68
  • 78
Loading
deleted 168 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
added 8 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
edited body
Source Link
Ólafur Waage
  • 69.4k
  • 22
  • 143
  • 199
Loading
added 1223 characters in body
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading
added 5 characters in body
Source Link
Roee Adler
  • 33.7k
  • 32
  • 105
  • 133
Loading
retitle
Link
Loading
deleted 1 characters in body; edited title
Source Link
Yi Jiang
  • 49.7k
  • 16
  • 136
  • 136
Loading
edited tags
Link
Loading
added 5 characters in body
Source Link
Kredns
  • 36.8k
  • 52
  • 155
  • 204
Loading
Source Link
SecureFish
  • 2.4k
  • 8
  • 33
  • 43
Loading