Find the Duplicate Number

Description

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]
Output: 2

Example 2:

Input: [3,1,3,4,2]
Output: 3

Note:

  1. You must not modify the array (assume the array is read only).

  2. You must use only constant, O(1) extra space.

  3. Your runtime complexity should be less than O(n2).

  4. There is only one duplicate number in the array, but it could be repeated more than once.

Solutions

Idea: let the number of numbers \le mid be count. If count > mid, then the duplicate must be in [1, mid]. Otherwise the duplicate must be in [mid + 1, n].

class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int lo = 1, hi = nums.size() - 1;
        while(lo < hi){
            int mid = (lo + hi) / 2;
            int count = 0;
            for(int num : nums){
                if(num <= mid)
                    ++count;
            }
            if(count <= mid){
                lo = mid + 1;
            }else{
                hi = mid;
            }
        }
        return lo;
    }
};

Fast and slow pointers

Idea: consider a graph with nodes 0, 1, ..., n. There is an edge from i to j if nums[i] = j. The graph must be a ρ\rho-shaped linked-list. nums[i] > 0 for all i, so node 0 has no incoming edges. So the head node is 0.

Let lf,lsl_f, l_s be the distances traveled by fast and slow pointers when they meet, lcl_c be the length of the cycle. lf=2ls=ls+klcl_f = 2 \cdot l_s = l_s + k \cdot l_c (the fast pointer travel kk more cycles.). So ls=klcl_s = k \cdot l_c.

Set the slow pointer back to node 0, and let both pointers move at the same speed. They must meet at the entry of the cycle. Let ll be the distance between the head node and the entry node. When the slow pointer reaches the entry node, the fast pointer travels lf+l=ls+klc+l=2klc+ll_f + l = l_s + k \cdot l_c + l= 2k \cdot l_c + l. So the fast pointer is also at the entry node.

Reference: detailed explanation

class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int fast = 0, slow = 0;
        do{
            slow = nums[slow];
            fast = nums[nums[fast]];
        }while(slow != fast);
        slow = 0;
        while(slow != fast){
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;
    }
};

Bit counting

One interesting solution but I can't understand. O(32*N) solution using bit manipulation in 10 lines.

Last updated