# Find the Duplicate Number

## Description

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

**Example 1:**

```
Input: [1,3,4,2,2]
Output: 2
```

**Example 2:**

```
Input: [3,1,3,4,2]
Output: 3
```

**Note:**

1. You **must not** modify the array (assume the array is read only).
2. You must use only constant, O(1) extra space.
3. Your runtime complexity should be less than *O*(*n*2).
4. There is only one duplicate number in the array, but it could be repeated more than once.

## Solutions

### Binary search

Idea: let the number of numbers $$\le$$ `mid` be `count`. If `count > mid`, then the duplicate must be in `[1, mid]`. Otherwise the duplicate must be in `[mid + 1, n]`.

```cpp
class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int lo = 1, hi = nums.size() - 1;
        while(lo < hi){
            int mid = (lo + hi) / 2;
            int count = 0;
            for(int num : nums){
                if(num <= mid)
                    ++count;
            }
            if(count <= mid){
                lo = mid + 1;
            }else{
                hi = mid;
            }
        }
        return lo;
    }
};
```

### Fast and slow pointers

Idea: consider a graph with nodes `0, 1, ..., n`. There is an edge from `i`  to `j` if `nums[i] = j`. The graph must be a $$\rho$$-shaped linked-list. `nums[i] > 0` for all `i`, so node `0` has no incoming edges. So the head node is `0`.

Let $$l\_f, l\_s$$ be the distances traveled by fast and slow pointers when they meet, $$l\_c$$ be the length of the cycle. $$l\_f = 2 \cdot l\_s = l\_s + k \cdot l\_c$$ (the fast pointer travel $$k$$ more cycles.). So $$l\_s = k \cdot l\_c$$.

Set the slow pointer back to node `0`, and let both pointers move at the same speed. They must meet at the entry of the cycle. Let $$l$$ be the distance between the head node and the entry node. When the slow pointer reaches the entry node, the fast pointer travels $$l\_f + l = l\_s + k \cdot l\_c + l= 2k \cdot l\_c + l$$. So the fast pointer is also at the entry node.

Reference: [detailed explanation](http://keithschwarz.com/interesting/code/?dir=find-duplicate)

```cpp
class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int fast = 0, slow = 0;
        do{
            slow = nums[slow];
            fast = nums[nums[fast]];
        }while(slow != fast);
        slow = 0;
        while(slow != fast){
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;
    }
};
```

### Bit counting

One interesting solution but I can't understand. [O(32\*N) solution using bit manipulation in 10 lines](https://leetcode.com/problems/find-the-duplicate-number/discuss/72872/O\(32*N\)-solution-using-bit-manipulation-in-10-lines).
