Finding the Median in Linear time

some eggs
Here are three C++ programs to find the median of a set of numbers. The first is short and simple. The last is fiddly and fast.

Simple but not linear (n*log n time):

Sort using STL sort (presumably quicksort), then find the middle value:

#include <iostream>
#include <algorithm>

int main() {
  int A[7] = {23, 1, 33, -20, 6, 6, 9};
  std::sort(A,A+7);
  std::cout << "The medium is: " << A[3] << std::endl;
}

Linear time (But too much memory used in recursion):

The running time of the following algorithm is linear rather than logarithmic. The reasoning behind this is that each recursive call takes linear time (on average), and each recursive call reduces the size of the problem by a constant factor:

#include <iostream>
#include <vector>
using namespace std;

// Find Kth element using recursion
int getKElem(vector<int> A,int K) {
  int a = *A.begin();  // Pick randomly a number a from A = {a1, ..., an}.

  // gather all numbers smaller, equal and bigger than a.
  vector<int> S, E, B;
  for(vector<int>::const_iterator ci = A.begin(); ci!=A.end(); ci++) {
    if (*ci < a)      S.push_back(*ci);
    else if (*ci > a) B.push_back(*ci);
    else              E.push_back(*ci);
  }

  if (S.size() >= K) return getKElem(S,K);
  else if (S.size()+E.size() >= K) return a;
  else return getKElem(B,K-S.size()-E.size());
}

int main() {
  int myints[7] = {23, 1, 33, -20, 6, 6, 9};
  vector<int> A (myints, myints + sizeof(myints) / sizeof(int) );

  cout << "Element " << A.size()/2 << " is: " << getKElem(A,A.size()/2) << endl;
}

Linear time with decent memory footprint:

Here is the algorithm with the recursion unraveled:

#include <iostream>
#include <vector>
using namespace std;

// Find Kth element without recusion
int findKMedian(vector<int> A,int K) {
  int l,m;
  l=0;
  m=A.size()-1;
  while (l<m) {
    int x=A[K];
    int i=l;
    int j=m;
    do {
      while (A[i]<x) i++;
      while (x<A[j]) j--;
      if (i<=j) {
        int t = A[i]; A[i]= A[j]; A[j] = t; // Swap A[i] & A[j]
        i++; j--;
      }
    } while (i<=j);
    if (j<K) l=i;
    if (K<i) m=j;
  }
  return A[K];
}

int main() {
  int myints[7] = {23, 1, 33, -20, 6, 6, 9};
  vector<int> A (myints, myints + sizeof(myints) / sizeof(int) );

  cout << "Element " << A.size()/2 << " is: " << findKMedian(A,A.size()/2) << endl;
}

4 responses to “Finding the Median in Linear time”

  1. Vladimir

    These algorithms do not run in linear time.
    First of all,
    int a = *A.begin();
    — is a bad choice: for a sorted array this algorithm will find median in quadratic(!) time.
    int a = A[random()%A.size()];
    — a bit better, but worst case is still quadratic, though expected time is linear.
    See wikipedia for better (and more complex) choices.

  2. joe

    Or you can just use the BFPRT algorithm. guaranteed linear time, yet still elegant.

  3. Clint

    I’d like to find out more? I’d care to find out more details.

  4. Renato Nickerson

    some truly choice articles on this web site , saved to my bookmarks .

Leave a Reply

%d bloggers like this: