
Captcha Processing

Computers can perform billions of calculations per second, so they should be able to handle simple tasks that we can do. The average computer today still can't quite match the processing power of a human brain, but it's far from a paperweight. The difficulty comes in translating something as complex as reading text into those add and subtract instructions. It is possible, though, even when the text is distorted. Here's a project I worked on a while back. General image processing and OCR programs didn't work all that well, so I came up with my own way to solve the problem, and I'll outline the ideas here.

Here are some of the captchas in question.
[Sample captcha images.]

Some interesting things to notice: the captchas are always 5 letters in length, the text sits in roughly the same area, and there's a large contrast difference between the text and most of the background. These simple observations help reduce the amount of 'translating' into machine understanding we have to do.

[1] Pre-Processing
If it were just plain text, the problem would be a lot easier to solve. But here there's a large amount of background noise, unnecessary color information, and some rotation of the letters. Let's find a way to get rid of all that.

We can take advantage of the large contrast difference between the text and background. If we can determine whether to remove the light or dark areas, we could eliminate a lot of non-text area.

[In this captcha the text is lighter and background is darker.]

To do this, separate the image into its RGB channels. An image is essentially just that: a red, a green, and a blue value for each pixel. A lower number means the color is absent and a higher number means it is present. For example, all 0's would mean the image is black and no colors are present.
  
[Red, Green, and Blue channels respectively of the image]

Choose the channel that produces the highest contrast. This is pretty simple: just find the channel that deviates most from its average value. I'll demonstrate for the red channel.

int red_average; // Compute this by summing the red values for all pixels and dividing by the number of pixels.
BMP captcha;     // The captcha image object.

int x = 0, y = 0;
int red_deviation = 0;

for (y = 0; y < height; y++) {
      while (x < width) {
            red_deviation += abs(red_average - captcha(x,y)->Red);
            x = x + 1;
      }
      x = 0;
}
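
Since the average itself is only described in the comment above, here is a minimal sketch, under the same assumptions (the captcha object, width, and height as in the snippet above), of computing the per-channel averages and deviations together and picking the channel with the largest deviation:

// A sketch: compute the average and total deviation for each channel in one pass,
// then keep the channel with the largest deviation (the highest contrast).
long long channel_sum[3] = {0, 0, 0};
long long channel_deviation[3] = {0, 0, 0};
int channel_average[3];

for (y = 0; y < height; y++) {
      for (x = 0; x < width; x++) {
            channel_sum[0] += captcha(x,y)->Red;
            channel_sum[1] += captcha(x,y)->Green;
            channel_sum[2] += captcha(x,y)->Blue;
      }
}

for (int c = 0; c < 3; c++) {
      channel_average[c] = channel_sum[c] / (width * height);
}

for (y = 0; y < height; y++) {
      for (x = 0; x < width; x++) {
            channel_deviation[0] += abs(channel_average[0] - captcha(x,y)->Red);
            channel_deviation[1] += abs(channel_average[1] - captcha(x,y)->Green);
            channel_deviation[2] += abs(channel_average[2] - captcha(x,y)->Blue);
      }
}

// The channel with the largest deviation is the one we work with.
int best_channel = 0; // 0 = red, 1 = green, 2 = blue
for (int c = 1; c < 3; c++) {
      if (channel_deviation[c] > channel_deviation[best_channel]) {
            best_channel = c;
      }
}
red_average = channel_average[0]; // The value used by the snippets above and below.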
         

Now, the part of the background we want to remove can be composed of values either larger or smaller than the average value red_average. To determine which, we take advantage of the fact that the text is generally in the center area. Below, I've highlighted in red the top 1/8th of the image and the bottom 1/8th, as well as the 1/8th of the image right above and below the horizontal center.

We will simultaneously remove all values above the average and all values below the average on two copies of the image object, and determine which is the correct copy by analyzing the number of white pixels within the highlighted regions. The idea is that if we removed the background, there should be more white pixels in the top and bottom 1/8ths than in the center. Below is part of the algorithm for the red channel, along with the two image objects:

      // Two copies of the captcha object. [0] will have pixels below the average removed, [1] pixels above the average.
      BMP captcha_copy[2] = { BMP(captcha), BMP(captcha) };
     
      // Iterate through pixels
      for (y = 0, x = 0; y < height; y++) {
            while (x < width) {
                  // Remove pixels below the average value.
                  if (captcha(x,y)->Red <= red_average) {
                        captcha_copy[0](x,y)->Red = 255; captcha_copy[0](x,y)->Green = 255; captcha_copy[0](x,y)->Blue = 255;
                  }
                  // Remove pixels above the average value.
                  else {
                        captcha_copy[1](x,y)->Red = 255; captcha_copy[1](x,y)->Green = 255; captcha_copy[1](x,y)->Blue = 255;
                  }
                  x = x + 1;
            }
            x = 0;
      }


[The two captcha copies, with pixels below and above the average removed respectively.]
And now the algorithm to compute the white pixel ratio; we want the copy with the higher ratio.

int white_pixel_counter[2][2] = {{0, 0}, {0, 0}}; // [i][j]: i identifies which captcha copy, j identifies top/bottom (0) or center (1).

for (int i = 0; i < 2; i++) {
      for (y = 0, x = 0; y < height/8; y++) {
            while (x < width) {
                  // Top 1/8.
                  if ( captcha_copy[i](x, y)->Red == 255
                        && captcha_copy[i](x, y)->Green == 255
                        && captcha_copy[i](x, y)->Blue == 255) {
                        white_pixel_counter[i][0]++;
                  }
                  // Center region: 1/8 above and 1/8 below the horizontal center (1/4 of the height in total).
                  if ( captcha_copy[i](x, y + height/2)->Red == 255
                        && captcha_copy[i](x, y + height/2)->Green == 255
                        && captcha_copy[i](x, y + height/2)->Blue == 255) {
                        white_pixel_counter[i][1]++;
                  }
                  if ( captcha_copy[i](x, height/2 - y)->Red == 255
                        && captcha_copy[i](x, height/2 - y)->Green == 255
                        && captcha_copy[i](x, height/2 - y)->Blue == 255) {
                        white_pixel_counter[i][1]++;
                  }
                  // Bottom 1/8.
                  if ( captcha_copy[i](x, height - 1 - y)->Red == 255
                        && captcha_copy[i](x, height - 1 - y)->Green == 255
                        && captcha_copy[i](x, height - 1 - y)->Blue == 255) {
                        white_pixel_counter[i][0]++;
                  }
                  x = x + 1;
            }
            x = 0;
      }
}

double white_pixel_ratio[2];
white_pixel_ratio[0] = double(white_pixel_counter[0][0])/double(white_pixel_counter[0][1]);
white_pixel_ratio[1] = double(white_pixel_counter[1][0])/double(white_pixel_counter[1][1]);
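
The copy with the higher ratio is the one we keep; the rest of the code refers to it as captchaB. A minimal sketch of that selection, relying on the same BMP copy construction used above:

// Keep the copy whose (top + bottom) to center white pixel ratio is higher.
// This is the object referred to as captchaB in the snippets below.
BMP captchaB = (white_pixel_ratio[0] > white_pixel_ratio[1]) ? captcha_copy[0] : captcha_copy[1];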


In this sample, the ratios are as follows. They can be confirmed by visually examining the captcha objects above.
Captcha copy [0] (pixels below the average removed): 1.10443
Captcha copy [1] (pixels above the average removed): 0.914063


From now on we will work with the captcha image object that has the correct part of the background noise removed. Rerun the previous algorithms to recompute the average pixel value for each channel and the deviation from that average. This time, however, also compute whether more pixels fall above or below the average value for the particular channel. (Note: make sure you don't accidentally include pixels which you have previously removed in these new computations.) You will notice that the distribution of pixel values has one range that occurs very frequently. That range is the background, so if we remove those pixels we will be able to clear more of it. Let me demonstrate with a histogram.


[On the x-axis we have 8 bits representing the red channel (0-255). The average value for a pixel in the red channel is 140. The bulky area below this average value is the background.]
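
As a sketch of that recomputation, here is one way the red average could be recomputed over captchaB while skipping the pixels that have already been set to white; the same idea applies to the deviation and to the other channels:

// Recompute the red average over captchaB, ignoring pixels that were already removed (pure white).
long long red_sum = 0;
int pixel_count = 0;

for (y = 0; y < height; y++) {
      for (x = 0; x < width; x++) {
            if (!(captchaB(x,y)->Red == 255 && captchaB(x,y)->Green == 255 && captchaB(x,y)->Blue == 255)) {
                  red_sum += captchaB(x,y)->Red;
                  pixel_count++;
            }
      }
}

if (pixel_count > 0) {
      red_average = red_sum / pixel_count;
}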

Here’s how you would compute the number of pixels above or below the average.

int less_or_greater_than_avg[2] = {0, 0}; // Keeps track of how many pixels are [0] below the average, and [1] above the average.

for (y = 0, x = 0; y < height; y++) {
      while (x < width) {
            if (!(captchaB(x,y)->Red == 255 && captchaB(x,y)->Green == 255 && captchaB(x,y)->Blue == 255)) {
                  if (captchaB(x,y)->Red <= red_average) {
                        less_or_greater_than_avg[0]++;
                  }
                  else {
                        less_or_greater_than_avg[1]++;
                  }
            }
            x = x + 1;
      }
      x = 0;
}


And now, removing the appropriate pixels. Shown here for the red channel only.

for (y = 0, x = 0; y < height; y++) {
      while (x < width) {
            if (!(captchaB(x,y)->Red == 255 && captchaB(x,y)->Green == 255 && captchaB(x,y)->Blue == 255)) {
                  // Here, most of the pixels are less than the average
                  if (less_or_greater_than_avg[0] > less_or_greater_than_avg[1]) {
                        if (captchaB(x,y)->Red <= red_average) {
                              captchaB(x,y)->Red = 255;
                              captchaB(x,y)->Green = 255;
                              captchaB(x,y)->Blue = 255;
                        }
                  }
                  // Here, most of the pixels are greater than the average
                  else {
                        if (captchaB(x,y)->Red > red_average) {
                              captchaB(x,y)->Red = 255;
                              captchaB(x,y)->Green = 255;
                              captchaB(x,y)->Blue = 255;
                        }
                  }
            }
            x = x + 1;
      }
      x = 0;
}

Here is what you will end up with now. It looks a lot better.

[The captcha after the additional background removal.]


[2] At this point we are left with an image whose background is mostly white, but there is still some arbitrary noise. There's a simple way to detect and remove it, based on the principle that each piece of noise is composed of only a few pixels relative to a text character. Essentially, we iterate through all the pixels in the captcha image object. If we find a non-white pixel, we add it to a stack. We then search the 4 directions around it for unvisited, similarly colored pixels, add those to the stack as well, and continue the search from them. If we end up adding enough total pixels to the stack, we know the originating pixel cannot be part of the background noise. Let me demonstrate with a diagram composed of black and white pixels.

                
[Two connected pixel ranges. The first has all pixels visited in 3 iterations, the second in 18 iterations. Depending on a threshold value, say 4 in this case, we can determine where the noise is and remove it.]

For simplicity’s sake let us create a struct to represent the coordinates within the captcha image.
      struct coord {
          int x, y;
      };


And let’s get a stack of those coordinates, and another int to keep track of the number of elements we add:
      stack<coord> stack1;
      int stack1_size = 0;

The pointer to the captcha object is,
      BMP * im2;

Iterate through the pixels and perform the algorithm below. Only two of the four directions are shown; you would need to check all four. Also make sure that once you process a pixel during an iteration you don't revisit it, by keeping track of what you've visited in a 2D array.
 
      for (int i = 0; i < width; i++) {
            for (int j = 0; j < height; j++) {

                  // Clear everything to a start state, empty stack
                  while (!stack1.empty()) {
                        stack1.pop();
                  }
                  stack1_size = 0;


                  // Set coordinates to current position
                  coord a, b;
                  a.x = i, a.y = j;

                  // Put the first non-white coordinate on the stack
                  if( ! (is_color_similar( ((*im2)(i, j)), &white_pix, 1 ) ) ){
                        stack1.push(a);
                  }

                  while(!stack1.empty()) {           
                        a = stack1.top();
                        stack1.pop();

                        // Look at the pixel below the current one, (x, y + 1)
                        b = a;
                        b.y += 1;
                        if (b.y >= 0 && b.y < height && b.x >= 0 && b.x < width
                              && is_color_similar( ((*im2)(a.x, a.y)), ((*im2)(b.x, b.y)), COLOR_SIMILARITY_THRESHOLD )
                              && ! is_color_similar( ((*im2)(b.x, b.y)), &white_pix, 1 ))
                        {
                              stack1.push(b);
                              stack1_size++;   
                        }

                        // Look at the pixel to the right of the current one, (x + 1, y)
                        b = a;
                        b.x += 1;
                        if (b.y >= 0 && b.y < height && b.x >= 0 && b.x < width
                              && is_color_similar( ((*im2)(a.x, a.y)), ((*im2)(b.x, b.y)), COLOR_SIMILARITY_THRESHOLD )
                              && ! is_color_similar( ((*im2)(b.x, b.y)), &white_pix, 1 ))
                        {
                              stack1.push(b);
                              stack1_size++;
                        }

                        // Also look at (x, y - 1), and  (x - 1, y). (Not shown here)
                       
                        // If there are sufficient elements, we can stop now.
                        if(stack1_size > STACK_MIN_THRESHOLD) {
                              while (!stack1.empty()) {
                                    a = stack1.top();
                                    stack1.pop();
                              }
                              break;
                        }
                  }
     
                  // In case there are not sufficient elements, we remove this pixel. It's part of the noise.
                  if (stack1_size < STACK_MIN_THRESHOLD) {
                        ((*im2)(i, j))->Red = 255, ((*im2)(i, j))->Green = 255, ((*im2)(i, j))->Blue = 255;
                  }
            }
      }


After some testing, I have determined the default threshold values as follows. But for best results, you can implement some additional logic to vary these values dynamically, shown below.
      int STACK_MIN_THRESHOLD = 12;
      int COLOR_SIMILARITY_THRESHOLD = 40;

You can dynamically calculate the color similarity threshold by looking at the distribution of the RGB values across all pixels. Sort these values and take the central 50 percent of the distribution by subtracting the value at the 25th percentile from the value at the 75th percentile. This value is referred to as the IQR (interquartile range). If we multiply this range by 2, we essentially have the range of all the values for a particular channel. The threshold in this case should be 40% of that total. The code to calculate this range is shown below. In the end, use the value that corresponds to the highest contrast channel.

      // These vectors will hold the values for each channel for a non white pixel. We'll use this to get a good idea of the range of values.
      vector<int> red_counter, green_counter, blue_counter;

      for (y = 0, x = 0; y < height; y++) {
            while (x < width) {
                  if (!(captchaB(x,y)->Red == 255 && captchaB(x,y)->Green == 255 && captchaB(x,y)->Blue == 255)) {
                        red_counter.push_back(captchaB(x,y)->Red);
                        green_counter.push_back(captchaB(x,y)->Green);
                        blue_counter.push_back(captchaB(x,y)->Blue);
                  }
                  x = x + 1;
            }
            x = 0;
      }

      sort (red_counter.begin(), red_counter.end());
      sort (green_counter.begin(), green_counter.end());
      sort (blue_counter.begin(), blue_counter.end());
     
      // Here we calculate what 40% of the range is for each channel, and use that. We use Q3-Q1 to eliminate outliers.
      int red_calc_range = abs(0.40*2.0*(red_counter.at(red_counter.size()*(3.0/4.0)) - red_counter.at(red_counter.size()*(1.0/4.0))));
      int green_calc_range = abs(0.40*2.0*(green_counter.at(green_counter.size()*(3.0/4.0)) - green_counter.at(green_counter.size()*(1.0/4.0))));
      int blue_calc_range = abs(0.40*2.0*(blue_counter.at(blue_counter.size()*(3.0/4.0)) - blue_counter.at(blue_counter.size()*(1.0/4.0))));
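
As the last sentence above says, use the range from the highest-contrast channel. A minimal sketch, assuming green_deviation and blue_deviation were computed on captchaB the same way as red_deviation:

      // Use the calculated range of the highest-contrast channel as the similarity threshold.
      // green_deviation and blue_deviation are assumed to be computed the same way as red_deviation.
      if (red_deviation >= green_deviation && red_deviation >= blue_deviation) {
            COLOR_SIMILARITY_THRESHOLD = red_calc_range;
      }
      else if (green_deviation >= blue_deviation) {
            COLOR_SIMILARITY_THRESHOLD = green_calc_range;
      }
      else {
            COLOR_SIMILARITY_THRESHOLD = blue_calc_range;
      }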

To dynamically compute the threshold for the noise stack size, we use a similar idea. We iterate through the captcha and find the size distribution of all the noise blobs and characters using the same noise removal algorithm shown above. We sort this distribution and find the average size within the 85 to 90 percent range, which is where the largest of the noise blobs generally falls. For added assurance, also find the average within the 95 to 100 percent range, which is the size of the characters. The threshold value should then be avg_85_to_90 + avg_95_to_100*0.06.
      // This vector holds the sizes of all the nonconnected noises or characters within the captcha.
      vector<int> stack_size_vector;

      // Iterate through the captcha, and add the sizes to the vector...

      sort(stack_size_vector.begin(), stack_size_vector.end());
     
      vector<int>::iterator at_95 = stack_size_vector.end() - stack_size_vector.size()*0.05;
      vector<int>::iterator at_85 = stack_size_vector.end() - stack_size_vector.size()*0.15, at_90 = stack_size_vector.end() - stack_size_vector.size()*0.10;
      double sum_85_to_90 = 0, sum_95_to_100 = 0;
      double avg_85_to_90 = 0.0, avg_95_to_100 = 0.0;
      while(at_85 != at_90 && at_85 != stack_size_vector.end()) {
            sum_85_to_90 += *at_85;
            at_85++;
      }
      while(at_95 != stack_size_vector.end()) {
            sum_95_to_100 += *at_95;
            at_95++;
      }
      avg_85_to_90 = sum_85_to_90 / ((double)stack_size_vector.size()*0.05);
      avg_95_to_100 = sum_95_to_100 / ((double)stack_size_vector.size()*0.05);

      // We want the average from 85% to 90% which will be the largest of the noise areas, and add a margin
      // that is 0.06 of the character areas which will be from 95% to 100%.
      return avg_85_to_90 +  avg_95_to_100*0.06;


The white pixel, white_pix, is an RGBApixel object with 255 for all channels:
      RGBApixel white_pix;
      white_pix.Alpha = 0, white_pix.Red = 255, white_pix.Green = 255, white_pix.Blue = 255;

And an auxiliary function which determines whether two pixels are of a similar color, written separately for simplicity:
      bool is_color_similar(RGBApixel * c1, RGBApixel * c2, int similarity_threshold)
      {    
            if( abs((int)(c1->Red - c2->Red )) < similarity_threshold &&
                  abs((int)(c1->Green - c2->Green )) < similarity_threshold &&
                  abs((int)(c1->Blue - c2->Blue )) < similarity_threshold )
            {
                  return true;
            }
            return false;
      }


You may notice that this algorithm seems computationally intensive, because we process one pixel at a time and repeat the work for the next pixel even though we may already have visited it. I wrote it this way so it's simple to understand. However, you can reduce the running time to O(n), where n is the number of pixels in the captcha, by keeping a master 2D array of visited pixels: process an entire 'blob' (a piece of noise or a text character) at a time, remove or keep all of its pixels, and mark them in the master visited array.
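
Here is a minimal sketch of that blob-at-a-time variant, reusing coord, im2, white_pix, is_color_similar, and the two thresholds from above; it is an illustration of the idea rather than the exact implementation:

      // Master visited array so each pixel is processed only once.
      vector<vector<bool>> visited(height, vector<bool>(width, false));

      for (int i = 0; i < width; i++) {
            for (int j = 0; j < height; j++) {
                  // Skip pixels already grouped into a blob, and skip the white background.
                  if (visited[j][i] || is_color_similar( ((*im2)(i, j)), &white_pix, 1 )) {
                        continue;
                  }

                  // Flood-fill the whole blob that contains (i, j).
                  vector<coord> blob;
                  stack<coord> to_visit;
                  coord start = { i, j };
                  to_visit.push(start);
                  visited[j][i] = true;

                  while (!to_visit.empty()) {
                        coord a = to_visit.top();
                        to_visit.pop();
                        blob.push_back(a);

                        const int dx[4] = { 0, 0, 1, -1 };
                        const int dy[4] = { 1, -1, 0, 0 };
                        for (int d = 0; d < 4; d++) {
                              coord b = { a.x + dx[d], a.y + dy[d] };
                              if (b.x >= 0 && b.x < width && b.y >= 0 && b.y < height
                                    && !visited[b.y][b.x]
                                    && !is_color_similar( ((*im2)(b.x, b.y)), &white_pix, 1 )
                                    && is_color_similar( ((*im2)(a.x, a.y)), ((*im2)(b.x, b.y)), COLOR_SIMILARITY_THRESHOLD )) {
                                    visited[b.y][b.x] = true;
                                    to_visit.push(b);
                              }
                        }
                  }

                  // Small blobs are noise: remove every pixel of the blob at once.
                  if ((int)blob.size() < STACK_MIN_THRESHOLD) {
                        for (int p = 0; p < (int)blob.size(); p++) {
                              ((*im2)(blob[p].x, blob[p].y))->Red = 255;
                              ((*im2)(blob[p].x, blob[p].y))->Green = 255;
                              ((*im2)(blob[p].x, blob[p].y))->Blue = 255;
                        }
                  }
            }
      }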

Here is the result.

[The captcha after noise removal.]


[3] Now we have an image with just some text, which is much easier to read. The final step is to segment this image into individual characters. The idea is to iterate along the horizontal center pixels of the captcha and, for each pixel, draw a vertical line, along with all the possible lines from -45 degrees to +45 degrees of that vertical that also pass through the current pixel. We then count the number of non-white pixels each of these lines intersects. The lower the count, the more likely we are in the whitespace between characters. In the end we find the four widest of these low-count regions, and we end up properly segmenting the captcha.

 
[Counting the number of nonwhite pixels that intersect lines within the grey area can give a good idea of where whitespace is. In this captcha, the left x coordinate will have a relatively low intersection for a range of angles whereas the right x coordinate will always have a nonzero, and relatively high number of intersections.]

Let's create a 2D, 1-bit map array first. Essentially this is a black and white representation of the captcha in its current state.
      // Black and white bit map. Generally, bool bit_map[50][175];
      bool** bit_map;
      bit_map = new bool*[height];
      for (int i = 0; i < height; i++) {
            bit_map[i] = new bool[width];
      }
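
The bit map itself is not filled in above; a minimal sketch of populating it, assuming im2 still points to the cleaned-up captcha:

      // Fill the bit map: true (1) for non-white (text) pixels, false (0) for the background.
      for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                  bit_map[j][i] = !( (*im2)(i, j)->Red == 255
                        && (*im2)(i, j)->Green == 255
                        && (*im2)(i, j)->Blue == 255 );
            }
      }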

Since we want from 45 degrees to 135 degrees,
      int angle_iteration = 91;

Here, an array to keep track of our lines. Initialize all values to 0:
      // Parameterized array which will hold the number of intersections on lines along the width of the captcha.
      // Generally, int segment_arr[91][175];
      int** segment_arr;
      segment_arr = new int*[angle_iteration];
      for (int i = 0; i < angle_iteration; i++) {
            segment_arr[i] = new int[width]();  // The () value-initializes each row to 0.
      }

     

Here, the computation:
      int adjusted_x, adjusted_y;
      const double PI = 3.1415926535897932384346;
      // The algorithm to calculate the number of intersections. Data is added to segment_arr[][].
      for (int i = 0; i < width; i++) {
            for (int k = 0; k < angle_iteration; k++) {
                  for (int j = 0; j < height; j++) {
                        adjusted_y = (height - j) - height/2;
                        adjusted_x = i + (double)adjusted_y / tan( PI/180.0 * double(45.0 + k));
                        if (adjusted_x >= 0 && adjusted_x < width) {
                              if (bit_map[j][adjusted_x] == 1) {
                                    segment_arr[k][i]++;
                              }
                        }
                  }
            }
      }

All the information necessary to segment the captcha is stored in this computed array, segment_arr[k][i]. Moving along the first index (k) means we're looking at a differently angled line, and moving along the second index (i) means we're moving along the width of the captcha. The whitespace between any two characters will most likely be a few pixels wide, and it will allow several differently angled lines to be drawn through it. This means that if we can find a large enough continuous, similarly valued area within this 2D array, then we're in whitespace along the center of the captcha. To do this, we just iterate through the array, add the current index to a stack, process the indices around it, and if they're similar we add them to the stack as well. In the end we will have multiple stacks, and the largest ones most likely represent the correct whitespace. I will represent this visually; take a look at how the whitespace in the captcha image translates into the parameterized array.


[Green represents a low pixel intersection, and yellow represents high intersection. Note that 4 of the large individual green areas match correctly with the whitespace between the characters.]

There are many ways to extract the correct x coordinates from this 2D array, segment_arr. The basic logic is simple: look around one index until we find similarly valued indices. We want to find the largest of these similarly valued areas, and that's where we should segment. The following function demonstrates how to gather a set of similarly valued coordinates starting from a coordinate current_x_y.

      vector<coord> segment_aux(int ** segment_arr, int angle_iteration, int width, coord current_x_y) {

            // Keep track of what we’ve processed. All values start at 0 (not visited).
            int** visited = new int*[angle_iteration];
            for (int i = 0; i < angle_iteration; i++) {
                  visited[i] = new int[width]();  // The () value-initializes each row to 0.
            }

            vector<coord> all_coords;

            stack<coord> temp;

            temp.push(current_x_y);
            all_coords.push_back(current_x_y);

            while (!temp.empty()) {
                  coord b;
                  coord a = temp.top();
                  temp.pop();

                  if (a.x - 1 >= 0) {
                        b.x = a.x - 1;
                        b.y = a.y;
                        if (segment_arr[current_x_y.y][current_x_y.x] >= segment_arr[b.y][b.x] && visited[b.y][b.x] == 0) {
                              temp.push(b);
                              all_coords.push_back(b);
                              visited[b.y][b.x] = 1;
                        }
                  }
                  if (a.x - 1 >= 0 && a.y - 1 >= 0) {
                        b.x = a.x - 1;
                        b.y = a.y - 1;
                        if (segment_arr[current_x_y.y][current_x_y.x] >= segment_arr[b.y][b.x] && visited[b.y][b.x] == 0) {
                              temp.push(b);
                              all_coords.push_back(b);
                              visited[b.y][b.x] = 1;
                        }
                  }
                  if (a.x - 1 >= 0 && a.y + 1 <  angle_iteration) {
                        b.x = a.x - 1;
                        b.y = a.y + 1;
                        if (segment_arr[current_x_y.y][current_x_y.x] >= segment_arr[b.y][b.x] && visited[b.y][b.x] == 0) {
                              temp.push(b);
                              all_coords.push_back(b);
                              visited[b.y][b.x] = 1;
                        }
                  }
                  // Five more neighbors to check around the current coordinate, not shown here. Same pattern as above:
                  // if (a.y - 1 >= 0) {
                  // if (a.x + 1 < width) {
                  // if (a.x + 1 < width && a.y - 1 >= 0) {
                  // if (a.x + 1 < width && a.y + 1 <  angle_iteration) {
                  // if (a.y + 1 <  angle_iteration) {
            }

            for (int i = 0; i < angle_iteration; i++) {
                  delete[] visited[i];
            }
            delete[] visited;

            return all_coords;

      }



In my implementation, I have some additional layers of logic that I will explain here. There are two things we should do to accurately calculate the proper segmentation locations. The first is to look at a smaller subsection of segment_arr at a time, because if you look at the captchas, the letters are slanted in roughly the same direction, and it would be rare to have one segment at -45 degrees and the one right next to it at +45 degrees. The second is to use a weighting system so that, out of the many choices there may be, we can select the four x coordinates with the highest weight.

So let’s look at only half of the angle range at a time.
      int new_angle_iteration = angle_iteration/2;

And we will create the smaller sub array, just a smaller version of segment_arr.
      // A sub section of the computed array. Initialize all values to 0.
      int** segment_arr_sub;
      segment_arr_sub = new int*[new_angle_iteration];
      for (int i = 0; i < new_angle_iteration; i++) {
            segment_arr_sub[i] = new int[width];
      }


We need to keep track of the indices we've visited so we can make sure there's no overlapping or revisiting of old data:
      // Keep track of visited indices. Again, initialize all values to 0.
      int** visited_sub;
      visited_sub = new int*[new_angle_iteration];
      for (int i = 0; i < new_angle_iteration; i++) {
            visited_sub[i] = new int[width]();  // The () value-initializes each row to 0.
      }

We’re going to need some data structures to help with processing.
      bool bool_overlap_sub = false; // Boolean that will help control iterations and addition of elements to segment_data_stack_vector_sub

      coord temp_coord_2_sub; // Temporary coord used in processing.
      vector<coord> stack1_sub, stack_2_temp_sub; // Two vectors that will hold a similar region within segment_arr.
      vector<vector<coord>> segment_data_stack_vector_sub; // This vector will hold all of the low count whitespace areas, basically stack1_sub.


Here’s how you would use values from the original large array to fill in the smaller sub arrays:
      // K represents the offset between two sub arrays. The lower the number, the more time it will require to process.
      for (int k = 0; k + 15 < angle_iteration - new_angle_iteration; k+=15) {
           
            // Fill the sub array with partial data from the original, segment_arr.
            for (int i = 0; i < new_angle_iteration; i++) {
                  for (int j = 0; j < width; j++) {
                        segment_arr_sub[i][j] = segment_arr[i + k][j];
                  }
            }

Within the loop, we first need to process the two whitespace regions at the left and right edges of the captcha image object and mark them as visited. Here's the left side; you would do the same for the right side:
            // Let's determine where and how large the whitespaces are at the left and right sides.
            temp_coord_2_sub.y = 0, temp_coord_2_sub.x = 0;
            stack1_sub = segment_aux(segment_arr_sub, new_angle_iteration, width, temp_coord_2_sub);
            while(! stack1_sub.empty()) {
                  temp_coord_2_sub = stack1_sub.back();
                  visited_sub[temp_coord_2_sub.y][temp_coord_2_sub.x] = 1;
                  stack1_sub.pop_back();
            }

            // Now the right side. (Not shown). Same as above, with: temp_coord_2_sub.y = new_angle_iteration - 1, temp_coord_2_sub.x = width - 1;

            // Now process the remainder of the whitespaces. Segmentation will occur within these.
            for (int i2 = 0; i2 < new_angle_iteration; i2++) {
                  for (int j2 = 0; j2 < width; j2++) {
                        // Does the number of intersections fall within the threshold
                        if (segment_arr_sub[i2][j2] <= min_intersection_threshold) {

                              // And it has not been previously processed
                              if (visited_sub[i2][j2] == 0) {

                                    bool_overlap_sub = false;

                                    temp_coord_2_sub.y = i2;
                                    temp_coord_2_sub.x = j2;
                                    stack1_sub = segment_aux(segment_arr_sub, new_angle_iteration, width, temp_coord_2_sub);
                             
                                    stack_2_temp_sub = stack1_sub;

                                    while(! stack_2_temp_sub.empty() ) {
                                          temp_coord_2_sub = stack_2_temp_sub.back();
                                          if (visited_sub[temp_coord_2_sub.y][temp_coord_2_sub.x] == 1) {
                                                bool_overlap_sub = true;
                                          }
                                          stack_2_temp_sub.pop_back();                               
                                    }

                                    if (! bool_overlap_sub) {
                                          segment_data_stack_vector_sub.push_back(stack1_sub);
                                          while(! stack1_sub.empty()) {
                                                temp_coord_2_sub = stack1_sub.back();
                                                visited_sub[temp_coord_2_sub.y][temp_coord_2_sub.x] = 1;
                                                stack1_sub.pop_back();
                                          }
                                    }
                              }

                        }
                       
                  }
            }

Once we've added all of the low count regions to the vector segment_data_stack_vector_sub, we should process the elements by going through them and applying a weight to the different possible sets. I have the algorithm below in its entirety, commented so it is easy to follow. The basic idea is to look at all the ordered permutations of whitespace regions and apply a weight that grows with the amount of whitespace and with the distance between the segments, with a penalty for deviation from the average in both. You can see the actual formula below. The permutations are fine in this case since we won't be looking at too many at once.

            if(segment_data_stack_vector_sub.size() >= 4) {
                  // Here we order the elements in the vector so that they are sequential from the left to the right of the captcha.
                  sort(segment_data_stack_vector_sub.begin(), segment_data_stack_vector_sub.end(), segment_sort_x_midpoints);
                 
                  // Calculate the possible permutations of the segments here.
                  // (declared above) vector<int> permutation_calc;
                  for (int p1 = 0; p1 < segment_data_stack_vector_sub.size(); p1++) {
                        permutation_calc.push_back(p1);                      
                  }

                  vector<vector<int>> computed;
                 
                  do {
                        // We check if the current permutation is in the correct sequential order
                        bool out_of_order = false;
                        for (int p1 = 0; p1 < 3; p1++) {
                              // The order is not sequential, break out of this iteration without further calculations
                              if ( permutation_calc.at(p1 + 1) <=  permutation_calc.at(p1) ) {
                                    out_of_order = true;
                                    break;                                   
                              }
                        }

                        // Add this permutation to a vector, so we can know we've computed the weight for this permutation
                        computed.push_back(permutation_calc);

                        // Check if we've already computed the weight for this permutation
                        if ( computed.size() >= 2 ) {
                              vector<int> a, b;
                              bool different = false;
                              a = *(computed.end()-1), b = *(computed.end()-2);
                              for (int i = 0; i < 4; i++) {
                                    if (a.at(i) != b.at(i)) {
                                          different = true;
                                    }
                              }
                             
                              if (different == false)
                                    out_of_order = true;
                        }
                       

                        if(out_of_order == false) {
                 
                              // The order is sequential, continue processing

                              // First get the sizes of the whitespace.
                              for (int p1 = 0; p1 < 4; p1++) {
                                    // (declared above) int whitespace = 0;
                                    // (declared above) vector<coord> perm_temp;

                                    // Get the first set of coordinates that represents the first possible segmentation area.
                                    perm_temp = segment_data_stack_vector_sub.at(permutation_calc.at(p1));
                                    // Add its size to the whitespace counter.
                                    whitespace_total +=  perm_temp.size();
                              }

                              // Find how much the whitespaces deviate from the average, and add a penalty for excessive deviation.
                              // (declared above) int whitespace_avg;
                              whitespace_avg = whitespace_total/4;
                              for (int p1 = 0; p1 < 4; p1++) {
                                    // (declared above) double whitespace_deviation = 0;
                                    // (declared above) vector<coord> perm_temp;

                                    // Get the first set of coordinates that represents the first possible segmentation area.
                                    perm_temp = segment_data_stack_vector_sub.at(permutation_calc.at(p1));
                                    // Find how much it differs from the average value, and add that to whitespace_deviation.
                                    whitespace_deviation += (double)abs((double)perm_temp.size() - (double)whitespace_avg);
                              }
                              // (declared above) double whitespace_deviation_avg;
                              whitespace_deviation_avg = whitespace_deviation/4.0;

                              // Calculate the distances between the midpoints of the whitespaces and compare with the average distance.
                              for (int p1 = 0; p1 < 4; p1++) {

                                    // This vector will keep track of the x location of the whitespaces.
                                    // (declared above) vector<int> whitespace_position_counter;

                                    // (declared above) double whitespace_deviation = 0;
                                    // (declared above) vector<coord> perm_temp;

                                    // Get the first set of coordinates that represents the first possible segmentation area.
                                    perm_temp = segment_data_stack_vector_sub.at(permutation_calc.at(p1));
                                    // Get its midpoint
                                    int min_x = perm_temp.at(0).x, max_x = perm_temp.at(0).x, mid_x;
                                    for (int p2 = 0; p2 < perm_temp.size(); p2++) {
                                          if (perm_temp.at(p2).x < min_x)
                                                min_x = perm_temp.at(p2).x;
                                          if (perm_temp.at(p2).x > max_x)
                                                max_x = perm_temp.at(p2).x;
                                    }
                                    mid_x = min_x + (max_x - min_x)/2;

                                    whitespace_position_counter.insert(whitespace_position_counter.begin() + p1 + 1, mid_x);

                              }

                              // Compute avg distance between the midpoints
                              for (int p1 = 0; p1 < 5; p1++) {
                                    whitespace_position_values.push_back(whitespace_position_counter.at(p1 + 1) - whitespace_position_counter.at(p1));
                                    whitespace_position_sum += (whitespace_position_counter.at(p1 + 1) - whitespace_position_counter.at(p1));
                              }
                              // Now the average distance between the midpoints.
                              whitespace_position_avg = whitespace_position_sum/5.0;
                             
                              // Now the sum of the deviations of the midpoints and their average values.
                              for (int p1 = 0; p1 < 5; p1++) {
                                    whitespace_position_deviation += abs(whitespace_position_values.at(p1) - whitespace_position_avg);
                              }
                             
                              whitespace_position_deviation_avg = whitespace_position_deviation/5.0;

                              weight = whitespace_total - ((whitespace_deviation_avg/whitespace_avg)*whitespace_total) - ((whitespace_position_deviation_avg/whitespace_position_avg)*whitespace_total) + whitespace_position_sum - ((whitespace_position_deviation_avg/whitespace_position_avg)*whitespace_position_sum);

                              // Here again we clear variables to a start state for the next iteration.
                              // This one is not cleared completely because the first and last elements represent the whitespace at the left and right
                              // edges. They remain the same although the middle regions can change.
                              whitespace_position_counter.erase(whitespace_position_counter.begin()+1, whitespace_position_counter.end()-1);
                              whitespace_position_values.clear();
                              whitespace_total = 0;
                              whitespace_avg = 0;
                              whitespace_deviation = 0;
                              whitespace_deviation_avg = 0;

                              whitespace_position_sum = 0;
                              whitespace_position_avg = 0;
                              whitespace_position_deviation = 0;
                              whitespace_position_deviation_avg = 0;

                              // Here we will do a check to make sure that the current segmentation does not leave any sections that are empty.
                              // If any segment is empty, it cannot be correct so we will not add it to the vector of weights.
                              if (test_all_segments_non_null(k, segment_arr_sub, segment_data_stack_vector_sub, width, height, bit_map, angle_iteration) ) {
                                    // Add the weights to a vector. We will use the highest value to do the final segmentation.
                                    all_weights.push_back(weight);
                              }
                             
                             
                        }

                  }
                  while( next_permutation(permutation_calc.begin(), permutation_calc.end()) );

                  // Add the maximum weight corresponding to the current offset for the sub array (K value) to a vector.
                  // In the end, the highest value of all K's will be used for final segmentation.
                  if(!all_weights.empty()) {
                        sort(all_weights.begin(), all_weights.end()); // Sorted for the current K offset, with lowest value first, highest at the end.
                        of_wt_temp.offset = k, of_wt_temp.weight = all_weights.back(); // Choose the current highest value, and add it here.
                        subarray_weights.push_back(of_wt_temp);
                  }
            }

      }


Because we need to keep track of all the weights, we will use the following structures.
      // We need to keep track of which subsection, and its weight.
      struct offset_and_weight {
            int offset;
            double weight;
      };
      offset_and_weight of_wt_temp; // Temporary.
      vector<offset_and_weight> subarray_weights; //In the end we will use the highest weight for best segmentation.
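
Once every K offset has been processed, the entry in subarray_weights with the highest weight gives the offset to use. A minimal sketch of that selection (assuming at least one offset produced a valid segmentation):

      // Pick the K offset whose best permutation produced the highest weight overall.
      offset_and_weight best = subarray_weights.front();
      for (int i = 1; i < (int)subarray_weights.size(); i++) {
            if (subarray_weights.at(i).weight > best.weight) {
                  best = subarray_weights.at(i);
            }
      }
      // best.offset is the K value to rerun the weighting pass with for the final segmentation.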

To compute min_intersection_threshold, just look at the distribution of intersection counts along the width of the captcha and take the average of the lowest 10 percent of the values. The result should generally be 0 to 3. Sometimes we come across captchas whose letters are more attached than expected. To take care of this, keep a counter of how many possible permutations you have computed, and if the value is too low (say below 4), increment the threshold by a small amount and try again. This can be done with a simple do-while loop.
      // This holds and orders all of the intersections for all widths and angles for statistical analysis.
      // vector<int> intersections_ordered;

      vector<int>::iterator at_10 = intersections_ordered.begin() + double(intersections_ordered.size())*0.10;
      int sum_00_to_10 = 0;
      while(at_10 != intersections_ordered.begin()) {
            sum_00_to_10 += *at_10;
            at_10--;
      }
     
      int min_intersection_threshold = sum_00_to_10/(intersections_ordered.size()*0.10);
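
The retry loop could look like the sketch below, where count_candidate_permutations() is a hypothetical helper that wraps the segmentation/weighting pass above and reports how many valid permutations were found at the current threshold:

      // Sketch of the retry loop. count_candidate_permutations() is a hypothetical helper that
      // runs the segmentation/weighting pass with the given threshold and reports how many
      // valid permutations it found.
      int permutations_found = 0;
      do {
            permutations_found = count_candidate_permutations(min_intersection_threshold);
            if (permutations_found < 4) {
                  // The letters are more attached than expected; loosen the threshold and try again.
                  min_intersection_threshold += 1;
            }
      } while (permutations_found < 4);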


And below I have the comparator algorithm used to sort the regions sequentially:

// We find the midpoint of the x range of these values. Midpoints will then be ordered sequentially.
bool segment_sort_x_midpoints(vector<coord> i, vector<coord> j)
{

      int min_i_x = (i.at(0)).x, max_i_x = (i.at(0)).x;
      int min_j_x = (j.at(0)).x, max_j_x = (j.at(0)).x;

      for(int k = 0; k < i.size(); k++) {
            if( (i.at(k)).x < min_i_x )
                  min_i_x = (i.at(k)).x;
            if( (i.at(k)).x > max_i_x )
                  max_i_x = (i.at(k)).x;
      }

      for(int k = 0; k < j.size(); k++) {
            if( (j.at(k)).x < min_j_x )
                  min_j_x = (j.at(k)).x;
            if( (j.at(k)).x > max_j_x )
                  max_j_x = (j.at(k)).x;
      }
     
      return (min_i_x + (max_i_x - min_i_x)/2) < (min_j_x + (max_j_x - min_j_x)/2);
}

In the end you will have the weights for all the ways in which the captcha can be segmented, along with the best offset K. Rerun the weighting algorithm with that offset, and once you reach the permutation with the highest weight, segment the captcha there. Here is the result.


[The (correctly) computed segmentation x coordinates and angles are: 73 < 75 95 < 102 114 < 107 131 < 119]

The results for the other captchas are shown below. These samples were selected at random, and the success rate is fairly high: 83 percent.
[Segmentation results for the randomly selected sample captchas.]


Conclusion: As of September 2011, this captcha is still in use. However, all variations of this particular type of captcha are vulnerable to processing by this method. My recommendations would be to use a varying number of characters, minimize the amount of whitespace between characters, and reduce the contrast difference between the text, the background, and the noise. These measures would make automatic processing more difficult, though it cannot be made impossible.



Original ideas by Prabhu, published 2011 at p.rabhu.com.