Datics AI - Why have CAPTCHAS gotten so difficult?


wp_datics

24 min read

The History of CAPTCHAs

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart; it is a security measure used to distinguish humans from spambots. Back in the early 2000s, simple images of text were sufficient to block any spambots coming through. Over the following two decades, however, the text has become increasingly obscured and warped in order to stay ahead of ever-improving optical character recognition programs, an arms race that accelerated after Google bought the program from Carnegie Mellon researchers.

 

CAPTCHA is an elegant tool for training artificial intelligence, and its inventors acknowledged early on that any given test would only be temporary. At the rate scammers, researchers, and ordinary people were solving countless puzzles, it was only a matter of time before the machines outpaced us. In 2014, Google pitted one of its machine learning algorithms against humans at solving the most distorted text CAPTCHAs. Humans got 33 percent right, while the computer solved them with an astounding 99.8 percent accuracy.

How They Function

After this, Google moved toward NoCaptcha ReCaptcha, which observes user behavior and data and, based on that, lets some humans pass simply by clicking the “I’m not a robot” button. The machines, however, have since been catching up once again.

 

Jason Polakis, a computer science professor at the University of Illinois at Chicago, takes personal credit for the increase in CAPTCHA difficulty. In a 2016 paper, he showed that off-the-shelf image recognition tools, including Google’s own reverse image search, could solve Google’s image CAPTCHAs with up to 70 percent accuracy. Other researchers have broken Google’s audio CAPTCHA challenges using Google’s own audio recognition programs.

 

Machine learning is now about as good as humans at basic image, text, and voice recognition tasks, according to Polakis. In fact, algorithms may actually be better at them: “We’re at a point where making it harder for software ends up making it too hard for many people. We need some alternative, but there’s not a concrete plan yet.”

 

The literature on CAPTCHAs is littered with unusual, often misguided attempts to find something other than text or image recognition that humans are naturally good at but machines struggle with. In one instance, researchers asked users to classify images of people by ethnicity, expression, and gender. Others have tried trivia CAPTCHAs, and CAPTCHAs based on nursery rhymes popular in the area where the targeted user grew up; these “cultural CAPTCHAs” are aimed not merely at bots but at humans. Still others have tried to defeat image recognition by asking users to identify, say, horses, but drawing the horses as cartoons wearing sunglasses. In one interesting variation from 2010, researchers used ancient petroglyphs in CAPTCHAs, since computers were not very good at identifying sketches of animals scribbled onto cave walls.

Evolution of CAPTCHAs

More recently, there have been efforts to develop game-like CAPTCHAs. These tests require users to rotate objects or move puzzle pieces into place, with instructions that are never given explicitly but instead implied by the game interface. The hope is that humans will infer the game’s logic from the interface, while computers, lacking clear instructions, will be left confused and unable to proceed. Other researchers have tried to exploit the fact that humans have tangible bodies, using device cameras or augmented reality to submit interactive proof of humanity.

 

The trouble with these tests is not necessarily that bots are too smart or well-equipped to beat them, but that humans are simply not very good at solving them. It is not that humans are inherently incapable; rather, wide diversity in culture, experience, and language comes into play, making it difficult to standardize any approach. Once you remove every element that might cause some humans to fail, you are left with tasks like image processing, which is precisely what an artificial intelligence model is going to be good at solving.

 

According to Polakis, “The tests are limited by human capabilities.” He goes on: “It’s not only our physical capabilities, you need something that [can] cross-cultural, cross-language. You need some type of challenge that works with someone from Greece, someone from Chicago, someone from South Africa, Iran, and Australia at the same time. And it has to be independent of cultural intricacies and differences. You need something that’s easy for an average human, it shouldn’t be bound to a specific subgroup of people, and it should be hard for computers at the same time. That’s very limiting in what you can actually do. And it has to be something that a human can do fast, and isn’t too annoying.”

 

One question must be asked: what universal human quality can be demonstrated to a machine, yet no machine can mimic? In other words, what does it mean to be human?

 

Perhaps our humanity is not measured by how we perform a task, but by how we move through the world, or in this case, the internet. Shuman Ghosemajumder, who previously worked on combating click fraud at Google before becoming chief technology officer of Shape Security, a bot-detection company, believes that whatever sort of CAPTCHA test we devise, whether game CAPTCHAs or video CAPTCHAs, will eventually be broken. Instead of discrete tests, he calls for “continuous authentication”: observing user behavior and analyzing it for signs of automation. Ghosemajumder explains, “A real human being doesn’t have very good control over their own motor functions, and so they can’t move the mouse the same way more than once over multiple interactions, even if they try really hard.”
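To make the idea concrete, here is a minimal toy sketch of the motor-function signal Ghosemajumder describes: if the same cursor path is reproduced exactly across repeated interactions, the session is probably scripted. The function name, the path-length metric, and the tolerance are all illustrative assumptions, not Shape Security’s actual method.

```python
import statistics

def looks_automated(traces, tol=1e-6):
    """Flag sessions whose repeated mouse paths are suspiciously identical.

    `traces` is a list of cursor paths, each a list of (x, y) points
    recorded across separate interactions. Humans cannot reproduce the
    same movement exactly, so near-zero variance across repeats is
    treated here as a bot signal. Toy heuristic for illustration only.
    """
    def path_length(path):
        # Sum of straight-line distances between consecutive points.
        return sum(
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(path, path[1:])
        )

    lengths = [path_length(t) for t in traces]
    # Identical path lengths across many interactions => scripted movement.
    return statistics.pvariance(lengths) < tol

# A bot replays the exact same path; a human's path jitters each time.
bot = [[(0, 0), (10, 10), (20, 5)]] * 3
human = [[(0, 0), (11, 9), (19, 6)],
         [(0, 1), (9, 12), (21, 4)],
         [(1, 0), (10, 10), (18, 7)]]
```

A production system would of course look at far richer signals (timing, acceleration, pressure), but the principle is the same: perfect repeatability is inhuman.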

 

The CAPTCHA team at Google thinks along the same lines. reCAPTCHA v3, released by Google, uses “adaptive risk analysis” to score traffic according to how suspicious it seems; website owners can then present sketchy users with a challenge, like two-factor authentication or a password request. Google would not elaborate on what goes into the score, other than that it observes what “good traffic” on a page looks like and uses that to detect “bad traffic,” according to Cy Khormaee, a product manager on the CAPTCHA team. Security researchers say the signals probably include traffic patterns, cookies, and browser attributes, among other factors. This new model of bot detection has a drawback: it makes navigating the web while minimizing surveillance an annoying experience, since things like anti-tracking extensions and VPNs can get a user flagged as suspicious.
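In practice, reCAPTCHA v3 hands the site a score between 0.0 (likely a bot) and 1.0 (likely a human), and the site owner decides what to do with it. The sketch below shows how that decision logic might look; the function name, the 0.5 default threshold, and the tiered fallback actions are illustrative assumptions on the site-owner side, not Google’s prescriptions.

```python
def handle_request(score: float, threshold: float = 0.5) -> str:
    """Map a reCAPTCHA v3 risk score to a site-side action.

    Scores run from 0.0 (likely a bot) to 1.0 (likely a human).
    The threshold and the tiered responses below are hypothetical
    choices a site owner might make, as described in the article:
    suspicious users get an extra challenge instead of a hard block.
    """
    if score >= threshold:
        return "allow"       # looks like good traffic
    elif score >= 0.3:
        return "challenge"   # e.g. two-factor auth or password request
    else:
        return "block"       # almost certainly automated
```

The key design point is that the test disappears for most users: only traffic that scores poorly ever sees a challenge.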

What The Future Holds

According to Aaron Malenfant, engineering lead on the CAPTCHA team at Google, moving away from Turing tests is meant to sidestep a competition that humans keep losing. “As people put more and more investment into machine learning, those sorts of challenges will have to get harder and harder for humans, and that’s particularly why we launched CAPTCHA V3, to get ahead of that curve,” he says. Malenfant expects that in five to ten years, CAPTCHA challenges will most likely not be viable at all; instead, much of the web will have a constant, hidden Turing test running in the background.

 

In his book The Most Human Human, Brian Christian recounts entering a Turing test competition and finding it surprisingly difficult to prove himself human in conversation. Bot makers, by contrast, found it easy to pass, not by holding intelligent conversation but by dodging questions with typos and jokes. One bot even won a Turing competition in 2014 after claiming to be a 13-year-old Ukrainian boy with a poor grasp of English. As Ghosemajumder puts it, “I think folks are realizing that there is an application for simulating the average human user… or dumb humans.”

 

In 2017, Amazon received a patent for a scheme involving logic puzzles and optical illusions that humans find extremely difficult to solve. As technology and artificial intelligence advance, CAPTCHAs will only get harder to apply effectively as a way to distinguish humans from bots, and the search for a workable middle ground will continue.


 
