In this era of technology-driven advancement, Artificial Intelligence (AI) is revolutionizing multiple industries, including the chatbot market. Recent improvements in AI have led to a rise in the usage of generative chatbots such as Bing, Bard, and ChatGPT. These chatbots are designed to handle a wide range of tasks related to search engine optimization, website coding, content generation, productivity tools, news, and social media.
Testing AI Chatbots on Various Queries
A group of testers recently decided to put these generative AI chatbots to the test to determine their efficacy in handling various types of queries. They wanted to evaluate the chatbots' ability to inform, amuse, and educate like humans. The testers asked multiple types of questions, ranging from expanding on search topics to logic puzzles, creative questions, and code. They used the paid ChatGPT Plus subscription to access the new GPT-4 language model, which is limited to 25–40 queries every few hours.
The testers asked questions related to current events, uncertain information, complex searches, informed opinions, and problem-solving, and named a winner for each category based on performance. Bing won the category for up-to-date information with its answer on World Baseball Classic results, and also won the problem-solving category with its household budget analysis. Bing and Bard tied in the current events category, where they were asked to determine which country the Crimean peninsula belongs to.
Comparison of Generative AI Chatbots and Their Performance in Answering 30 Questions
OpenAI’s ChatGPT emerged as the winner in providing an alternate ending for Game of Thrones among all the generative AI chatbots tested, including Bing Chat (Balanced and Creative versions) and Google Bard. Each chatbot was asked the same set of 30 queries across various topic areas, and responses were scored on metrics such as on-topic relevance, accuracy, completeness, and quality. ChatGPT received the highest scores among the chatbots on the on-topic and completeness metrics.
However, despite being the most accurate overall, ChatGPT still had factual errors in nearly one in five responses. The findings suggest that these generative AI tools need human review, as they can be prone to errors and omissions of important information. Simply regurgitating information found elsewhere on the web does not provide value to users; unique experiences, expertise, and viewpoints should be added. According to the chart provided in the article, Bing Chat Creative and ChatGPT were consistently the strongest performers.
Comparison of Different AI Language Tools in Handling Various Types of Queries
Beyond the head-to-head comparison, these AI chatbots were also tested on additional types of queries, including telling jokes, answering provocative questions, generating article outlines, identifying content gaps, and creating content.
In the joke category, Bing Chat Balanced declined to tell a joke about either sex yet still received a perfect score. On provocative questions, Bing Chat Creative declined to answer some of them, while Bing Chat Balanced responded respectfully; Bing Chat Creative nonetheless provided the best answer overall. ChatGPT was the most comprehensive at generating article outlines, while Bard struggled with one of the queries. All tools had trouble identifying content gaps, but ChatGPT handled this task better than the others. All tools also struggled with disambiguation queries, providing inadequate and often inaccurate results.
Conclusion
In conclusion, after extensive testing of these generative AI chatbots on various tasks and types of queries, it is apparent that each chatbot has its own strengths and weaknesses. While Bing Chat Balanced may excel at queries that call for gender sensitivity or inclusivity, it may not fare as well when answering more complex or ambiguous questions.
Generative AI tools certainly have great potential for improving site content, given their ability to generate a high volume of text. However, it is crucial that content creators review the information they generate. Human intervention is essential to make tweaks where required, correct factual errors, and ensure the information supplied is accurate.
Finally, while these generative chatbots can be incredibly useful for some tasks, they still lack the human touch needed where emotional intelligence and personal experience matter. They should therefore be seen not as a replacement for humans, but as an aid in facilitating complex tasks.
Image Source: Wikimedia Commons