Quality Assurance for AI Products

I’m interested.

  • Your role
    My name is Rodolfo, I’m from Brazil and I’m a QA Engineer specializing in automation. I have experience in the area but am unemployed at the moment.

  • Your biggest challenge
    What’s your biggest challenge with testing AI products?
    I test AI products on my own, for my own uses and creations; I don’t work directly in AI testing. My biggest challenges have been hallucinations and misinterpreted instructions (I’m still working on refining them). The worst for me, as @MFangler1 said, has been PDF reading: sometimes the AI ignores the attached document entirely and just hallucinates. I’ve noticed that the smaller the document, the better, but I haven’t yet focused on working around this problem.

What’s the most difficult aspect of QA for AI products that you encounter?
I’m not aware of them yet, and I hope this workshop will open my eyes to them.

1 Like

I am a Sr Director of Quality. I would like to move our org towards AI. Looking for guidance and suggestions.

2 Likes

I’m interested in the workshop. This kind of approach is very useful, and I believe we can extract valuable ideas from it.

1 Like

Hello, I’m interested in the workshop.

  • Your role
    My name is Erik and I’m a QA Engineer.
  • Your biggest challenge
    Finding the right ratio between effort and efficiency.

1 Like

My name is Oksana.
I’m a Software QA Engineer eager to expand my expertise into AI.

1 Like

Hi @RoDsta!

Thanks for reaching out, and it’s great that you’re already exploring AI testing even on a personal level. It sounds like we have a lot to learn from each other.

You bring up some fantastic points about common AI challenges like hallucinations and misinterpretations. I totally understand how frustrating those can be! In the workshop, we’ll dive deeper into why those happen and discuss strategies to mitigate them, both during development and through focused QA.

One particularly tricky area, which you also mentioned, is how AI handles complex input formats like PDFs. I’ll definitely be addressing this, along with how document length can play a role. We’ll look at pre-processing techniques that can help.

Beyond the technical side, I believe the most challenging aspect of AI product QA is the communication overhead between different teams. Devs, QA folks, and business stakeholders might not always speak the same language when it comes to AI performance. That’s why a big focus of the workshop is building a shared understanding and a framework for collaboration.

Do you have any specific experiences where communication gaps made AI testing harder? I’d love to hear and potentially adapt the workshop content to address those real-world pain points.

1 Like

Hello everyone, and a warm welcome to the DLAI community!

I’m thrilled by your interest in this workshop. It’s great to see folks from diverse roles – from senior leadership to hands-on QA engineers – interested in AI quality assurance.

  • Could you share some insights into your experiences with AI product QA?
  • How do you see it differing from traditional software testing?

Your perspectives will be invaluable as we explore these concepts together.

Let’s keep the conversation going!

@Oksanna @burrito @Guilherme_Brevilato @rchenna

1 Like

Thank you for the welcome!
For example, in a star dataset, the spectral type column might be left unlabeled, with values such as “A, B, … etc.” Whoever analyzes the data may assume the values merely enumerate the stars and leave the column out of the analysis, although spectral type is crucial for the analytics and the prediction.
I have seen many more examples.

2 Likes

Hello, I’d be interested to learn about the subject. Ulf, QA Engineer, no experience with AI yet, some AI integration work on the horizon.

1 Like

Hello, when is the workshop happening? I am interested in being a part of it.

1 Like

Hello, I am interested

1 Like

Hi Ammar

Thanks for your reply.

I have become a little concerned that I am failing to understand “prompts”, or the way I am presenting information to the bot via PDF and other documents.

I have tried 3 different platforms with the attached documents, and all have failed to get through the whole of the documents.

I use Orimon, Cloozo and Agentive bots. As I have an interest in a local golf club, I have used it as a test case; the bot has not yet been deployed because of the aforementioned issues.

If a bot scrapes given website URLs, the overall results with a decent prompt are good, Ammar. Knowledge-based bots, however, are a little more worrying.

As a small “charitable” UK-based organisation, we are given all sorts of document formats, which I feed into the bot with very mixed results.

If you look at the documents (Ladies), you can see that I have even tried to reformat the whole document in the hope that it helps the bot to continue to the end, which it does not.

In both cases, the bot responds to the start of the information perfectly, then simply stops.

This is my prompt, Ammar:

You are a bot designed to serve multiple functions: but you must first “read fully the whole of each of the documents provided, so that you can give the used a full summary/response based on all of its content.” “Please read the whole following document and provide a summary/response based on all of its content.” When providing responses from any documents you have been trained on, you must draw on the specific data provided, being as detailed and accurate as possible. If a question cannot be answered due to limitations in the dataset, kindly ask for more details or clarify the scope of your data. You will be able to list of golf matches, including dates, locations, competition types, tee off times and notable statistics.

You will be able to offer golf playing tips to help users improve their game. This includes advice on technique, equipment, strategy, and mental preparation and to recommend our in house PGA professional Josh Ringrose, who can be contacted at Pro Shop Services: John Bottomley & Josh Ringrose 01472 356981 proshop@grimsbygolfclub.co.uk

Encourage non-members to join the golf club by sharing new pricing and membership details available on the Grimsby Golf Club website (www.grimsbygolfclub.co.uk). If the user requests any new membership information such as membership pricing or joining fees you will list all available membership / joining prices in your response. Please tell them that if they “join as a new member”, please mention to the Club Manager “Mark Blackwell”, that their “new membership referral” was made via “Club Captain Mark Fenty”, who will be more than happy to buy and share a few drinks with them.

For golf playing tips, share general advice that can apply to a wide range of skill levels, unless a question specifies a particular skill level or aspect of the game. Incorporate basic principles that are widely accepted as beneficial for improving at golf.

To encourage non-members to join, highlight the benefits of joining the golf club, including access to exclusive facilities, participation in club events, and any special offers or pricing available. Direct users to the Grimsby Golf Club website for the most up-to-date information on memberships and pricing.

Example Questions and Responses:

Q: How can I improve my swing?

(Attachment SENIOR GENTS GOLF 2024 MATCHES.docx is missing)

SENIOR GENTS GOLF 2024 MATCHES.pdf (138 KB)

Ladies Fixtures 2024.pdf (148 KB)

(Attachment Ladies Fixtures 2024.xlsx is missing)

1 Like

@AmmarMohanna
Thank you for the warm welcome! I’m equally excited to be part of this diverse and insightful DLAI community.
I haven’t had the opportunity to work directly on AI projects yet, but I am very eager to dive deep into the world of AI quality assurance and learn alongside all of you.
From what I understand, testing AI products presents unique challenges that differ significantly from traditional software testing. The dynamic and somewhat unpredictable nature of AI systems requires a nuanced approach to quality assurance. For instance, in conventional software testing, we often deal with static inputs and expected outputs. However, AI systems, with their learning capabilities, introduce a layer of complexity where the “correct” output may not be as clearly defined.
If we consider the use of AI in testing, it can enhance our capabilities in generating test documentation and assist in writing automation tests.
Conversely, when it comes to testing AI systems, the focus seems to shift towards validating the AI’s responses. It’s crucial to ensure that these responses are not only accurate but also free from bias, aggression, or potential harm. Moreover, testing AI models involves evaluating their decision-making processes, ensuring they make predictions based on valid data interpretations without overstepping ethical boundaries.

1 Like

Hello again @olasadek.

Thank you for your valuable insight!

You’re absolutely right that unexpected omissions in datasets can seriously throw off analysis and predictions.

Could you elaborate on one of your examples to give us a clearer picture?
It would be great to discuss potential mitigation strategies to incorporate into the workshop.
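
As one possible starting point for such a mitigation, here is a minimal Python sketch of a check that flags columns with a blank header before analysis begins. It assumes the dataset is a CSV file whose first row is the header; the function name `find_unlabeled_columns` and the sample rows are made up for illustration and are not taken from any real dataset.

```python
import csv
import io
from typing import List


def find_unlabeled_columns(csv_text: str) -> List[int]:
    """Return the zero-based positions of columns whose header is blank."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    return [i for i, name in enumerate(header) if not name.strip()]


# Made-up sample: the second column (spectral type) has no header.
SAMPLE = "name,,temperature_k\nStar 1,A,9000\nStar 2,B,20000\n"

unlabeled = find_unlabeled_columns(SAMPLE)
print("Unlabeled column positions:", unlabeled)
```

A check like this only reports that a label is missing; deciding what the column means still needs someone who knows the data.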

2 Likes

I’m thrilled to see such interest in this workshop. Welcome to the Deeplearning.ai discussion forum!

It sounds like we have a great mix of backgrounds and experiences here.

  • @Ulf_Wendel – Your perspective as a QA engineer transitioning into AI integration is incredibly valuable! I’m curious, what challenges do you anticipate facing during this transition, particularly from a QA standpoint?
  • To those interested in the workshop dates – I’m currently finalizing the details and will have the schedule up very soon. In the meantime, what aspects of the workshop are you most excited about?
  • Everyone – What are some of the biggest communication or collaboration hurdles you’ve experienced in past projects, whether AI-related or not?

Let’s kick off the discussions! Remember, even before the formal workshop, the community.deeplearning.ai forum is a fantastic place to continue the conversation.

Can’t wait to learn from all of you!

@MayankSindwani @M.Husain_Mns

2 Likes

Hello @MFangler1,

Thanks for reaching out and sharing your experiences. It sounds like you’re encountering a common challenge with knowledge-based bots: their ability to fully parse and consistently extract information from complex documents. Here’s what we can investigate:

  • Document Complexity: Mixed formatting (tables, varying layouts with text) within PDFs and Excel sheets can be difficult for bots to fully parse, especially if they haven’t been specifically trained on similar document structures.
  • Prompt Specificity: While your prompt is good, could it be further refined? Try breaking it down into smaller tasks:
    • Summarize golf match information (type, date, location)
    • Provide basic playing tips (focus on general advice)
    • Outline membership benefits, direct users to the website
  • Data Limitations: The bots may not have access to a knowledge base with detailed golfing tips or up-to-date Grimsby Golf Club membership info.

Troubleshooting Steps

  1. Pre-processing: Consider converting your documents into cleaner text formats before feeding them to the bot. Tools for table extraction from PDFs or Excel might help.
  2. Training Data: If possible, create a small set of “ideal” documents that demonstrate the exact data extraction you require. Feeding these to the bot could improve its understanding.
  3. Iterative Approach: Start with a simpler goal (e.g., only extracting match dates) and gradually increase complexity as the bot learns.
  4. Hybrid Model: Could you combine a rule-based extraction system (for match data) with a more general Q&A bot?
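
To make steps 1 and 4 more concrete, here is a minimal Python sketch. It assumes the fixture list has already been converted to plain text with one match per line. The line layout (`DD/MM/YYYY  Opponent  Home|Away  HH:MM`), the sample rows, and the names `FIXTURE_RE`, `parse_fixtures`, and `chunk_lines` are all hypothetical; your real documents will need their own pattern.

```python
import re
from typing import Dict, List

# Hypothetical line layout: "DD/MM/YYYY  Opponent  Home|Away  HH:MM".
# Columns are assumed to be separated by two or more spaces.
FIXTURE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s{2,}"
    r"(?P<opponent>.+?)\s{2,}"
    r"(?P<venue>Home|Away)\s{2,}"
    r"(?P<tee_time>\d{2}:\d{2})$"
)


def parse_fixtures(text: str) -> List[Dict[str, str]]:
    """Rule-based extraction: keep only lines that match the fixture pattern."""
    fixtures = []
    for line in text.splitlines():
        match = FIXTURE_RE.match(line.strip())
        if match:
            fixtures.append(match.groupdict())
    return fixtures


def chunk_lines(text: str, max_lines: int = 20) -> List[str]:
    """Split cleaned text into small chunks so no single input is too long."""
    lines = [line for line in text.splitlines() if line.strip()]
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


# Made-up sample data, for illustration only.
SAMPLE = """\
SENIOR GENTS 2024 (sample rows, not real fixtures)
02/04/2024  Example Park GC  Home  10:00
16/04/2024  Sample Links GC  Away  09:30
"""

fixtures = parse_fixtures(SAMPLE)
chunks = chunk_lines(SAMPLE, max_lines=2)
print(fixtures)
print(len(chunks), "chunks")
```

The extracted rows could answer date and tee-time questions directly, leaving the general Q&A bot to handle playing tips and membership questions.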

Beyond Troubleshooting

  • Community Support: Engage with the communities of Orimon, Cloozo, and Agentive bots - other users might have solved similar problems.
  • Realistic Expectations: Knowledge-based bots often excel in specific domains. Aligning expectations with bot capability is helpful.

I’m happy to brainstorm further on this. Feel free to share examples of specific document snippets that are causing issues!

Best of luck,

1 Like

Hello @Oksanna,

Fantastic to have you on board! Your observations on the differences between traditional software testing and AI QA are spot-on. The shift from input/output validation to evaluating the decision-making process itself is a key aspect that makes AI systems so fascinating to work with.

You rightly bring up the fascinating two-way street with AI in testing. Let’s explore these two sides further in the workshop:

  • AI-Powered Testing Tools: How can they streamline QA processes, especially for scenarios where generating exhaustive test cases is difficult?
  • Testing the Testers (AI version): How do we establish baselines and metrics to validate the very AI systems designed to help us test?

I’m also keen to hear how you envision incorporating safeguards against bias, aggression, and harm within those AI-powered testing tools. It’s a responsibility we all share as AI practitioners.

Looking forward to learning from and with you!

1 Like

Thank you so much Ammar.

I will investigate further.

Kindest Regards

Mark Fenty
Grimsby Golf Club - Captain 2024

1 Like

Interested

1 Like

Interested

1 Like