Which AI model should I use for EHS work now? For what tasks? (July 2025 Edition)
How to build a multi-model AI strategy that transforms safety operations while controlling costs
The question EHS leaders asked in 2024 was simple: "Which AI should we use?"
The answer in 2025 is more sophisticated: "Which AI should we use for this specific safety task?"
You wouldn't use a single tool for every job in your safety program, and the same goes for AI: the era of picking one "winner" model is over. We're now in the multi-model era, and the survey data suggests it's working.
The Enterprise Shift: Why Multi-Model Matters
According to a survey of 100 enterprise CIOs by Andreessen Horowitz (a16z), 37% of enterprises now use five or more AI models in production, up from 29% just a year ago. They're not doing this because they have unlimited funds. They've discovered that different models excel at dramatically different tasks.
For EHS professionals, this shift unlocks two critical advantages:
Accuracy where it counts most. A misclassification of a confined-space hazard carries far higher stakes than spending extra on the right model.
Budget efficiency. Cheaper models can handle bulk safety log processing, while you reserve premium capabilities for critical reasoning tasks.
The numbers back this up: According to a16z, “Enterprise AI budgets grew beyond already high forecasts and graduated from pilot programs and innovation funds to recurring line-items” in core IT and business unit budgets. Organizations are much more sophisticated at mixing and matching multiple models to optimize across both performance and cost.
Know Your AI "Positions" for Safety Work
Think of your AI toolkit like specialized safety equipment. Here's how to match models to EHS tasks:
Fast Chat Models (GPT-4o, Claude Sonnet, Gemini Flash)
Best for: Brainstorming, simple text cleanup, quick safety communications
Think: Your everyday safety coordinator, fast and reliable for routine tasks
Power Reasoning Models (OpenAI o3, Claude Opus, Gemini 2.5 Pro)
Best for: Root-cause analysis, complex procedure synthesis, regulatory interpretation
Think: Your most experienced safety engineer, slower but incredibly thorough
Code Specialists (Claude for technical work, GPT-4.5)
Best for: Automating safety dashboards, building risk assessment tools
Think: Your safety data analyst, excellent at turning information into actionable systems
Cost-Sensitive Models (Llama 3, Mistral, open-source options)
Best for: High-volume log parsing, on-premises privacy requirements
Think: Your reliable administrative support, handling the heavy lifting at low cost
Mapping AI to Your Hazard Analysis Workflow
Here's how I suggest EHS teams deploy different models across their safety operations:
Pro tip: Let models "argue." Feed Gemini 2.5 Pro's mitigation plan to OpenAI's o3 and ask it to critique the plan, then loop the rebuttal back to Gemini. Cross-model debate surfaces hidden flaws.
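The debate loop in the pro tip can be sketched in a few lines of Python. This is a minimal illustration, not a vendor integration: `debate`, `fake_author`, and `fake_critic` are hypothetical names, and each "model" is just a function that takes a prompt and returns text. In practice you would swap the stand-ins for calls to whichever two models you have access to.

```python
from typing import Callable, List, Tuple

# Assumption: a "model" is any function that maps a prompt to a text reply.
ModelFn = Callable[[str], str]


def debate(author: ModelFn, critic: ModelFn, task: str,
           rounds: int = 2) -> List[Tuple[str, str]]:
    """Alternate between an author model and a critic model.

    Returns the full transcript as (speaker, text) pairs so a safety
    professional can review every step rather than only the final plan.
    """
    transcript: List[Tuple[str, str]] = []
    plan = author(f"Draft a mitigation plan for: {task}")
    transcript.append(("author", plan))
    for _ in range(rounds):
        critique = critic(f"Critique this mitigation plan and list gaps:\n{plan}")
        transcript.append(("critic", critique))
        plan = author(
            f"Revise the plan to address this critique:\n{critique}\n\n"
            f"Current plan:\n{plan}"
        )
        transcript.append(("author", plan))
    return transcript


# Stand-in models so the sketch runs without API keys (hypothetical).
def fake_author(prompt: str) -> str:
    return "Revised plan" if prompt.startswith("Revise") else "Initial plan"


def fake_critic(prompt: str) -> str:
    return "Gap: no rescue procedure listed"


if __name__ == "__main__":
    for speaker, text in debate(fake_author, fake_critic,
                                "confined-space entry", rounds=1):
        print(f"{speaker}: {text}")
```

Keeping the whole transcript, rather than only the last revision, makes it easier to see which flaws the critic raised and whether the author model actually addressed them.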
The Three-Tier Approach: Picking the Right Model
Most systems default to the fast model to save computing power, so you need to switch manually using the model selector dropdown. For anything high-stakes (analysis, writing, research, coding), switch to the powerful model.
Tier 1: Fast Models for Daily Operations
Safety toolbox talk generation
Basic incident report summaries
Quick policy clarifications
Routine email responses
Tier 2: Power Models for Critical Analysis
Comprehensive accident investigations
Regulatory compliance assessments
Complex risk evaluations
Safety program audits
Tier 3: Specialized Models for Specific Tasks
Technical safety system design (Claude for coding)
Visual hazard identification (Gemini for image analysis)
Cost-sensitive bulk processing (open-source models)
I created this interactive AI Model Selection Guide/Decision Tree to help you decide which model to choose.
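For teams that want to encode the three tiers above in a script or an internal tool, a simple lookup is often enough. The sketch below is one possible encoding, not a standard: the task names mirror the lists above, while `Tier`, `TASK_TIERS`, and `pick_tier` are hypothetical names. Two choices in it are assumptions on my part: unknown tasks default to the power tier, and anything flagged high-stakes is bumped off the fast tier.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"
    POWER = "power"
    SPECIALIZED = "specialized"


# Task names follow the three tier lists in this article.
TASK_TIERS = {
    "toolbox talk": Tier.FAST,
    "incident report summary": Tier.FAST,
    "policy clarification": Tier.FAST,
    "routine email": Tier.FAST,
    "accident investigation": Tier.POWER,
    "regulatory compliance assessment": Tier.POWER,
    "risk evaluation": Tier.POWER,
    "safety program audit": Tier.POWER,
    "safety system design": Tier.SPECIALIZED,
    "visual hazard identification": Tier.SPECIALIZED,
    "bulk processing": Tier.SPECIALIZED,
}


def pick_tier(task: str, high_stakes: bool = False) -> Tier:
    """Return the tier for a task.

    Assumption: unknown tasks fall back to POWER (the cautious choice),
    and a high-stakes flag never leaves a task on the FAST tier.
    """
    tier = TASK_TIERS.get(task.strip().lower(), Tier.POWER)
    if high_stakes and tier is Tier.FAST:
        return Tier.POWER
    return tier


if __name__ == "__main__":
    print(pick_tier("Toolbox talk"))
    print(pick_tier("routine email", high_stakes=True))
```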
Deep Research: Your New Competitive Advantage
One of the most powerful but underutilized features is Deep Research. These tools can produce high-quality, sourced reports that often impress information professionals (lawyers, accountants, consultants, market researchers).
EHS-Specific Deep Research Applications:
Regulatory landscape analysis: "Research all OSHA changes in the past 12 months affecting manufacturing facilities with more than 100 employees"
Best practice benchmarking: "Analyze industry-leading confined space programs in chemical manufacturing, including specific procedures and metrics"
Incident trend analysis: "Research similar incidents in our industry sector over the past 5 years, including root causes and effective mitigation strategies"
Technology evaluation: "Compare available gas detection systems for hydrogen sulfide monitoring in wastewater treatment, including costs and reliability data"
Why One Size Doesn't Fit All in Safety AI
The a16z report makes it clear that the enterprise model layer has "not become commoditized." While many top models seem similar on the surface, they have nuanced strengths and weaknesses, much like specialized safety equipment.
Let's break down how this applies to core EHS functions, using the model archetypes described by both a16z and Wharton's Ethan Mollick.
The Task: Summarizing a dense, 20-page Safety Data Sheet (SDS) into a one-page workplace safety guide.
The Challenge: This requires high accuracy, factual extraction, and the ability to handle long, technical documents.
The Right Model Type: A powerful, high-reasoning model known for its large "context window" and reliability. In his guide, Ethan Mollick points to models like ChatGPT o3 or Gemini 2.5 Pro for serious, high-stakes analysis. These models are the equivalent of your most meticulous, detail-oriented chemical safety expert.
The Wrong Model: A faster, more creative model might miss a critical PPE requirement or misinterpret a specific gravity figure.
The Task: Classifying 500 near-miss reports into categories like "Slips/Trips/Falls," "Ergonomic," or "Machine Guarding."
The Challenge: This is a high-volume, repetitive task that needs to be fast and cost-effective. Extreme analytical depth isn't as crucial as speed and consistency.
The Right Model Type: A fast, efficient, and lower-cost model. Mollick identifies models like GPT-4o or Claude 3 Sonnet as "good for chats" and quick tasks. The a16z report highlights the excellent performance-to-cost ratio of models like Gemini 2.5 Flash, making them perfect for this kind of bulk processing. This is your tireless administrative assistant.
The Wrong Model: Using the most powerful (and expensive) model for this task would be like hiring a Ph.D. toxicologist to do data entry—effective, but a massive waste of resources.
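The bulk classification task above can be sketched as a small batching loop. This is an illustration under stated assumptions: `classify_batch` stands in for a call to whatever low-cost model you use, `keyword_stub` is a toy replacement so the code runs offline, and the "Needs review" label for anything outside the known categories is my addition, not something from the a16z report or Mollick's guide.

```python
from collections import Counter
from typing import Callable, Iterator, List

CATEGORIES = ["Slips/Trips/Falls", "Ergonomic", "Machine Guarding"]
REVIEW_LABEL = "Needs review"  # assumption: off-list labels go to a human


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_reports(reports: List[str],
                     classify_batch: Callable[[List[str]], List[str]],
                     batch_size: int = 50) -> Counter:
    """Classify reports batch by batch and count labels per category."""
    labels: List[str] = []
    for batch in batches(reports, batch_size):
        result = classify_batch(batch)
        if len(result) != len(batch):
            raise ValueError("classifier returned the wrong number of labels")
        labels.extend(lbl if lbl in CATEGORIES else REVIEW_LABEL
                      for lbl in result)
    return Counter(labels)


def keyword_stub(batch: List[str]) -> List[str]:
    """Toy stand-in for a fast, low-cost model (hypothetical)."""
    out = []
    for report in batch:
        text = report.lower()
        if "slip" in text or "trip" in text:
            out.append("Slips/Trips/Falls")
        elif "guard" in text:
            out.append("Machine Guarding")
        elif "lifting" in text:
            out.append("Ergonomic")
        else:
            out.append("Other")
    return out


if __name__ == "__main__":
    sample = (["Worker slipped on wet floor"] * 300
              + ["Guard missing on press brake"] * 150
              + ["Unclear handwritten note"] * 50)
    print(classify_reports(sample, keyword_stub))
```

Checking that each batch returns exactly one label per report, and routing unexpected labels to review, gives you the consistency the task needs without paying for a premium model.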
The Task: Brainstorming a list of potential hazards for a new, non-routine maintenance task.
The Challenge: This requires creativity, lateral thinking, and the ability to generate a wide range of possibilities, some of which may be unexpected.
The Right Model Type: A model known for its creative and brainstorming capabilities. While many models can do this, some are tuned to be more generative. You might even use multiple models and compare their outputs to get the most comprehensive list of potential risks.
The Wrong Model: A model that is too literal or conservative might miss less-obvious ergonomic or psychological hazards.
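If you do run the same brainstorming prompt through several models, the outputs need to be combined before anyone reviews them. The sketch below merges the lists and records which models suggested each hazard. `merge_hazard_lists` and the model labels are hypothetical, and the matching is deliberately crude (case and whitespace only), so near-duplicates with different wording will still appear separately.

```python
from typing import Dict, List, Tuple


def merge_hazard_lists(
    lists_by_model: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Merge hazard lists from several models.

    Returns (hazard, [models that suggested it]) pairs in first-seen order.
    Hazards are matched after lowercasing and collapsing whitespace.
    """
    merged: Dict[str, Tuple[str, List[str]]] = {}
    for model, hazards in lists_by_model.items():
        for hazard in hazards:
            key = " ".join(hazard.lower().split())
            if key not in merged:
                merged[key] = (hazard.strip(), [])
            if model not in merged[key][1]:
                merged[key][1].append(model)
    return list(merged.values())


if __name__ == "__main__":
    combined = merge_hazard_lists({
        "model_a": ["Pinch points", "Heat stress"],
        "model_b": ["pinch  points", "Fatigue"],
    })
    for hazard, sources in combined:
        print(f"{hazard} (suggested by: {', '.join(sources)})")
```

Hazards that only one model raised are worth a second look: they are either noise or exactly the less-obvious risk a more literal model missed.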
The EHS Multi-Model Playbook in Action
So, how do you implement this without becoming an AI engineer?
Partner with Platforms, Not Just Models: You don't need to build direct integrations with five different AI companies. Modern AI-native EHS platforms or enterprise-wide AI gateways are increasingly doing the "multi-model" work for you. They route your request to the best model for the job behind the scenes. When evaluating a "Buy" solution, ask the vendor: "How do you leverage multiple models? Can I choose the model for specific tasks?"
Think in Tiers: As Ethan Mollick advises, categorize your tasks. Is this a "high-stakes" analytical task (like a regulatory analysis) or a "quick chat" task (like drafting a safety toolbox talk)? Match the model tier to the risk level of the task. Use the powerful, expensive models for the work that absolutely cannot be wrong.
Embrace Specialization: The a16z report notes that some models have clear, market-recognized specialties. For instance, Anthropic's Claude models are often praised for coding and writing. If your EHS team does a lot of technical safety writing or software-related risk assessments, you might lean on a specialized model for those specific use cases.
Cost Control and Avoiding Vendor Lock-In
The enterprise report from a16z reveals important cost considerations. As one enterprise observed, "instead of taking the training data and parameter-efficient fine-tuning, you just dump it into a long context and get almost equivalent results." This move away from fine-tuning also helps companies avoid model lock-in.
Key cost control strategies:
Exploit long context before fine-tuning: Modern models can handle extensive safety documentation without expensive customization
Negotiate usage-based pricing: Avoid outcome-based fees where attribution is unclear
Maintain open-source fallbacks: Keep Llama-3 or Mistral options for privacy-sensitive tasks
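The open-source fallback idea can be expressed as a small routing rule. This is a sketch under my own assumptions, not a description of any particular gateway product: the backend names, the `Request` fields, and `choose_backend` are all hypothetical, and how you decide that a request contains sensitive data is left to your own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    text: str
    contains_sensitive_data: bool = False
    high_stakes: bool = False


def choose_backend(req: Request, hosted_available: bool = True) -> str:
    """Pick a backend label for a request.

    Assumptions: sensitive data always stays on the on-premises
    open-source model, and the same model is the fallback when the
    hosted vendor is unavailable.
    """
    if req.contains_sensitive_data or not hosted_available:
        return "on_prem_open_source"
    return "hosted_power" if req.high_stakes else "hosted_fast"


if __name__ == "__main__":
    print(choose_backend(Request("Summarize this medical surveillance record",
                                 contains_sensitive_data=True)))
    print(choose_backend(Request("Draft a toolbox talk on ladder safety")))
```

Because the routing decision lives in your code rather than in a vendor contract, swapping the hosted provider later means changing one label, which is the practical side of avoiding lock-in.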
Getting Started: The Next Hour
Stop treating AI like Google search. Instead of quick questions with no context, try this approach:
Switch to the powerful model (o3, Claude Opus, or Gemini Pro) for your next complex safety analysis
Upload full context—incident reports, procedures, photos, regulations—give the AI everything it needs
Engage in back-and-forth discussion—ask follow-up questions, request alternatives, challenge assumptions
Try Deep Research on a current safety question where you need comprehensive information
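For anyone scripting the "upload full context" step rather than using a chat window, the idea is simply to put the relevant documents ahead of the question. The sketch below assumes a rough character budget as a stand-in for a model's context limit; `build_prompt`, the separator format, and the 200,000-character default are illustrative choices, not values from any vendor.

```python
from typing import Dict, List, Tuple


def build_prompt(question: str, documents: Dict[str, str],
                 max_chars: int = 200_000) -> Tuple[str, List[str]]:
    """Assemble one prompt from named documents plus a question.

    Documents are added in the order given until the character budget is
    used up. Returns the prompt and the names of any documents that did
    not fit, so nothing is dropped silently.
    """
    parts: List[str] = []
    skipped: List[str] = []
    used = 0
    for name, text in documents.items():
        section = f"--- {name} ---\n{text.strip()}\n"
        if used + len(section) > max_chars:
            skipped.append(name)
            continue
        parts.append(section)
        used += len(section)
    prompt = "".join(parts) + f"\nQuestion: {question}\n"
    return prompt, skipped


if __name__ == "__main__":
    prompt, skipped = build_prompt(
        "What contributed to this incident, and what should change?",
        {
            "Incident report": "Operator's glove caught in conveyor at 14:10.",
            "Lockout procedure": "Step 1: Notify affected employees.",
        },
    )
    print(prompt)
    print("Skipped:", skipped)
```

Reporting which documents were skipped matters in safety work: an analysis that quietly ignored the procedure it was supposed to check is worse than one that tells you it ran out of room.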
The difference between casual users and power users isn't prompting skill (that comes with experience); it's knowing these features exist and using them on real work.
The Strategic Advantage
Your job isn't to pick the "best" AI model—it's to build a system that leverages the best of all of them. By matching each hazard-analysis step to the model that's fastest, cheapest, or smartest for that slice of work, EHS leaders can boost accuracy, shrink costs, and stay nimble as the model race evolves.
The multi-model era isn't overhead—it's your competitive advantage in building a more effective, efficient safety program.
About the Author: I'm Dan, a seasoned EHS leader with a passion for bringing AI to the profession. I help EHS leaders navigate AI without feeling overwhelmed. Check out AI4EHS.com, where I offer tailored workshops, AI tool selection, and advisory support to help safety and operations leaders navigate AI adoption.
Sources
Andreessen Horowitz. "How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025." June 2025.
Mollick, Ethan. "Using AI Right Now: A Quick Guide." June 2025.
Great article. In addition to trying different models for different use cases, keep a clear distinction in your memory of how each works or doesn’t. Be sure to refer to the model when discussing with colleagues. Too often I hear “chatGPT made stuff up” but was that 3.5, 4o, o3? What context did it have? It makes it difficult to assess where people are hitting roadblocks. And Deep Research is something every AI skeptic should play with at least once.