Researchers tested three popular artificial intelligence systems to see if they could accurately estimate the amount of food and the calories in photos of meals. They compared ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro by showing them 52 photos of foods and meals in different portion sizes. Two of the AI systems (ChatGPT and Claude) did a decent job: their estimates of food weight and calories were off by roughly 36-37% on average, similar to how well people estimate their own meals. Gemini performed much worse. All three systems tended to underestimate how much food was on the plate, especially for larger portions. The findings suggest these AI tools could help people track what they eat without the hassle of writing everything down, but they’re not accurate enough yet for athletes or patients who need precise measurements.

The Quick Take

  • What they studied: Can three different AI systems accurately estimate how much food and calories are in photos of meals?
  • Who participated: The study didn’t involve people eating food. Instead, researchers took 52 standardized photographs of individual food items and complete meals in three different portion sizes (small, medium, and large), then had AI systems analyze them.
  • Key finding: ChatGPT and Claude were roughly 36-37% off when estimating food weight, and ChatGPT was 35.8% off for calories, which is about as accurate as people estimating their own meals. Gemini was much less accurate, with errors ranging from about 64% to 110%. All three systems consistently underestimated portion sizes, especially for larger meals.
  • What it means for you: If you’re casually tracking what you eat, an AI photo app might help you remember meals without writing everything down. However, if you need precise calorie or nutrient counts for medical reasons or athletic training, these tools aren’t reliable enough yet. The AI tends to tell you there’s less food than there actually is.

The Research Details

Researchers created a controlled test using 52 carefully prepared food photographs. They included 16 photos of individual food items (like a piece of chicken or a bowl of rice) and 36 photos of complete meals. Each meal was photographed in three different portion sizes: small, medium, and large. To help the AI systems estimate portion size, the researchers included common reference objects like forks, knives, and plates in the photos.

They then showed these identical photos to three different AI systems—ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro—and asked each one the same questions about what foods were in the picture and how much of each food was there. The AI systems were instructed to use the visible cutlery and plates as size references to estimate portions.

After the AI systems made their estimates, researchers compared those estimates to the actual nutritional information. They knew the true weight of each food and its actual calorie and nutrient content from direct weighing and a nutrition database called Dietist NET. This allowed them to calculate exactly how far off each AI system was.
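As a rough illustration of how an "off by X%" figure can be computed, here is a minimal Python sketch. It assumes the reported numbers are mean absolute percentage errors, which this summary does not state explicitly. The function name and the gram values are hypothetical and are not data from the study.

```python
def mean_absolute_percentage_error(estimates, true_values):
    """Average of |estimate - truth| / truth, expressed as a percentage."""
    if len(estimates) != len(true_values) or not true_values:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(e - t) / t * 100 for e, t in zip(estimates, true_values)]
    return sum(errors) / len(errors)


# Hypothetical weights in grams (illustrative only, not from the study)
true_grams = [200.0, 150.0, 400.0]  # measured on a scale
ai_grams = [150.0, 120.0, 240.0]    # what an AI system might report

mape = mean_absolute_percentage_error(ai_grams, true_grams)
print(f"Average error: {mape:.1f}%")
```

Because the error is taken as an absolute value, an overestimate and an underestimate of the same size count equally; this number alone does not show which direction the mistakes go.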

This research approach is important because it tests AI systems in a controlled way. By using the same photos for all three systems and comparing their answers to known correct values, researchers can fairly judge which AI is better at this task. This matters because many people now use their phones to track food, and if AI could do this automatically from photos, it would be much easier than writing down everything you eat.

This study has several strengths: it used standardized photos so all AI systems were tested fairly, it compared results to actual measured values rather than guesses, and it tested multiple AI systems. However, the study only looked at 52 photos, which is a relatively small number. The photos were also taken in a controlled lab setting with professional lighting and reference objects, which is different from real-world photos people take with their phones. Additionally, the study didn’t test how these systems perform with different types of cuisines, plating styles, or real smartphone photos, which limits how well these results apply to everyday use.

What the Results Show

ChatGPT and Claude performed similarly to each other, with ChatGPT being slightly more accurate. When estimating how much food was on the plate (weight), ChatGPT was off by an average of 36.3% and Claude by 37.3%. For calories, ChatGPT was off by 35.8%. These error rates are actually comparable to how accurate people are when they estimate their own food intake from memory.

Gemini performed significantly worse than the other two systems. It was off by 64.2% to 109.9% depending on what nutrient was being measured. This means Gemini’s estimates were often wildly inaccurate.

When researchers looked at how well the AI estimates matched the true values, ChatGPT and Claude showed moderate to good correlation (0.65 to 0.81 on a scale where 1.0 is perfect). Gemini’s correlations were weaker (0.58 to 0.73), meaning its estimates didn’t track as closely with reality.
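Correlation and percentage error measure different things, and a small Python sketch can make the distinction concrete. It assumes a Pearson correlation coefficient (the summary does not say which coefficient the researchers used), and all values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical calories: an estimator that always reports 60% of the truth
true_kcal = [300.0, 500.0, 800.0, 1200.0]
ai_kcal = [0.6 * t for t in true_kcal]

r = pearson_r(ai_kcal, true_kcal)
print(f"r = {r:.2f}")  # tracks the truth perfectly, yet every estimate is 40% too low
```

In this made-up case the correlation is essentially perfect even though every estimate is far too low. A high correlation means the estimates rise and fall with the true values; it does not by itself mean the estimates are close.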

All three AI systems made the same type of mistake: they consistently underestimated how much food was present. This problem got worse as portion sizes got larger. In other words, when shown a large meal, the AI systems were more likely to guess there was less food than there actually was.
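One way to check for this kind of pattern is to keep the sign of each error and average it separately for each portion size. The Python sketch below shows the idea under stated assumptions: the function name, the record layout, and the numbers are all hypothetical and do not come from the study.

```python
from collections import defaultdict


def mean_signed_error_by_portion(records):
    """records: iterable of (portion_label, estimate, true_value) tuples.

    Returns the average signed percentage error per portion label.
    Negative values mean the estimates were too low on average.
    """
    grouped = defaultdict(list)
    for portion, estimate, true_value in records:
        grouped[portion].append((estimate - true_value) / true_value * 100)
    return {portion: sum(v) / len(v) for portion, v in grouped.items()}


# Hypothetical weights in grams (illustrative only)
records = [
    ("small", 95.0, 100.0),
    ("small", 180.0, 200.0),
    ("large", 300.0, 400.0),
    ("large", 350.0, 500.0),
]

bias = mean_signed_error_by_portion(records)
for portion, value in bias.items():
    print(f"{portion}: {value:+.1f}%")
```

In this invented example the average error is more negative for the large portions than for the small ones, which is the shape of the pattern described above.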

The study found that estimating macronutrients (proteins, carbohydrates, and fats) was harder for all systems than estimating total calories. The AI systems also had more difficulty with mixed dishes (like casseroles or stir-fries) than with simple foods like a piece of fruit or a single protein. The way underestimation grew with portion size suggests the AI systems have particular difficulty judging very large amounts of food.

This study is one of the first to systematically test general-purpose AI systems for food analysis. Previous research on dietary assessment has shown that people typically underestimate their food intake by 10-30% when recalling what they ate. ChatGPT and Claude, with errors of about 36-37%, land near or slightly above the upper end of that range, which suggests AI photo estimates are roughly on par with, and not clearly better than, asking people to remember their meals. The advantage of AI is that it doesn’t require people to remember anything; they just take a photo.

Several important limitations should be considered: First, the study only tested 52 photos, which is a small sample. Second, these were professional photographs taken in controlled conditions with perfect lighting and reference objects, which is very different from photos people actually take with their phones. Third, the study didn’t test how the systems perform with different types of food, different cuisines, or different plating styles. Fourth, the AI systems were given written instructions but weren’t specifically trained or fine-tuned for food analysis—they’re general-purpose systems. Finally, the study didn’t test how these systems perform in real-world conditions where people might take blurry photos, photos from odd angles, or photos with poor lighting.

The Bottom Line

For casual food tracking: ChatGPT or Claude-based apps may be helpful as a reminder tool to log meals without writing everything down, though you should expect the estimates to be off by roughly one-third.

For precise dietary assessment: These AI systems are not yet reliable enough for clinical nutrition assessment, medical weight management, or athletic performance optimization where accuracy is critical. If you need precise calorie or nutrient counts, traditional methods like food scales or consultation with a registered dietitian remain more reliable.

Confidence level: Moderate for casual use; Low for clinical or athletic applications.

This research matters for: people interested in casual food tracking who want an easier method than writing everything down; app developers creating nutrition tracking tools; healthcare providers considering AI-based dietary assessment tools; and athletes or people with medical conditions, who should know that these tools are not yet accurate enough for their needs. These AI estimates should NOT be relied on by: people with eating disorders (who need professional guidance rather than app-based estimates), athletes competing at high levels, people with diabetes or other conditions requiring strict calorie or nutrient control, or anyone whose health depends on accurate dietary assessment.

If you used an AI photo app for casual tracking, you might notice patterns in your eating habits within 1-2 weeks. However, don’t expect precise calorie counts—think of it more as a rough guide. If you’re trying to lose weight or manage a medical condition, you’d need more accurate tracking methods to see reliable results.

Want to Apply This Research?

  • Track meal photos and AI-estimated calories daily, but also manually weigh portions once per week to calibrate your understanding of how much the AI tends to underestimate. Create a personal correction factor (for example, ‘AI says 500 calories, but I usually add 150-200 calories based on actual weight’).
  • Use the app to take photos of meals before eating as a mindfulness tool and meal reminder, rather than relying solely on the calorie estimates. This creates a visual food diary that helps you notice eating patterns without the burden of manual entry. Pair this with occasional manual weighing of common foods to build awareness of portion sizes.
  • Over 4 weeks, track: (1) the number of meals you photograph each day, (2) the times of day you tend to eat, (3) the foods the AI struggles with most, and (4) your subjective sense of fullness compared with the AI calorie estimates. Use this data to identify which AI estimates you can trust and which need adjustment. Consider this a supplementary tool rather than your primary tracking method.
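The personal correction factor mentioned in the first tip can be worked out with simple arithmetic. The Python sketch below is one possible approach, not a method from the study: it assumes you have a few meals where you recorded both the AI estimate and the calories worked out from weighing the food. The function names and all numbers are hypothetical.

```python
def correction_factor(calibration_meals):
    """calibration_meals: list of (ai_kcal, weighed_kcal) pairs.

    Returns total weighed calories divided by total AI-estimated calories.
    A value above 1.0 means the AI tends to underestimate.
    """
    total_ai = sum(ai for ai, _ in calibration_meals)
    total_weighed = sum(weighed for _, weighed in calibration_meals)
    if total_ai <= 0:
        raise ValueError("AI estimates must add up to a positive number")
    return total_weighed / total_ai


def corrected_estimate(ai_kcal, factor):
    """Scale a new AI estimate by your personal correction factor."""
    return ai_kcal * factor


# Hypothetical weekly calibration meals: (AI estimate, calories from weighing)
calibration = [(500.0, 675.0), (400.0, 540.0), (300.0, 405.0)]

factor = correction_factor(calibration)
adjusted = corrected_estimate(500.0, factor)
print(f"Correction factor: {factor:.2f}")
print(f"AI says 500 kcal, adjusted estimate: {adjusted:.0f} kcal")
```

A single factor is a blunt tool. Since the underestimation reportedly grows with portion size, a factor calibrated on small meals may still fall short for large ones, so it can help to calibrate on meals of the size you usually eat.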

This research evaluates AI systems for food analysis in controlled laboratory conditions and should not be used as a substitute for professional medical or nutritional advice. These AI tools are not yet approved for clinical dietary assessment, medical weight management, or treatment of eating disorders. If you have diabetes, heart disease, an eating disorder, or any medical condition requiring precise nutritional monitoring, consult a registered dietitian or healthcare provider rather than relying on AI estimates. The accuracy limitations identified in this study (35-37% error rates) mean these tools are suitable only for casual tracking and general awareness, not for precise calorie or nutrient quantification. Always verify AI estimates with actual food weighing if accuracy is important for your health goals.