© 2026 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
E-healthcare product marketplaces often face challenges such as fragmented product information, inconsistent pricing, and limited tools for pricing comparisons. These issues hinder healthcare practitioners and patients from accessing validated product details, making price comparisons, and completing secure transactions. To address these operational gaps, we propose MediMate, a multi-agent AI framework combining rule-based decision-making with Retrieval-Augmented Generation (RAG). The framework features six specialized agents: Validator, Router, General Information, Comparison, Order, and Summarization, integrated with FastAPI, ChromaDB vector persistence, and open-source large language models (LLMs). Rather than using a single model for all queries, tasks are routed to specialized agents, improving efficiency and accuracy. Our system achieved a classification accuracy of 92% on 200 test queries across six intent categories, with an average response latency of 425 ms and a cost of $0.018 per interaction. Ablation studies revealed a 24% performance improvement using the multi-agent architecture over a monolithic model. Simple RAG, compared to Advanced/Graph RAG, achieved a sufficient 92% accuracy in product-related queries without unnecessary complexity. User testing with 50 healthcare professionals and consumers resulted in a satisfaction rating of 4.4/5.0. The system demonstrates scalability under realistic loads, maintaining an SLA compliance rate of 88% with ten concurrent users.
multi-agent systems, Retrieval-Augmented Generation, healthcare e-commerce, intent classification, vector search, conversational AI, product comparison, price analysis
1.1 Problem statement
Healthcare product markets exhibit a common set of information-fragmentation problems: product information is scattered across vendor-specific pages, price structures are represented inconsistently, and product specifications are not standardized. Healthcare professionals and consumers therefore face tangible friction when trying to access validated product details, compare prices across vendors, and complete secure transactions.
No existing platform combines these capabilities at scale for healthcare; in this respect the domain differs fundamentally from general e-commerce.
1.2 Technical approach
We implement MediMate as a multi-agent system rather than a monolithic conversational AI. This architectural choice, a decisive departure from single large language model (LLM) approaches, emerged from performance gaps observed during initial prototyping. The system comprises six specialized agents: Validator, Router, General Information, Comparison, Order, and Summarization.
Each agent operates independently on its target task, reducing individual computational load while improving per-task accuracy. Integration uses FastAPI for request handling, ChromaDB for vector storage, and llama-3.3-70b-versatile (via Groq infrastructure) for generation tasks.
1.3 Contributions
This study proposes three innovations:
2.1 Literature context
Early healthcare chatbots relied on traditional NLP (n-gram, TF-IDF) with limited domain understanding. AIML-based systems introduced structured conversation but suffered from rigid templates. Recent work has explored domain-specific RAG for clinical applications [1-4] and multi-agent architectures for medical report generation [5-7].
The vertical AI agent literature establishes that industry-specific, task-specialized agents achieve superior performance compared to generalist models on complex workflows. Multi-agent healthcare systems with EHR integration demonstrate promise for personalized interactions, while transformer-based models enhance contextual understanding in medical conversations. Recent surveys provide comprehensive comparison of RAG architectures (Simple, Advanced, Modular, Agentic, Graph RAG), directly informing our architecture selection [8-15].
MediMate differs from prior work in three ways:
2.2 Why multi-agent over monolithic?
During initial development, we tested a single LLM handling all query types (general information, comparisons, orders, summarization). The system functioned but showed degraded accuracy across diverse task types. This empirical observation is consistent with studies [16-18], which report that specialized agents outperform generalist models when processing diverse data.
Our implementation confirms this prediction. The router agent identifies a query's intent and assigns it to an optimized handler:
Each agent operates within its domain's optimal complexity. The ablation study in the Results section shows that removing the router agent alone drops accuracy by 24 points, from 92% to 68%.
2.3 Why simple Retrieval-Augmented Generation, not advanced/modular/graph Retrieval-Augmented Generation?
State-of-the-art RAG architectures excel at multi-hop reasoning over dense knowledge graphs. In healthcare product search, however, as in prior studies [19-22], users typically seek factual attributes: "What are the certifications for this product?", "What are its ingredients?", "How much does it cost?". We also tested Advanced RAG during the prototype phase.
These are not multi-hop questions; they are single-hop fact retrievals over individual product documents.
For our use case, Simple RAG's simplicity-to-accuracy ratio proves optimal. We are not chasing state-of-the-art results on a benchmark; we are optimizing for production constraints (latency SLA, infrastructure cost), as shown in Table 1.
Table 1. Model selection
| Aspect | 70B (Groq) | 8B-Instant (Groq) | 7B (Ollama Local) |
|---|---|---|---|
| Latency (p50) | 425 ms | 580 ms | 720 ms |
| Cost per 1M tokens | $0.59 | $0.02 | $0 |
| Context window | 32K | 8K | 8K |
| Accuracy (QA benchmark) | 92.3% | 84.6% | 79.8% |
2.4 Model selection: Llama-3.3-70B justification
The choice of the 70B model over the alternatives is justified below.
We shortlisted three candidates: a 70B and an 8B-Instant model served via Groq, and a local 7B model served via Ollama (Table 1).
Cost-per-interaction analysis:
Average query requires ~400 tokens (router: 50, agent: 150, response: 200).
• 70B: (400/1M) × $0.59 = $0.000236 raw token cost per interaction; at 10K queries/day with volume discount, $0.018/interaction
• 8B: (400/1M) × $0.02 = $0.000008 raw token cost per interaction, or $0.003/interaction at the same scale ✓ cheaper per token
But the 8B model misses the 500 ms SLA 25% of the time, and each miss triggers a user retry. At scale, retry loops outweigh the per-token savings: an estimated 15% of queries are re-issued, adding a $0.0045 penalty for a true cost of $0.0075/interaction.
Moreover, while 70B's 32K context window eliminates summary truncation in RAG retrieval, 8B's 8K window enforces lossy compression of multi-product comparisons.
Outcome: 70B model reduces true cost-per-interaction while meeting the latency SLAs and supporting richer context windows.
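The break-even arithmetic above can be reproduced in a few lines (a sketch; the helper names are ours, and the inputs are the figures quoted in the text):

```python
def token_cost_per_query(tokens: int, usd_per_million_tokens: float) -> float:
    """Raw LLM token cost of a single interaction, in USD."""
    return tokens / 1_000_000 * usd_per_million_tokens

def effective_cost(base_cost_per_query: float, retry_rate: float) -> float:
    """Inflate the per-interaction cost by the expected fraction of
    SLA-miss retries, each of which re-runs the full pipeline."""
    return base_cost_per_query * (1 + retry_rate)

# ~400 tokens per query at the quoted Groq prices
cost_70b = token_cost_per_query(400, 0.59)  # raw token cost, 70B
cost_8b = token_cost_per_query(400, 0.02)   # raw token cost, 8B
```

Raw token cost favors the 8B model by more than an order of magnitude; the point of the analysis is that once retry overhead and context-window truncation are priced in, the gap inverts.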
3.1 System architecture
MediMate implements a pipeline architecture as shown in Figure 1.
Figure 1. Architecture of agent-based query processing
The user query first reaches the Validator Agent, which checks syntax, security, and language. It then passes to the Router Agent for intent classification, after which a task-specific agent (general information, comparison, order, or summarization) handles it; the response is then validated and finally formatted.
Each component is described below with algorithmic precision.
3.2 Query validator agent
Input: User query string
Output: (valid: bool, error_message: str)
Algorithm:
1. Length check: 5 < len(query) < 2000 characters
• Rationale: <5 chars are noise (single words, emoji); >2000 indicate prompt injection attempts or malformed documents
• Observed filtering: 12 malicious inputs during 2-week evaluation
2. Security pattern matching: check for SQL injection signatures and code execution patterns, as specified in Table 2.
• Patterns: ' OR '1'='1, <script>, exec(), import os
• Observed filtering: 47 queries blocked during testing
• Approach: Not a complete WAF, but catches obvious attacks without false positives
3. Language detection: Identify query language, accept English/Hindi/Spanish with future expansion
• Implementation: langdetect library with confidence threshold 0.95
• Non-English queries logged but not rejected; provide valuable expansion data
Invalid query response:
Rather than silently dropping invalid queries, system triggers clarification prompt: "I didn't quite understand. Could you rephrase your question?"
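The length and security checks above can be sketched as follows (a minimal illustration; the regex list mirrors the patterns in Section 3.2, and language detection via langdetect is omitted for brevity):

```python
import re

# Signatures from Section 3.2; deliberately not a full WAF, just the
# obvious injection patterns observed during testing.
_BLOCKED_PATTERNS = [
    r"'\s*OR\s*'1'\s*=\s*'1",  # SQL injection
    r"<script>",               # XSS
    r"\bexec\s*\(",            # code execution
    r"\bimport\s+os\b",        # Python injection
]

_CLARIFICATION = "I didn't quite understand. Could you rephrase your question?"

def validate_query(query: str) -> tuple:
    """Return (valid, error_message) following the Validator Agent rules."""
    if not (5 < len(query) < 2000):        # length check
        return False, _CLARIFICATION
    for pattern in _BLOCKED_PATTERNS:      # security pattern matching
        if re.search(pattern, query, flags=re.IGNORECASE):
            return False, _CLARIFICATION
    return True, ""
```

Invalid queries return the clarification prompt rather than being silently dropped, matching the behavior described above.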
Table 2. Pattern recognition with keywords

| Intent | Keywords | Pattern | Min Confidence |
|---|---|---|---|
| GENERAL_INFO | "what", "tell me", "info", "details" | <product_name> + <info_request> | 0.85 |
| COMPARISON | "compare", "vs", "difference", "better", "cheaper" | <product_A> + <comparison_term> + <product_B> | 0.90 |
| ORDER | "order", "buy", "purchase", "add to cart" | <product_name> + <action_verb> | 0.88 |
| SUMMARIZATION | "summarize", "brief", "overview", "short version" | <content> + <brevity_request> | 0.82 |
| RECOMMENDATION | "recommend", "suggest", "what should", "best for" | <condition> + <recommendation_request> | 0.80 |
3.3 Router agent: Intent classification
Input: Validated query (string)
Output: Intent class ∈ {GENERAL_INFO, COMPARISON, RECOMMENDATION, SUMMARIZATION, ORDER, UNKNOWN}
Classification method: Hybrid keyword + rule-based (no trained ML model)
Intent patterns:
Fallback solution: If no pattern has been matched with confidence ≥ 0.80, mark it as UNKNOWN and request that the user provide further explanation.
Reasons for a rule-based approach include requiring no training data and achieving competitive accuracy (Section 4.2) with negligible routing latency.
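A minimal sketch of the hybrid keyword router, using the keywords from Table 2 (first match wins in this simplification; the production router also applies the structural patterns and per-intent confidence floors listed in the table):

```python
# Keywords from Table 2. "vs" is padded with spaces to avoid substring
# hits inside ordinary words; ordering matters, since the first matching
# intent wins in this simplified sketch.
INTENT_KEYWORDS = {
    "COMPARISON":     ["compare", " vs ", "difference", "better", "cheaper"],
    "ORDER":          ["order", "buy", "purchase", "add to cart"],
    "SUMMARIZATION":  ["summarize", "brief", "overview", "short version"],
    "RECOMMENDATION": ["recommend", "suggest", "what should", "best for"],
    "GENERAL_INFO":   ["what", "tell me", "info", "details"],
}

def route(query: str) -> str:
    """Rule-based intent classification; UNKNOWN triggers a clarification
    request, mirroring the fallback described above."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return intent
    return "UNKNOWN"
```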
3.4 General information agent: RAG pipeline
Knowledge base schema (ChromaDB):
```json
{
  "product_id": "MED_001",
  "name": "Product Name",
  "brand": "Brand",
  "category": "Supplements",
  "price": 249.99,
  "currency": "INR",
  "description": "Detailed description...",
  "ingredients": ["Ingredient1", "Ingredient2"],
  "certifications": ["ISO 9001", "FDA Approval"],
  "manufacturer": "Manufacturer",
  "usage_instructions": "...",
  "side_effects": "...",
  "storage_conditions": "...",
  "source_url": "authorized_provider.com"
}
```
RAG pipeline (Simple RAG):
Embedding: Query converted to embedding via Sentence-Transformers (all-MiniLM-L6-v2, 384-dim)
Retrieval: Cosine similarity search in ChromaDB; retrieve top-5 documents
Filtering: Apply similarity threshold ≥0.65
Rationale: 0.75+ threshold missed relevant products due to embedding variance; 0.55 introduced noise; 0.65 optimized precision-recall for 2-week evaluation dataset
Augmentation: Concatenate full text of top-5 documents into LLM prompt
Generation: LLM generates response grounded in retrieved documents, instructed to not extrapolate
Validation: Check that cited product attributes appear in retrieved documents; spot-check 47 responses manually for hallucinations (observed 0)
Similarity metric: $\mathrm{sim}(q, d) = \dfrac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$
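The retrieval-and-filter steps can be sketched without the ChromaDB dependency, applying the cosine formula above to precomputed embeddings (plain Python lists stand in for the 384-dim MiniLM vectors):

```python
import math

def cosine_sim(q, d):
    """sim(q, d) = (q . d) / (||q|| * ||d||)"""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def retrieve(query_emb, doc_embs, top_k=5, threshold=0.65):
    """Top-k documents by cosine similarity, then filter at the 0.65
    threshold tuned on the 2-week evaluation dataset."""
    scored = sorted(
        ((cosine_sim(query_emb, emb), idx) for idx, emb in enumerate(doc_embs)),
        reverse=True,
    )
    return [(idx, sim) for sim, idx in scored[:top_k] if sim >= threshold]
```

In production, ChromaDB performs this search internally; the sketch only makes the threshold semantics explicit.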
3.5 Comparison agent
Input: Query specifying 2+ products to compare
Output: Comparison matrix + narrative summary
Algorithm:
Entity extraction: Named entity recognition to identify product names/variants
Feature selection: Determine comparison dimensions (price, brand, certifications, key ingredients)
Data extraction per product:
Query ChromaDB for curated data
Web scraping authorized domains (whitelist: 1mg.com, amazon.in, flipkart.com, netmeds.com, pharmeasy.in)
Extract structured data (JSON-LD preferred, CSS selectors fallback)
Validate stock status
Data cleaning:
Remove non-product text
Normalize prices to common currency (INR)
Validate health-related keywords for accuracy
Matrix construction: Create comparison table
Narrative synthesis: LLM generates prose summary with confidence notes for web-scraped vs. curated data
Disclaimer: "This comparison is for informational purposes; consult a healthcare provider for medical advice."
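The price-normalization and matrix-construction steps might look like this (a sketch; the static FX rates and helper names are illustrative assumptions, not the production implementation):

```python
# Static FX rates are a placeholder for the sketch; production code would
# use a live feed when normalizing scraped prices to INR.
RATES_TO_INR = {"INR": 1.0, "USD": 83.0, "EUR": 90.0}

def normalize_price(amount: float, currency: str) -> float:
    """Data-cleaning step: express every scraped price in INR."""
    return round(amount * RATES_TO_INR[currency], 2)

def comparison_matrix(products, dims=("price", "brand", "certifications")):
    """Matrix construction: one row per comparison dimension, one column
    per product; missing attributes surface as "N/A"."""
    return {dim: [p.get(dim, "N/A") for p in products] for dim in dims}
```

The LLM then narrates this matrix, flagging which cells came from web scraping versus curated data.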
3.6 Order agent: Authentication and authorization
Security workflow:
User submits order (product_id, quantity, delivery_address)
Authenticate: Verify JWT session token, check user role ∈ {PATIENT, HEALTHCARE_PROFESSIONAL, ADMIN}
Authorize: Check if user/provider can transact with specified product
PATIENT: Can order from all AUTHORIZED_PROVIDERS
HEALTHCARE_PROFESSIONAL: Can order from MEDICAL_SUPPLIERS subset only
Provider validation: Query authorized_providers table for active, verified providers
Order creation: Insert order record in MongoDB
Confirmation email: Send to registered email address
Fallback: If not authorized, return list of available authorized providers
Authorization table schema:
```text
authorized_providers:
- provider_id (PK)
- provider_name
- api_endpoint
- verification_status (PENDING/VERIFIED/REVOKED)
- last_verification_date
- supported_product_categories
```
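The role checks from the security workflow can be sketched as a pure function over a provider record shaped like the schema above (the MEDICAL_SUPPLIERS set and the full-access ADMIN rule are illustrative assumptions):

```python
# Hypothetical stand-in for the MEDICAL_SUPPLIERS subset referenced in
# the workflow; provider dicts mirror the authorized_providers schema.
MEDICAL_SUPPLIERS = {"P002"}

def authorize_order(role: str, provider: dict) -> bool:
    """Authorization step: may this user role transact with this provider?"""
    if provider["verification_status"] != "VERIFIED":
        return False                      # PENDING/REVOKED providers blocked
    if role in ("PATIENT", "ADMIN"):
        return True                       # any verified authorized provider
    if role == "HEALTHCARE_PROFESSIONAL":
        return provider["provider_id"] in MEDICAL_SUPPLIERS
    return False
```

When this check fails, the fallback described above returns the list of available authorized providers instead of rejecting the user outright.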
3.7 Summarization agent
Input: Long-form text or retrieved documents
Output: Concise summary (bullet points, paragraph, or comparison format)
Algorithm:
Scope identification: Extract key terms (brand, product type, certifications), identify critical details
Extractive summarization: TF-IDF scoring to identify top sentences (select 30-40% of original length)
Abstractive summarization: LLM rewrites into concise form
Target length: 50-100 words (bullets), 100-200 words
Format options:
Bullet points: 5-7 key facts
Paragraph: Continuous narrative
Comparison: Side-by-side key differences
Readability check: Flesch-Kincaid scoring; target 8-10 grade level.
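The extractive stage can be sketched with a dependency-free TF-IDF scorer (an illustration of the approach, not the production code; sentence splitting here is a simple regex):

```python
import math
import re
from collections import Counter

def extractive_summary(text: str, keep_ratio: float = 0.35) -> str:
    """Score each sentence by the mean TF-IDF weight of its terms and
    keep the top ~30-40%, preserving the original sentence order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    df = Counter(w for toks in tokenized for w in set(toks))  # document freq.

    def score(toks):
        if not toks:
            return 0.0
        tf = Counter(toks)
        return sum(tf[w] / len(toks) * math.log(n / df[w]) for w in tf) / len(tf)

    keep = max(1, round(n * keep_ratio))
    top = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))
```

The abstractive stage then hands this reduced text to the LLM for rewriting into the target 50-200 word form.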
3.8 Infrastructure and model specifications
LLM: groq/llama-3.3-70b-versatile via Groq Cloud API
Backend: FastAPI 0.104, Python 3.9
Vector database: ChromaDB 0.4.3
Embeddings: Sentence-Transformers (all-MiniLM-L6-v2, 384-dim)
Authentication: JWT tokens
Transaction storage: MongoDB
Hardware (development): Intel i7 (8 cores, 16GB RAM), NVIDIA RTX 3050 (8GB VRAM) for local embeddings.
4.1 Experimental setup
Dataset:
• 100+ unique healthcare product documents (PDF + structured data)
• Categories: Supplements, OTC medications, medical devices, wellness products
• Sources: Amazon India, 1mg, Flipkart, NetMeds
• Document characteristics: 500-3000 tokens; mean 1200 tokens
• Test queries: 200 annotated by domain experts (40 per intent class)
Evaluation metrics:
• Query classification accuracy (precision, recall, F1 per intent)
• Response latency (p50, p95, p99 percentiles)
• Hallucination rate (% of responses containing unsupported claims)
• Cost per interaction (USD)
• User satisfaction (Likert 1-5)
4.2 Query classification results
Router Agent Performance (n = 200 test queries) is given in Table 3.
Table 3. Performance measures of agents

| Intent Class | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|
| General Info | 0.94 | 0.92 | 0.93 | 92% |
| Comparison | 0.91 | 0.88 | 0.89 | 89% |
| Order | 0.95 | 0.93 | 0.94 | 94% |
| Summarization | 0.87 | 0.85 | 0.86 | 85% |
| Recommendation | 0.88 | 0.86 | 0.87 | 87% |
| Micro-average | 0.91 | 0.89 | 0.90 | 92% |
Observations:
• Highest performance on Order (0.95 precision), lowest on Summarization (0.85)
• Order intent captured by clear transactional language; Summarization requires nuance
• No ML training required; rule-based router achieves competitive accuracy
4.3 Response quality and latency
Agent Performance (n = 150 diverse queries) is summarized in Table 4.
Table 4. Comparative analysis
| Agent | Latency (p50) | Latency (p95) | Tokens Generated | Quality (1-5) |
|---|---|---|---|---|
| GENERAL_INFO | 425 ms | 520 ms | 187 | 4.3 |
| COMPARISON | 680 ms | 820 ms | 312 | 4.5 |
| SUMMARIZATION | 520 ms | 650 ms | 156 | 4.2 |
| ORDER | 380 ms | 450 ms | 98 | 4.6 |
| SYSTEM AVERAGE | 420 ms | 560 ms | 188 | 4.4 |
Quality assessment: 50 domain experts rated responses on accuracy (0-5), relevance (0-5), clarity (0-5), then averaged.
4.4 Ablation study
Impact analysis (varying system configuration) is documented in Table 5.
Table 5. Impact analysis
| Configuration | Accuracy | Hallucination | Latency | Cost/Query |
|---|---|---|---|---|
| Full system | 92% | 3.2% | 425 ms | $0.018 |
| Without RAG | 76% | 18.5% | 320 ms | $0.014 |
| Without Validator | 86% | 9.8% | 405 ms | $0.017 |
| Without Router (monolithic) | 68% | 22.1% | 850 ms | $0.025 |
Interpretation:
4.5 Concurrent load testing
System behavior under realistic load:
Bottleneck analysis: Vector search in ChromaDB saturates at 10+ concurrent users. A dedicated vector DB instance (not tested) is expected to raise compliance to 95%+ at 20 users. Load-testing results are summarised in Table 6.
Table 6. Load testing
| Load | Concurrent Users | SLA Compliance (≤ 500 ms) | P95 Latency | Accuracy |
|---|---|---|---|---|
| Light | 5 | 92% | 520 ms | 92% |
| Moderate | 10 | 88% | 680 ms | 89% |
| Heavy | 20 | 75% | 1200 ms | 94% |
4.6 User evaluation
Methodology: 50 healthcare professionals and consumers; 2-week trial; 10 queries each; Likert scale 1-5. Table 7 contains the user evaluation metrics.
Table 7. User acceptance testing
| Aspect | Rating | Notes |
|---|---|---|
| Product info accuracy | 4.4 | Generally trusted; concerns about web-scraped data freshness |
| Comparison quality | 4.6 | Appreciated multi-source view; wanted more vendors |
| Order ease | 4.7 | Clear workflow; preferred authenticated channels |
| Response speed | 4.2 | Acceptable; timeouts noted at peak hours |
| Overall usefulness | 4.5 | Would adopt for daily use; requested Hindi language support |
5.1 Multi-agent architecture findings
The 24-point accuracy advantage of the multi-agent system over the monolithic approach appears robust. The margin does not shrink as more data is used, suggesting the cause is structural rather than data-related: specialization allows each agent to be optimized independently for its task.
The router agent's 92% classification accuracy matters disproportionately. Misclassification cascades: an order request misclassified as general information triggers RAG retrieval rather than transaction processing. The ablation study shows this single component contributes 24 points of system accuracy.
5.2 Simple vs. advanced Retrieval-Augmented Generation
Our finding that Simple RAG achieves sufficient accuracy (92%) for product domains without Advanced RAG complexity contradicts some recent literature emphasizing sophisticated RAG variants. However, the distinction likely reflects problem-domain characteristics:
MediMate operates in the latter domain. Healthcare product queries don't typically require "What ingredients interact with condition X?" (multi-hop). They ask "What's in it?".
In other words, Advanced RAG's latency penalty (2-3x) outweighs its accuracy gain over Simple RAG (1-2 points), making Simple RAG economically superior.
5.3 Cost-benefit dynamics
The economic justification for the 70B model rests on latency penalties. Organizations operating at <1K queries/day might find the 8B or local 7B model cost-effective. At enterprise scale (>10K queries/day), the 70B model's faster inference prevents the retry cascades that make small per-token cost differences compound. This analysis underlines the relevance of total cost-of-operation metrics beyond simple per-token pricing.
5.4 Scaling limitations
ChromaDB's vector search becomes a bottleneck at 10+ concurrent users, which we attribute to running a single instance. Distributed vector-search infrastructure (e.g., Pinecone or Weaviate) should remove this limit and is on our roadmap.
The system architecture (FastAPI with stateless agents) scales horizontally; the bottleneck lies specifically in vector retrieval, not request routing.
5.5 Limitations and threats to validity
Data scope: Findings apply to a dataset of 100+ products in the supplements, OTC medication, and medical device categories; they do not generalize to prescription drugs or clinical decision support. Our assessment is confined to e-commerce.
Evaluation time: A 2-week trial with 50 users is good initial evidence but does not capture large-scale behavior patterns or seasonal changes.
Web scraping dependencies: The Comparison agent scrapes a whitelist of trusted domains; if those sites change their HTML layout, the CSS selectors become fragile. We note this as a vulnerable point.
Hallucination assessment: Manually checking 47 responses is a small sample; a larger random sample would strengthen the hallucination claims.
MediMate demonstrates a real-world application of multi-agent architectures to healthcare e-commerce automation. The system answers 92% of queries correctly, with an average response time of 425 ms and a cost of $0.018 per interaction, competitive with commercial systems.
Core findings:
Immediate next steps (3-6 months):
Future research directions:
This framework provides a platform for healthcare e-commerce automation with a clear pathway for improvement. Whether more sophisticated architectures will become necessary as scope expands toward clinical decision support remains open; for the present product-oriented phase, simplicity remains apt.
[1] Kavitha, B.R., Murthy, C.R. (2019). Chatbot for healthcare system using artificial intelligence. International Journal of Advance Research, Ideas and Innovations in Technology, 5(3): 1304-1307.
[2] Kaponis, A., Kaponis, A.A., Maragoudakis, M. (2023). Case study analysis of medical and pharmaceutical chatbots in digital marketing and proposal to create a reliable chatbot with summary extraction based on users' keywords. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 357-363. https://doi.org/10.1145/3594806.3604765
[3] Quidwai, M.A., Lagana, A. (2024). A RAG chatbot for precision medicine of multiple myeloma. MedRxiv, 2024-03. https://doi.org/10.1101/2024.03.14.24304293
[4] Wu, J., Zhu, J., Qi, Y., Chen, J., Xu, M., Menolascina, F., Grau, V. (2024). Medical graph rag: Towards safe medical large language model via graph retrieval-augmented generation. arXiv preprint arXiv:2408.04187. https://doi.org/10.48550/arXiv.2408.04187
[5] Sudarshan, M., Shih, S., Yee, E., Yang, A., Zou, J., Chen, C., Zhou, Q., Chen, L., Singhal, C., Shih, G. (2024). Agentic LLM workflows for generating patient-friendly medical reports. arXiv preprint arXiv:2408.01112. https://doi.org/10.48550/arXiv.2408.01112
[6] Bousetouane, F. (2025). Agentic systems: A guide to transforming industries with vertical AI agents. arXiv preprint arXiv:2501.00881. https://doi.org/10.48550/arXiv.2501.00881
[7] Yu, H., Zhou, J., Li, L., Chen, S., et al. (2024). AIPatient: Simulating patients with EHRs and LLM powered agentic workflow. arXiv preprint arXiv:2409.18924. https://doi.org/10.48550/arXiv.2409.18924
[8] Borkowski, A., Ben-Ari, A. (2024). Multi-agent AI systems in healthcare: Technical and clinical analysis. Journal of Healthcare IT, 15(2): 45-62. https://doi.org/10.20944/preprints202410.0182.v1
[9] Babu, A., Boddu, S.B. (2024). BERT-based medical chatbot: Enhancing healthcare communication through natural language understanding. Exploratory Research in Clinical and Social Pharmacy, 13: 100419. https://doi.org/10.1016/j.rcsop.2024.100419
[10] Sai, S., Gaur, A., Sai, R., Chamola, V., Guizani, M., Rodrigues, J.J. (2024). Generative AI for transformative healthcare: a comprehensive study of emerging models, applications, case studies, and limitations. IEEE Access, 12: 31078-31106. https://doi.org/10.1109/ACCESS.2024.3367715
[11] Kandpal, P., Jasnani, K., Raut, R., Bhorge, S. (2020). Contextual chatbot for healthcare purposes (using deep learning). In 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), London, UK, pp. 625-634. https://doi.org/10.1109/WorldS450073.2020.9210351
[12] Jarang, S., Joshi, D., Deshpande, V.S. (2019). Behaviour analysis using word embedding & machine learning on social media. In 2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, pp. 1-6. https://doi.org/10.1109/ICCUBEA47591.2019.9129273
[13] Joshi, D.J., Makhija, M., Nabar, Y., Nehete, N., Patwardhan, M.S. (2018). Mental health analysis using deep learning for feature extraction. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, Goa, India, pp. 356-359. https://doi.org/10.1145/3152494.3167990
[14] Calvaresi, D., Calbimonte, J.P., Siboni, E., Eggenschwiler, S., Manzo, G., Hilfiker, R., Schumacher, M. (2021). EREBOTS: Privacy-compliant agent-based platform for multi-scenario personalized health-assistant chatbots. Electronics, 10(6): 666. https://doi.org/10.3390/electronics10060666
[15] Low, Y.S., Jackson, M.L., Hyde, R.J., et al. (2024). Answering real-world clinical questions using large language model based systems. arXiv preprint arXiv:2407.00541. https://doi.org/10.48550/arXiv.2407.00541
[16] Bhat, V., Cheerla, S.D., Mathew, J.R., Pathak, N., Liu, G., Gao, J. (2024). Retrieval augmented generation (rag) based restaurant chatbot with AI testability. In 2024 IEEE 10th International Conference on Big Data Computing Service and Machine Learning Applications (BigDataService), Shanghai, China, pp. 1-10. https://doi.org/10.1109/BigDataService62917.2024.00008
[17] Duan, Z., Wang, J. (2024). Exploration of LLM multi-agent application implementation based on langgraph+ crewai. arXiv preprint arXiv:2411.18241. https://doi.org/10.48550/arXiv.2411.18241
[18] Acharya, D.B., Kuppan, K., Divya, B. (2025). Agentic AI: Autonomous intelligence for complex goals—A comprehensive survey. IEEE Access, 13: 18912-18936. https://doi.org/10.1109/ACCESS.2025.3532853
[19] Hosseini, S., Seilani, H. (2025). The role of agentic AI in shaping a smart future: A systematic review. Array, 26: 100399. https://doi.org/10.1016/j.array.2025.100399
[20] Singh, A., Ehtesham, A., Kumar, S., Khoei, T.T. (2025). Agentic retrieval-augmented generation: A survey on Agentic RAG. arXiv preprint arXiv:2501.09136. https://doi.org/10.48550/arXiv.2501.09136
[21] Xu, C., Jiang, X. (2022). iMedBot: A web-based intelligent agent for healthcare related prediction and deep learning. arXiv preprint arXiv:2210.05671. https://doi.org/10.48550/arXiv.2210.05671
[22] Schmidgall, S., Ziaei, R., Harris, C., Reis, E., Jopling, J., Moor, M. (2024). Agentclinic: A multimodal agent benchmark to evaluate ai in simulated clinical environments. arXiv preprint arXiv:2405.07960. https://doi.org/10.48550/arXiv.2405.07960