LLM observability & evals
Arize Phoenix is an open-source AI observability and evaluation platform for LLM and RAG applications.
Platform for evaluating, monitoring, and improving AI applications with datasets, experiments, prompts, and traces.
Open-source observability platform for logging, monitoring, caching, and analyzing LLM requests.
Langfuse is an open-source LLM engineering platform for tracing, evaluation, prompt management, and metrics.
LangSmith is an observability and evaluation platform for debugging, testing, and monitoring LLM apps.
Open-source observability and evaluation tooling for tracking and improving LLM applications.
ML monitoring
Arize AI provides observability and evaluation tools for troubleshooting ML models and LLM applications in production.
Evidently provides open-source and managed tools to evaluate, test, and monitor AI and ML systems.
Fiddler is an AI observability platform for monitoring, explaining, and improving ML models and LLM applications.
WhyLabs provides AI observability for monitoring data quality, model behavior, and production ML applications.
Experiment tracking
Comet is an experiment management and model production platform for tracking, comparing, and optimizing ML work.
Open-source platform for managing the machine learning and generative AI lifecycle from tracking to deployment.
Neptune is an experiment tracker and metadata store for logging, organizing, and comparing machine learning runs.
Developer platform for experiment tracking, model evaluation, dataset versioning, and ML observability.
Hyperparameter optimization
Hyperopt is a Python library for serial and parallel optimization over search spaces.
Open-source automatic hyperparameter optimization framework with define-by-run search spaces for machine learning.