2021 AI Data Science Platforms (end-to-end)

A high-level overview of the leading end-to-end AI Data Science Platforms on the market today.

 


 

There are many commercial and open-source AI Data Science Platforms on the market today. As organizations recognize the need and weigh the question of ‘on-premises or off-premises’, they will evaluate the end-to-end model lifecycle for developing and deploying AI.

From technology giants like Microsoft Azure ML, IBM Watson, Google AI, and Amazon SageMaker to smaller players like Iguazio, Dataiku, Algorithmia, and Anaconda, each has unique features to offer.

This article gives a high-level overview of the various AI Data Science Platforms available, looking at the challengers, giants, and open-source systems that look promising in 2021.

What is a Data Science Platform?

A data science platform is a ‘single point’ software application where all the data science work takes place in one ‘end-to-end’ product. It places the entire process of data sourcing, workflows, feature engineering, modeling, experiments, training, deployment, and monitoring in the hands of data scientists and business team members. The workflow includes integrating and exploring data from various sources, coding and building models that leverage this data and its features, deploying those models into production, and monitoring them for model drift.

Of course, there are different categories of ‘end-to-end’ when it comes to Machine Learning (ML) and Natural Language Processing (NLP), as well as industry domain. An example of an industry-specific platform would be one built for drug development processes. This article, however, focuses on the general machine learning platforms being used across industry sectors. The majority of these platforms are proprietary and support both on-premises and cloud deployments, while smaller vendors typically run Kubernetes containers in the cloud only.

It is still not common to find feature stores, model metadata management, model interpretability, or automatically generated documentation.

The Challengers

Algorithmia Data Science Machine Learning Platform
Algorithmia

Algorithmia (2014) (LinkedIn) is a challenger whose machine learning operations (MLOps) platform manages all stages of the production ML lifecycle within existing operational processes, so that more models can be delivered quickly alongside the software development lifecycle (SDLC).

Algorithmia Platform Architecture
Source: algorithmia.com

Features Benefits
  • Streamlined pre-built integration to common data sources with a broad range of language and framework support. 
  • Easy deployment in cloud, on-premises behind a firewall, or hybrid, with extensible API architecture.
  • Helps manage your ML lifecycle with versioning, support for any model, and team monitoring, without managing infrastructure or integrations. It essentially automates DevOps for deploying models.
  • Integrated monitoring, reporting, alerts, with model evaluation tools to test performance and governance capabilities.
  • Supports Jupyter, Scikit-learn, TensorFlow, Keras, PyTorch, Python, R, Java, Caffe, Chainer, MXNet, XGBoost, and Spark.
  • Multiple enterprise options – Enterprise Dedicated (ideal for department-scale projects hosted on the cloud) and Enterprise Advanced (ideal for enterprise-scale projects hosted on any cloud, hybrid, or on-premises infrastructure).
  • Fast and efficient scalability. It closes the gap between model training and production and decreases the DevOps resources needed to deploy and manage models.
  • Role-based access controls across users, data, models, and resources for better security.
  • Decreases serving costs on CPU or GPU.

 

Algorithmia Machine Learning Screenshot
Source: algorithmia.com

 

 

Prevision IO Data Science Machine Learning Platform

Prevision (2016) (LinkedIn) is an end-to-end enterprise AI platform whose core capabilities focus on streamlining ML lifecycle processes with automation. This allows team members to create, deploy, and manage AI-powered business applications across their entire lifecycle. It’s an open and modular platform, best suited for business impact.

Features Benefits
  • Extensive library of use-case templates, pre-trained model data packs, and ready-to-use apps.
  • AutoML engines across the full AI lifecycle.
  • Ability to build models code-free, or to use the R or Python API via a notebook environment.
  • Deployment using production-based apps and high-performance APIs for scalability.
  • Vast domain knowledge across industries that helps build industry-specific models.
  • Developers benefit from self-documented APIs.
  • A highly productive, cloud-based development and deployment framework.
  • Fast and secure.
  • ML lifecycle capabilities are built on top of the AutoML foundation.

Prevision IO AI Screenshot




Iguazio Data Science Machine Learning Platform

Iguazio (2014) (LinkedIn) is an Israeli company offering end-to-end automation of machine learning pipelines, from data collection and preparation through training to deployment. It also provides ongoing monitoring and support in production. Its production-ready feature store (on-premises or managed cloud) is fully integrated into the platform. Iguazio also maintains MLRun, an open-source MLOps orchestration framework that integrates tightly with Kubeflow.

This enables users to create robust ML pipelines within a single interface, incorporating the feature engineering produced by the feature store as well as other parts of the pipeline such as model development, deployment, and serving. The offline store uses Parquet files, and the online store is an in-memory database. Feature engineering is done with Spark, Python, and Nuclio.
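To make the offline/online split concrete, here is a minimal sketch of the idea in plain Python. It is not Iguazio's implementation: the class name `MiniFeatureStore` and its methods are hypothetical, and an in-process list and dictionary stand in for the Parquet files and in-memory database mentioned above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class MiniFeatureStore:
    """Toy feature store (hypothetical): offline history plus online latest view."""

    # Offline store: append-only history of (entity_id, timestamp, features),
    # used to build training sets.
    offline_rows: List[Tuple[str, int, Dict[str, Any]]] = field(default_factory=list)
    # Online store: only the newest features per entity, for low-latency serving.
    online_view: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Timestamp of the row currently held in the online view, per entity.
    latest_ts: Dict[str, int] = field(default_factory=dict)

    def ingest(self, entity_id: str, timestamp: int, features: Dict[str, Any]) -> None:
        """Write one feature row to both stores."""
        self.offline_rows.append((entity_id, timestamp, dict(features)))
        known_ts = self.latest_ts.get(entity_id)
        if known_ts is None or timestamp >= known_ts:
            self.latest_ts[entity_id] = timestamp
            self.online_view[entity_id] = dict(features)

    def get_online(self, entity_id: str) -> Optional[Dict[str, Any]]:
        """Serving path: return the latest known features."""
        return self.online_view.get(entity_id)

    def get_offline(self, entity_id: str, as_of: int) -> Optional[Dict[str, Any]]:
        """Training path: return the newest row at or before `as_of`."""
        best: Optional[Tuple[int, Dict[str, Any]]] = None
        for row_entity, row_ts, row_features in self.offline_rows:
            if row_entity != entity_id or row_ts > as_of:
                continue
            if best is None or row_ts >= best[0]:
                best = (row_ts, row_features)
        return best[1] if best is not None else None


store = MiniFeatureStore()
store.ingest("user-1", timestamp=1, features={"clicks": 3})
store.ingest("user-1", timestamp=5, features={"clicks": 9})
print(store.get_online("user-1"))
print(store.get_offline("user-1", as_of=3))
```

The point-in-time lookup on the offline side is what keeps training data from leaking values that were not yet known at prediction time, while the online view only ever answers "what is the latest value?".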

Features Benefits
  • With MLRun (Iguazio’s open-source framework), users can run experiments, track data and models, and deploy automatically.
  • The integrated feature store supports online and offline use, with sharing across teams and ML platforms.
  • The product uses a multi-model database for serving the computed features through different APIs and formats (like files, SQL queries, pandas, real-time REST APIs, time-series, streaming).
  • Supports Jupyter, Scikit-learn, TensorFlow, Keras, PyTorch, Python, R, Java, XGBoost, and Spark.
  • Fast and secure.
  • Easy and quick deployment of AI models – AWS MLOps Solution.
  • Handles Hyperparameter Tuning, Model Deployment and Monitoring.
  • Monitors models in production and automates drift detection.
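
As an illustration of what automated drift detection can look like, the sketch below computes the Population Stability Index (PSI) between a training sample and a production sample of one feature. The source does not say which method any vendor uses, so PSI is a swapped-in, commonly used technique; the function names and the 0.2 alert threshold (a common rule of thumb) are assumptions.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; out-of-range values go to the first or last bin."""
    counts = [0] * (len(edges) - 1)
    interior_edges = edges[1:-1]
    for value in values:
        # The bin index equals the number of interior edges at or below the value.
        index = sum(1 for edge in interior_edges if value >= edge)
        counts[index] += 1
    return [count / len(values) for count in counts]


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 5,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference (training) sample and a production sample."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # fall back if all reference values are equal
    edges = [low + i * width for i in range(bins + 1)]
    psi = 0.0
    for exp_frac, act_frac in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        exp_frac = max(exp_frac, eps)  # avoid log(0) for empty bins
        act_frac = max(act_frac, eps)
        psi += (act_frac - exp_frac) * math.log(act_frac / exp_frac)
    return psi


training = [0.1 * i for i in range(100)]
production = [value + 5.0 for value in training]  # simulated shift in the feature
DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold
if population_stability_index(training, production) > DRIFT_THRESHOLD:
    print("drift detected: consider retraining")
```

A monitoring job would typically run a check like this per feature on a schedule and raise an alert or trigger retraining when the score crosses the threshold.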



BigML (2011) (LinkedIn) is a company with a mission to make Machine Learning easy. Today BigML’s Machine Learning Platform provides end-to-end solutions for seamlessly transforming data into actionable AI models. They focus on automating Classification, Regression, Time Series Forecasting, Cluster Analysis, Anomaly Detection, Association Discovery, and Topic Modeling tasks.

Features Benefits
  • Easy-to-use web interface and REST API to get your Machine Learning model moving. 
  • Access to the latest fully-featured version with a free account. 
  • Models are fully exportable and can be used via all popular programming languages. 
  • Flexible deployments: deploy fully managed or self-managed versions of ML models in any environment.
  • Secured and private platform.
  • Reproducible and traceable. 

BigML Machine Learning Screenshot

 

RapidMiner Data Science Machine Learning Platform

RapidMiner (2007) (LinkedIn) provides an end-to-end enterprise AI solution through its Automated Data Science platform.

Features Benefits
  • 1500+ native algorithms, data prep, and data science functions.
  • Notebook, and integration with custom Python and R.
  • Advanced analytics and robust platform services.
  • Deep domain expertise across various industries.
  • Fast and resilient.
  • ROI-centric approach.

RapidMiner Screenshot

 

The Mid-level Challengers

DataRobot (2012) (LinkedIn) offers an enterprise AI platform that accelerates data science by automating the end-to-end AI journey. In addition, it allows the deployment of trusted AI applications at scale within any organization.

Features Benefits
  • User-centric solutions tailored for AI creators, operators, and consumers.
  • Best practices to prepare, build, deploy, and maintain AI-driven applications. 
  • Runs on any platform.  
  • World-class data science expertise offers easy creation of automated AI models. 
  • Single platform to deploy, maintain, and govern all your production models, regardless of where and how they were created. 
  • Automated Model Health Monitoring and Lifecycle Management.

DataRobot Screenshot



Domino (2013) (LinkedIn), an enterprise MLOps platform, is another AI provider that helps build and deploy AI models faster and at global scale. Explore their ecosystem for a full list of functionalities.

Features Benefits
  • Has a Projects Portfolio Manager, Labs, Launchpad, and Control Center, with workspaces that let you use preferred tools such as Visual Studio Code and RStudio, alongside Jira integration.
  • Provides a secure and governed ‘system of record’; runs include scheduled jobs and workspace sessions, with model run comparisons.
  • Deploy and monitor models with ease using Model APIs or the command line interface (CLI).
  • Access to the latest tools, with support for Python, R, and Spark.
  • A full AI lifecycle solution across various industries.
  • State-of-the-art working environment that helps with rapid model delivery.
  • Knowledge Center that creates a virtuous cycle of feedback and collaboration across teams, IT, and business stakeholders.

 

Domino Data Lab Diagram



H2O Data Science Machine Learning Platform

H2O (2012) (LinkedIn) Hybrid Cloud platform democratizes artificial intelligence to develop sophisticated AI technology and easy-to-use AI applications.

Features Benefits
  • Thousands of use cases. 
  • Supports distributed computing and rapidly trains models from large datasets. 
  • Supports popular ML libraries and languages including H2O-3, XGBoost, LightGBM, TensorFlow, PyTorch, Spark, Python, R, and Java, as well as over 100 pre-coded experiments and TensorFlow recipes.
  • Explainable world-class Models.
  • Any Cloud or On-premises infrastructure.
  • High performing and easily scalable.
  • Extensive and customizable.


H2O Hybrid Cloud Platform Diagram

 

Dataiku Data Science Machine Learning Platform

Dataiku (2013) (LinkedIn) is a software company that helps businesses innovate using data science, machine learning, and AI. Its Data Science Studio is a component that helps engineers, analysts, and scientists generate, explore, and deliver data products effectively.

Features Benefits
  • Code and no-code options allow quick model builds.
  • 90 built-in data transformers for common data manipulations like binning, concatenation, and more. 
  • Enables business analysts to easily create and publish a project with a few clicks.
  • Supports Jupyter, Scikit-learn, TensorFlow, Keras, Python, R, XGBoost, LightGBM, and Spark.
  • Users can explore, generate, and do preparations without dealing with storage, access, and format issues.
  • Access to data anytime and anywhere.
  • A simple-to-use dashboard makes it easy for users to generate visualizations and interactive charts from their datasets. 

 

Dataiku Machine Learning Diagram

 

C3 AI Data Science Machine Learning Platform

C3 AI (2009) (LinkedIn) is an enterprise AI provider that helps build, deploy, and operate AI applications. It uses a unique model-driven architecture to accelerate delivery and reduce the complexities of developing enterprise AI applications. Its Enterprise AI application development feature set supports explainable AI and champion-challenger testing and configuration, among numerous other features.
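
Champion-challenger testing, mentioned above, can be sketched in a few lines: the current production model (the champion) is compared with a candidate (the challenger) on the same held-out data, and the challenger is promoted only if it wins by a margin. This is a generic illustration, not C3 AI's API; the function names, the accuracy metric, and the `min_gain` margin are all assumptions.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[float], int]          # maps one feature value to a class label
Example = Tuple[float, int]             # (feature value, true label)


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Share of examples the model labels correctly."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)


def pick_champion(
    champion: Model,
    challenger: Model,
    holdout: Sequence[Example],
    min_gain: float = 0.02,
) -> Tuple[str, float]:
    """Promote the challenger only if it beats the champion by at least `min_gain`."""
    champion_acc = accuracy(champion, holdout)
    challenger_acc = accuracy(challenger, holdout)
    if challenger_acc - champion_acc >= min_gain:
        return "challenger", challenger_acc
    return "champion", champion_acc


holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
current_model = lambda x: int(x > 0.3)   # misclassifies 0.4
candidate_model = lambda x: int(x > 0.5)
print(pick_champion(current_model, candidate_model, holdout))
```

The margin guards against swapping models on noise; in practice the comparison is often run on live traffic for a period rather than on a single static holdout set.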

Features Benefits
  • A low-code/ no-code environment for developing, deploying, and operating enterprise AI applications.
  • AI Data Studio, AI ML Studio, with AI Marketplace that supports third-party IDEs.
  • Multi-layered security approach with ML model validation.
  • Scale AI across the business.
  • Execute complex tasks asynchronously.
  • End-to-end data security.
  • A heavy focus on technical requirements for a next-gen AI platform.

 

C3 AI Screenshot



KNIME Data Science Machine Learning Platform

KNIME (2008) (LinkedIn), based in Zurich, is an open-source, end-to-end Data Science Platform for designing data science workflows and making reusable components accessible to everyone. KNIME also has a commercial solution for productionizing data science.

Features Benefits
  • Offers visual workflows for data analytics with an easy-to-use graphical interface without coding. 
  • Cloud-based software, available on both Microsoft Azure and Amazon AWS.
  • Brings together tools from different domains within KNIME.
  • Blend data from any source.
  • Leverage Machine Learning and AI.
  • Discover and share data insights.

 

KNIME Analytics Platform Screenshot

 

MathWorks Data Science Machine Learning Platform

MathWorks (1970s) (LinkedIn) is a mathematical computing software company that designs and creates efficient AI models and AI-driven systems through its platform, MATLAB for Machine Learning (2004). Years before R and Python, there were only a few key players in the market, and MATLAB was one of them, developing as a high-performance language for technical computing. Today, the language and products are mostly used in academia. The product combines a desktop environment with development, data preprocessing, model design, and analysis tools, built around a language that expresses matrix and array mathematics. MathWorks continues to build data science components and utilities around the model life cycle.

As a platform it does have limitations, and it lacks some of the functionality you would expect to find in an end-to-end AI Data Science Platform. Data science teams tend to use MATLAB to create prototypes when running trial-and-error experiments and validating fresh concepts. In the market, we find that real implementations are often ported to Python, C++, or a similar language when going to production. MATLAB is still worth mentioning because MathWorks continues to add functionality. The product will go a long way in the field of data science in the years to come.

Features Benefits
  • Point-and-click for training and comparing models, with data exploration and visualization.
  • Preprocess data, build machine learning and predictive models.
  • Apply domain-specific feature engineering techniques with Project Management.
  • Document data analysis with MATLAB graphics and the Live Editor notebook environment.
  • Spend less time preprocessing data.
  • Deploy machine learning models anywhere, including C/C++ code, CUDA® code, enterprise IT systems, or the cloud. 
  • Find the best machine learning models. 
  • Deploy machine learning models to production without recoding into another language.

The Giants 

Azure Data Science Machine Learning Platform

Microsoft (1975) (LinkedIn) Azure Machine Learning (AML) Studio (2014) is one of the big players in this space. It offers secured, future-ready cloud solutions that run on-premises, hybrid, multi-cloud, or at the edge. Here is an example of their getting started documentation.

Features Benefits
  • Low-code/ no-code environment. 
  • Data is highly secured, with security built in at different levels and customizable to customer requirements.
  • Offers better backup and disaster recovery options, integrated directly with your BCDR (Business Continuity/ Disaster Recovery). 
  • Cloud-based.
  • Mobile-based.
  • Unmatched hybrid capabilities.

 

Azure Machine Learning Screenshot

 

IBM Watson Data Science Machine Learning Platform

IBM (1911) (LinkedIn) Watson Studio (2014) is another big player that helps build and scale AI models by automating AI lifecycle management. It organizes data; builds, runs, and manages AI models; and optimizes decisions across any cloud using IBM Cloud Pak for Data.

Features Benefits
  • One-click analysis and data recovery with the use of automatic visualizations.
  • Trade-off analytics helps rule out unnecessary options and identify the suitable ones across multiple objectives.
  • Provides an automated predictive analysis service that automatically surfaces the factors driving outcomes.
  • Choice and flexibility.
  • Security and trust.
  • Deep industry capabilities.

 

Amazon SageMaker Data Science Machine Learning Platform

Amazon (1994) (LinkedIn) SageMaker (2017) is software that helps build, train, and deploy machine learning (ML) models.

Features Benefits
  • A single web-based visual interface for all your ML development needs.
  • One-click Jupyter notebooks that can be easily shared.
  • Over 150 popular open-source models and over 15 pre-built solutions for common use cases to get started.
  • Broad framework support.
  • Economical.
  • Secure.

 

TIBCO Data Science Machine Learning Platform

TIBCO (1997) (LinkedIn) Data Science (2018) offers an end-to-end AI software solution that helps democratize, collaborate on, and operationalize machine learning across organizations.

Features Benefits
  • Simple and easy drag-and-drop interface for data prep, analytics, and scoring pipelines.
  • Advanced analytics and predictive models on live data. 
  • Offers role-based security for any asset within the system.
  • Multiple Model Development Languages and Tools.
  • Security, Governance, and Cost Controls.
  • Flexible Model Publishing Options.

 

SAS Data Science Machine Learning Platform

SAS (LinkedIn) Visual Data Mining and Machine Learning (2016) supports the end-to-end data mining and machine learning process with a comprehensive visual and programming interface.

Features Benefits
  • Automated feature engineering and modeling.
  • Public API for automated modeling.
  • Enables Python users to access high-level APIs for deep learning functionalities within Jupyter notebooks via the SAS Deep Learning with Python (DLPy) open-source package on GitHub.
  • Complex analytical problems solved much faster.
  • The system’s “automated model tuning” helps to identify the best-performing model.
  • The SAS platform’s built-in visual drag-and-drop interface enables users who lack coding expertise and knowledge to use it with ease. 

 

HPE Data Science Machine Learning Platform

HPE (1939) (LinkedIn) Ezmeral MLOps (2020) makes AI that is data-driven, production-oriented, and cloud-enabled, available anytime, anywhere, and at any scale.

Features Benefits
  • Flexible and unbounded. On-prem, cloud and hybrid options consider the team’s location, access needs, security, and cost constraints. 
  • Design future-ready AI strategy quickly by making use of industry-specific best practices. 
  • Offers accessibility, flexibility, and scalability of the cloud with the security and cost benefits of on-prem infrastructure.
  • Accelerated, GPU-powered AI.
  • IT-compliant servers for production-grade AI.
  • Faster time to proof-of-concept and ROI.

 

Google AI Data Science Machine Learning Platform

Google (1998) (LinkedIn) Vertex AI (2021) is a managed machine learning (ML) platform that allows companies to accelerate deployment and maintenance of artificial intelligence (AI) models. Launched in May 2021, it provides access to Google AI toolkits covering computer vision, language, conversation, and structured data, continuously enhanced by Google Research.

Features Benefits
  • MLOps life cycle management.
  • Dataset and feature management.
  • Model management and governance.
  • Orchestrating and automating training pipelines.
  • Model deployment and prediction serving.
  • Build, deploy, and scale ML models faster, with pre-trained and custom tooling within an AI platform.
  • Accelerating data preparation, training and experimentation.
  • Train models without code, minimal expertise required.
  • Build advanced ML models with custom tooling.

 

Google Vertex AI Diagram

Source: Google Vertex AI Lifecycle


Infosys Nia Data Science Machine Learning Platform

Infosys (1981) (LinkedIn) Nia AIOps (2020) is an enterprise-grade, knowledge-based AI platform that simplifies the AI adoption journey for Business and IT. It provides an end-to-end AI journey, from data management and digitization of documents and images through model development to operationalizing models. The platform enables businesses to continuously reinvent their system landscapes.

Features Benefits
  • Seamless integration, built-in workflow for audit, compliance purposes.
  • Monitor business metrics, uncover anomalous patterns.
  • Cognitive Search retrieves precise answers from diverse data sources and formats, using ML and neural models.
  • DocAI automates the end-to-end document processing lifecycle from ingestion to consumption, using AI capabilities such as Extractor, Computer Vision, NLP, and Cognitive Search.
  • Contracts Analysis derives insights from contracts and other legal documents.
  • Reduce incident lifecycle.
  • Extensive Learning Center.
  • Simplify the AI journey.
  • Digitize and harvest intelligence from documents and data across the polycloud.
  • Integrates with existing AI infrastructure to protect your investments.
  • Enhances your consumption of AI workloads through the end-to-end lifecycle.
  • Speedy data processing empowers your citizen data scientists to build complex use cases easily and accelerate the AI journey.

Alibaba Data Science Machine Learning Platform

Alibaba (1999) (LinkedIn) Cloud, a global leader in cloud computing and artificial intelligence, offers an end-to-end ML platform (2017) that provides various machine learning algorithms to meet your data mining and analysis requirements. It provides end-to-end machine learning services, including data processing, feature engineering, model training, model prediction, and model evaluation. Machine Learning Platform for AI combines all of these services to make AI more accessible than ever.

An example of their documentation.

Features Benefits
  • ModelHub, SDK.
  • AutoLearning, Image classification, Object detection, OCR, Matching recall.
  • Visualized interface.
  • Allows complete workflow creation for enterprise-level machine learning data modeling and application.
  • A myriad of algorithms covering such scenarios as regression, classification, clustering, text analysis, finance, and time series.
  • The infrastructure of Machine Learning Platform for AI relies on Alibaba Cloud distributed computing clusters.

 

Alibaba Machine Learning Diagram

 

Oracle Data Science Machine Learning Platform

Oracle (1977) (LinkedIn) Data Science Platform (2020) helps build and evaluate higher-quality machine learning (ML) models.

Features Benefits
  • Automated machine learning (AutoML) helps in automating algorithm selection, feature selection, and model tuning.
  • Drag-and-drop data integration and preparation tools.
  • Oracle security tools and user interfaces enable multiple roles to participate in projects and share models. 
  • Build high-quality models faster and easier.
  • Easy deployment of models.
  • Access any data flexibly and quickly.
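
The algorithm selection and model tuning that AutoML automates can be pictured as a search loop: fit several candidate models, score each on validation data, and keep the best. The sketch below is a toy version of that idea for a one-feature regression problem. It is not Oracle's AutoML API; all function names, the candidate set, and the `k` grid are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]              # (feature value, target value)
Predictor = Callable[[float], float]


def fit_mean(train: Sequence[Point]) -> Predictor:
    """Baseline: always predict the average training target."""
    average = mean(y for _, y in train)
    return lambda x: average


def fit_linear(train: Sequence[Point]) -> Predictor:
    """Ordinary least squares for a single feature."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    variance = sum((x - x_bar) ** 2 for x, _ in train)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / variance if variance else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def fit_knn(train: Sequence[Point], k: int) -> Predictor:
    """k-nearest neighbours: average the targets of the k closest training points."""
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mean_squared_error(model: Predictor, data: Sequence[Point]) -> float:
    return mean((model(x) - y) ** 2 for x, y in data)


def auto_select(train: Sequence[Point], valid: Sequence[Point]) -> Tuple[str, Dict[str, float]]:
    """Try every candidate (algorithm plus hyperparameter) and keep the lowest validation error."""
    candidates: Dict[str, Predictor] = {"mean": fit_mean(train), "linear": fit_linear(train)}
    for k in (1, 3, 5):                  # tiny hyperparameter grid for k-NN
        candidates[f"knn(k={k})"] = fit_knn(train, k)
    scores = {name: mean_squared_error(model, valid) for name, model in candidates.items()}
    return min(scores, key=scores.get), scores


train = [(float(x), 2.0 * x + 1.0) for x in range(10)]
valid = [(x + 0.5, 2.0 * (x + 0.5) + 1.0) for x in range(9)]
best_name, all_scores = auto_select(train, valid)
print(best_name)
```

Production AutoML systems search far larger spaces and add feature selection, smarter search strategies, and time budgets, but the select-by-validation-score loop is the common core.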

 

Samsung SDS Brightics AI Machine Learning Platform

Samsung SDS (1985) (LinkedIn) launched Brightics AI (2017) as a comprehensive AI platform.  It supports AI model creation and integrates collaborative and managerial features to provide an environment for complete analysis. 

Features Benefits
  • Easy-to-use modeling tools and functions without specialized DS knowledge or scripting.
  • Operationalize insights right away to enable actions on-site and facilitate teamwork.
  • Hadoop Analytics Engine, DataFlow Modeling, and Deep Learning Modeling.
  • Cloud-based platform.
  • Robust stability and performance.
  • Brightics Studio is a lightweight analytics tool developed as open source.

Samsung SDS Brightics AI Screenshot
Source: Samsung SDS Brightics AI Screenshot components

 

SAP ML machine learning solutions

SAP (1972) (LinkedIn) ML Lab (2020) is a centralized open source science hub to build, deploy, and operate machine learning solutions at scale. At the start of 2020, SAP added machine learning capabilities to its Data Hub (cloud and/or on-premises) to deliver data intelligence solutions. Today, it covers the end-to-end machine learning lifecycle on the Red Hat OpenShift Container Platform.

Features Benefits
  • Workspace with integrated tooling (Jupyter, VS Code, SSH, VNC, Hardware Monitoring, etc.).
  • Deploy models as production-ready services.
  • Monitor and share experiments for reproducibility.
  • ML Lab offers a secure multi-user development platform for machine learning solutions.
  • Upload, manage, version, and share datasets and models.
  • Deployable on a single-server via Docker or a server cluster via Kubernetes.

SAP Machine Learning Diagram

 

Open Source

In the market there are a limited number of open source data science platforms that support the end-to-end lifecycle when considering full stack capabilities. Many are considered AI project toolkits for developing models, such as TensorFlow, PyTorch, Keras.io, Caffe, Kubeflow (2018), Shogun, Accord.NET, Ludwig (2019), RAPIDS (2018), Angel (2017), and Apache PredictionIO (2016).

Other open source AI toolkits support the tail end of the model lifecycle: optimization, deployment, and monitoring. Examples include ForestFlow (2019), Merlin (2020), Metaflow (2019), MLeap (2016), Multi Model Server (MMS) (2017), Onepanel (2020), Red Hat Open Data Hub (2019), and Adlik (2020).

The following is a look at a few of the open source AI toolkits and platforms worth mentioning, with the caveat that they lack some of the functionality you would normally find in an end-to-end AI Data Science Platform.


Anaconda Logo

Anaconda (2009) (LinkedIn), AI Enterprise (2020). Anaconda is a company committed to open source innovation. They started using Python for data science back in 2009, helped found PyData and NumFOCUS, and created Anaconda and Conda. In 2017 they shipped an early release of their enterprise version; the full end-to-end enterprise platform was launched in 2020. They are passionate about data science and about building AI models that are explainable, repeatable, and free from bias. The commercial versions of Anaconda are maturing within the Commercial, Team, and Enterprise Editions, adding more end-to-end functionality.

Features Benefits
  • Large cloud-based repository to find and install over 7,500 data science and machine learning packages.
  • Cross-platform compatible. 
  • Develop, collaborate, govern, deploy.
  • Easy to use interface to expedite the data science journey.
  • Flexibility and control.
  • Ongoing support. 
  • Training available. 

Anaconda Machine Learning Diagram

Scikit-learn (2010) is one of the most popular open-source machine learning libraries in Artificial Intelligence. It is a community-driven project where anyone can contribute to its development.

Features Benefits
  • Scikit-learn is largely written in Python, and some core algorithms are written in Cython to improve its performance. 
  • Cross-validation.
  • Feature extraction.
  • Simple and efficient tools for predictive data analysis.
  • Reusable in various contexts. 
  • It is built on NumPy, SciPy, and matplotlib. 

 

Intel Logo
Intel Analytics Zoo Logo

Intel (1968) Analytics Zoo (2019) is an open-source Big Data AI platform that can run on Google Colab.  It seamlessly scales AI models to big data clusters with thousands of nodes for distributed training or inference. Built on top of the open source frameworks of Apache Spark*, TensorFlow*, PyTorch*, OpenVINO™, and Ray, this unified analytics and AI platform has an extensive architecture to support more libraries and frameworks.

Features Benefits
  • Python notebook.
  • Support for Multiple Frameworks.
  • High-Performance Inference.
  • Designed for IT, DevOps, and MLOps.
  • Simplified Model Deployment.
  • Dynamic Scalability.
  • In addition, NVIDIA TRITON offers different types of inference queries through advanced batching and scheduling algorithms, supports live model updates, and runs models on both CPUs and GPUs.

 

Weka Logo

Weka (1997) is an open-source machine learning library developed by the international scientific community, with a large contribution by the University of Waikato, New Zealand. It is intended to solve various data mining problems, with software that contains many built-in tools for standard machine learning tasks (here is the documentation and overview .pdf). The Auto-WEKA component for automated machine learning had its latest release in July 2017.

Features Benefits
  • Weka is written in Java and provides an API that is well documented and promotes integration into applications. 
  • An easy-to-use basic Explorer interface, plus an Experimenter for comparing the predictive performance of machine learning algorithms.
  • Separate panels for Preprocess, Classify, Associate, Cluster, Select attributes, and Visualize.
  • 10-fold cross-validation evaluation.
  • Extensive books and manuals for learning to use the platform effectively.
  • Open-source and free.
  • A great tool to learn ML.
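
For readers new to the 10-fold cross-validation listed above, the sketch below shows how the data is divided: the samples are split into ten folds, and each fold takes one turn as the test set while the other nine are used for training. It is a plain-Python illustration with a hypothetical function name, not Weka's Java API, and unlike real tools it does not shuffle or stratify the data first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold_number, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold_number}: train on {len(train)} samples, test on {len(test)}")
```

A model is trained and scored once per fold, and the ten scores are averaged, so every sample is used for testing exactly once.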

 

Apache Submarine Logo

Apache Submarine

Apache Submarine (2020) is a new open source project of the Apache Software Foundation focused on the end-to-end machine learning lifecycle, allowing data scientists to create machine learning workflows. Keep an eye on it as it matures. The following is its product roadmap.

Features Benefits
  • User interfaces (API, SDK, and IDE, etc.).
  • Run and track distributed training experiments on-premises or in the cloud.
  • Manage experiment versions and environment dependencies.
  • Support popular machine learning frameworks, including TensorFlow, PyTorch, Horovod, and MXNet.
  • Model Management (Serving, versioning, monitoring, etc.).
  • Easy-to-use User Interface.
  • Bringing together the life cycle components.

 

Apache Mahout Logo

Apache Mahout (2016) is an open source project of the Apache Software Foundation that produces implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on linear algebra. The goal of the Mahout project is to build an environment for quickly creating scalable, performant machine learning applications. There are distributed engine bindings for Spark, Flink, and H2O. Mahout is limited to shell access, running command-line jobs. It does not have all the user interface functionality you would find in a typical platform.

Features Benefits
  • Java 1.7+.
  • Mathematically Expressive Scala DSL, supports Hadoop 2.4, Spark 1.6.
  • Support for Multiple Distributed Backends (including Apache Spark).
  • Modular Native Solvers for CPU/ GPU/ CUDA Acceleration.
  • Ready-to-use framework for doing data mining tasks on large volumes of data.
  • Mahout lets applications analyze large sets of data effectively and quickly.
  • Comes with distributed fitness function capabilities for evolutionary programming.

 

Other Analytics/ Element Niche AI Platforms

There are a number of data analytics platforms with data science components. These tend not to support the entire end-to-end model lifecycle, and they lack optimization, deployment, and monitoring capabilities.

Examples include Altair Knowledge Studio (2014), Alteryx (2020), Arrikto Enterprise Kubeflow (2019), Basis AI Bedrock (2018), Cloudera ML (2019), cnvrg.io (2016), Comet (2017), Databricks Data Science Workspace (2013), DataKitchen DataOps Platform (2016), D2iQ Kaptain (2020), Dataspine (2017), dotData Enterprise (2018), Explorium Data Science Platform (2017), Faculty (2014), FloydHub (2016), Grid Dynamics (2010), Hypergiant (2018), Neptune (2017), Pachyderm (2014), Peltarion (2005), Polyaxon (2018), Spell (2017) and the list goes on. 

AI ModelOps Platforms

ModelOps platforms support the last mile of the end-to-end model lifecycle. Their main capabilities center on model optimization, deployment, and monitoring.

Examples include Allegro AI (2016), Auger.AI (2019), BentoML (2019), Datatron (2016), Determined AI (2020), Digazu (2018),  Cubonacci (2018), One Convergence DKube (2018), Flyte (2019), Paperspace Gradient (2014), ModelOPS (2016), NVIDIA Triton (2018),  PI.EXCHANGE (2019), Run:AI (2018), TensorFlow Extended (TFX) (2019), Valohai MLOps Platform (2016), Verta (2018).

AI Feature Stores

Feature stores are a fairly new product category for developing, maintaining, and monitoring the data features used by machine learning algorithms in the artificial intelligence systems around us. 2021 is the year of feature store maturity.

Further reading: Top 10 Leading Machine Learning Feature Stores

The market is evolving; let us know if we missed any end-to-end platforms that should have been mentioned. There are more than a few platforms on the market for managing the entire machine learning life cycle, and many factors need to be taken into consideration when selecting one. This article only highlights ones to explore as best-of-breed, end-to-end platforms. In the end, it boils down to your specific use case.
