At CNN, our mission is to inform, engage, and empower the world in a way that is trusted, timely, and transparent. This mission is more critical than ever as we face some of the most challenging times of our generation. As the world is becoming increasingly digital in nature, we are relentlessly focusing our mission to directly connect with our audience, understand what they care about most, and reach them in a way that is most accessible for their lifestyle. Our Data Intelligence team, in particular, leverages data and machine-learning capabilities to build innovative experiences for our audience and provides scalable solutions to CNN’s operations. As the world’s largest digital news destination, we averaged more than 200 million unique global visitors every month of 2020. Our catalog of raw audio and video footage also goes back several decades. Clearly, we have a lot of data!
Counterfit is a tool for assessing the security of machine learning systems. Research found that most ML engineers and incident responders are not equipped for the task.
Botpress, a Montreal-based early stage startup, wants to make it easier for developers to build conversational apps, meaning humans interact with the app by speaking instead of typing, clicking or tapping. Today it announced a $15 million Series A from Decibel and Inovia Capital.
Red Hat developed the TrustyAI Explainability Toolkit, a library leveraging techniques for explaining automated decision-making systems. Part of Kogito, Red Hat’s cloud-native business automation framework, TrustyAI enriches AI model execution information through algorithms while extracting, collecting, and publishing metadata for auditing and compliance.
Tecton, the company that pioneered the notion of the machine learning feature store, has teamed up with the founder of the open source feature store project called Feast. Today the company announced the release of version 0.10 of the open source tool.
Open-Source Framework Simplifies Machine Learning Processes. The system brings hospital data through a set of machine learning models to help inform clinical decision-making.
As a company founded by data scientists, Streamlit may be in a unique position to develop tooling to help companies build machine learning applications. For starters, it developed an open source project, but today the startup announced an expanded beta of a new commercial offering and $35 million Series B funding.
The ContinualAI Avalanche open source library is based on PyTorch and is used for fast prototyping, training and reproducible evaluation of continual learning algorithms.
State of the Edge: To understand and foster better practices across businesses and the technology industry, the Linux Foundation created LF Edge in 2019 and rolled the State of the Edge into it last year. State of the Edge is a vendor-neutral platform for open research on edge computing that is dedicated to accelerating innovation by crowdsourcing a shared vocabulary for edge.
Open source artificial intelligence projects don’t always get a lot of publicity, but they play a vital role in the development of artificial intelligence. Because these open source projects are often pursued as passion projects by developers (sometimes in colleges and universities), the advances are creative and particularly forward-looking.
Typically freed from the constraints of a corporate setting (though some are supported by companies), these open source AI projects can dream big – and often deliver ground-breaking machine learning (ML) and AI advances.
Also important: the advances from these leading open source AI projects fuel the larger AI sector. That is, a new idea from this month’s AI project ends up next year (or even next month) in a high-end AI solution sold by a company.
Remember, if you know of additional top open source AI tools that should be on this list, please include them in the comments section below.
Open Source AI Projects
PyTorch
PyTorch has all the elements you’d expect from a leading open source AI project. It focuses on machine learning, arguably the most popular use of AI at this stage of the emerging technology’s growth. Even more important, developers and AI engineers can set PyTorch up on the top cloud computing platforms; PyTorch on AWS and PyTorch on Azure are both viable, as are Google Cloud and Alibaba. PyTorch provides the building blocks for neural networks, a foundational element of AI development.
Open Neural Network Exchange
Developed by Microsoft and Facebook, Open Neural Network Exchange (ONNX) offers some very powerful tools, most notably the ability to reuse fully trained neural network models (which may have spent hours and hours in training) in various other systems. In essence, ONNX greatly extends the usefulness of existing models by making them portable. Expect ONNX to grow ever more popular in the years ahead.
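To make the porting idea concrete, here is a minimal sketch in plain Python. It is not the ONNX format or API: the JSON layout, the operator names (`Dot`, `AddConst`, `Relu`) and both function names are assumptions invented for illustration. The point is only that once a model is described with a shared set of operators, a runtime that never saw the training code can still execute it.

```python
import json


def export_model(weights, bias):
    """'Training side': describe the model as a list of generic operators.

    The operator names and JSON layout are hypothetical, not ONNX's.
    """
    graph = {
        "nodes": [
            {"op": "Dot", "input": "x", "weights": weights, "output": "h"},
            {"op": "AddConst", "input": "h", "value": bias, "output": "z"},
            {"op": "Relu", "input": "z", "output": "y"},
        ],
        "output": "y",
    }
    return json.dumps(graph)


def run_model(serialized, x):
    """'Runtime side': knows only the operator names, not the exporter."""
    graph = json.loads(serialized)
    values = {"x": x}
    for node in graph["nodes"]:
        arg = values[node["input"]]
        if node["op"] == "Dot":
            result = sum(w * v for w, v in zip(node["weights"], arg))
        elif node["op"] == "AddConst":
            result = arg + node["value"]
        elif node["op"] == "Relu":
            result = max(0.0, arg)
        else:
            raise ValueError(f"unknown operator: {node['op']}")
        values[node["output"]] = result
    return values[graph["output"]]


# Export once, then run in a separate "system" that only sees the JSON.
serialized = export_model(weights=[0.5, -1.0], bias=0.25)
y = run_model(serialized, [2.0, 0.5])
print(y)
```

The design choice worth noticing is the shared operator vocabulary: the exporter and the runtime agree on what each operator means, and nothing else.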
IBM’s AI Fairness 360
Bias in artificial intelligence algorithms is a growing concern, and AI Fairness 360 is an open source toolkit that addresses it. The tool provides algorithms that let a developer scan an ML model for potential bias, an essential part of fighting bias – and certainly a complex task. Importantly, AI Fairness 360 allows AI engineers to explore the algorithms throughout the development lifecycle. The tool can be set to work automatically. Built into the tool’s foundation is an architecture that checks for correlations: do the correlations create a prediction that suggests a harmful stereotype?
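As a rough illustration of what scanning a model for bias can mean in practice, the sketch below computes two common group-fairness measures, statistical parity difference and disparate impact, in plain Python. It does not use the AI Fairness 360 API; the function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the share of favorable (1) predictions for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        favorable[group] += pred
    return {g: favorable[g] / totals[g] for g in totals}


def statistical_parity_difference(rates, unprivileged, privileged):
    """Difference in favorable-outcome rates; 0.0 means parity."""
    return rates[unprivileged] - rates[privileged]


def disparate_impact(rates, unprivileged, privileged):
    """Ratio of favorable-outcome rates; 1.0 means parity."""
    return rates[unprivileged] / rates[privileged]


# Hypothetical toy data: a group label and a binary model decision per person.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
spd = statistical_parity_difference(rates, unprivileged="b", privileged="a")
di = disparate_impact(rates, unprivileged="b", privileged="a")
print(rates, spd, di)
```

In this toy data, group "b" receives the favorable outcome far less often than group "a", which is the kind of signal a fairness check would flag for closer review.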
Keras
Keras is a rarity in the world of AI open source projects: it promotes itself as “an API designed for human beings, not machines.” A Python deep-learning API, Keras interoperates with high-profile AI projects like Theano and Microsoft Cognitive Toolkit. Developers and AI engineers use it as an ML library to build prototypes with comparative ease. Also aiding its ease of deployment, Keras can run on a mix of processor hardware.
Accord.NET
As the name suggests, Accord.NET uses the .NET framework. It’s a .NET machine learning framework that offers image and audio libraries coded in C#. It’s forward-looking, in that it offers a platform for developing commercial-level applications, including apps geared for signal processing, audio-visual toolsets and statistics apps. If you’re just getting your feet wet, Accord.NET also includes template apps so you can start building faster.
GPT-2
Certainly an open source AI technology that’s generating buzz, Generative Pre-Trained Transformer 2 (GPT-2) was released by OpenAI in 2019. GPT-2 leverages a deep neural network, which uses many layers to process its inputs. GPT-2 is broadly known for handling text, from translation to creating text that, at its best, can be remarkably similar to that written by humans. Moreover, it’s a powerful learning tool that can synthesize and adapt to data with significant accuracy.
Cheatsheets AI
This project is useful if you’re an ML or AI developer who could use a helping hand with open source ML/AI projects. More of a learning tool than a project, Cheatsheets AI helps you get up to speed with AI/ML tools, from Keras to SciPy to PySpark to Dask. The instruction offered is in-depth and necessarily complex. While Cheatsheets AI is designed for “AI newbies,” you will in fact need some prior training to use this resource.
TensorFlow
Is there a developer who doesn’t know TensorFlow? It’s practically a household name. Developed by the Google Brain team for internal use at Google, TensorFlow is now one of the most well-known open source machine learning platforms. Google is also making a cloud-based version of TensorFlow available for free to researchers.
Caffe
Originally created by the bright minds at UC Berkeley, Caffe has become a very popular deep learning framework. Its claims to fame include expressive architecture, extensible code and speed.
H2O
With a huge user base, H2O claims to be “the world’s leading open source deep learning platform.” In addition to the Open Source version, the company also offers a Premium edition with paid support.
Microsoft Cognitive Toolkit
Clearly, Microsoft has moved into the world of open source. Formerly known as CNTK, the Microsoft Cognitive Toolkit promises to train deep-learning algorithms to think like the human brain. It boasts speed, scalability, commercial-grade quality and compatibility with C++ and Python. Microsoft uses it to power the AI features in Skype, Cortana and Bing.
DeepMind Lab
Another very big name in AI and ML. Intended for use in AI research, DeepMind Lab is a 3D game environment. It was created by the DeepMind group at Google and is said to be especially good for deep reinforcement learning research.
ACT-R
Developed at Carnegie Mellon University, ACT-R is the name of both a theory of human cognition and software based on that theory. The software is based on Lisp, and extensive documentation is available. Operating Systems: Windows, Linux, macOS.
StarCraft II API Library
You didn’t think AI was all work, did you? Google’s DeepMind and Blizzard Entertainment are collaborating on a project that makes it possible to use the StarCraft II video game as an AI research platform. It’s a cross-platform C++ library for building scripted bots.
Numenta
The Numenta organization offers numerous open source projects related to hierarchical temporal memory. Essentially, these projects attempt to create machine intelligence based on current biological understandings of the human neocortex.
OpenCog
A big ambition, to be sure: instead of focusing on a narrow aspect of AI such as deep learning or neural networks, OpenCog aims to create beneficial artificial general intelligence (AGI). The project is working toward creating systems and robots with the capacity for human-like intelligence.
Stanford CoreNLP
This Java-based natural language processing software can identify the base forms of words, their parts of speech and whether they are names of companies or people, and it can normalize dates and times. It also marks up the structure of sentences in terms of phrases and syntactic dependencies, indicates which noun phrases refer to the same entities, identifies sentiment, extracts particular or open-class relations between entity mentions and pulls out quotes. It was designed for English but also supports a wide array of languages.
Prophet
Developed and used by Facebook – yes, they have deep resources – Prophet forecasts time series data. It’s available in R and Python and is fully automatic, accurate, fast and tunable.
SystemML
Originally an IBM Research project, SystemML is now a top-level Apache project. It describes itself as “an optimal workplace for machine learning using big data,” and it integrates with Spark.
Theano
Deep learning can be thought of as the furthest edge of AI. Theano, geared for deep learning, describes itself as “a Python library that allows you to define, optimize and evaluate mathematical expressions involving multi-dimensional arrays efficiently.” Key features include GPU support, integration with NumPy, efficient symbolic differentiation, dynamic C code generation and more.
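For readers unfamiliar with the term, the sketch below shows what symbolic differentiation means, using a tiny expression tree in plain Python. It is not Theano code and makes no attempt at Theano's graph optimization, GPU support or C code generation; the class and function names (`Const`, `Var`, `Add`, `Mul`, `diff`, `evaluate`) are assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def diff(expr, wrt):
    """Return a new expression tree: the derivative of expr w.r.t. wrt."""
    if isinstance(expr, Const):
        return Const(0.0)
    if isinstance(expr, Var):
        return Const(1.0 if expr.name == wrt else 0.0)
    if isinstance(expr, Add):  # sum rule
        return Add(diff(expr.left, wrt), diff(expr.right, wrt))
    if isinstance(expr, Mul):  # product rule
        return Add(Mul(diff(expr.left, wrt), expr.right),
                   Mul(expr.left, diff(expr.right, wrt)))
    raise TypeError(f"unsupported expression: {expr!r}")


def evaluate(expr, env):
    """Compute the numeric value of an expression given variable values."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unsupported expression: {expr!r}")


# f(x) = x * x + 3 * x, so f'(x) = 2x + 3
x = Var("x")
f = Add(Mul(x, x), Mul(Const(3.0), x))
df = diff(f, "x")
print(evaluate(df, {"x": 2.0}))
```

The derivative is itself an expression tree, built before any numbers are plugged in. That is what distinguishes the symbolic approach from numerically estimating a slope.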
MALLET
Short for “MAchine Learning for LanguagE Toolkit,” MALLET includes Java-based tools for statistical natural language processing, document classification, clustering, topic modeling, information extraction and more. It was first created in 2002 by faculty and graduate students at the University of Massachusetts Amherst and the University of Pennsylvania.
DeepDetect
An example of cross-collaboration in the open source AI sector, DeepDetect has been used by organizations like Airbus and Microsoft. DeepDetect is an open source deep learning server based on Caffe, TensorFlow and XGBoost. It offers an easy-to-use API for image classification, object detection, and text and numerical data analysis.
Just like computer vision a few years ago, the decades-old field of natural language processing (NLP) is experiencing a fascinating renaissance. Not a month goes by without a new breakthrough! Indeed, thanks to the scalability and cost-efficiency of cloud-based infrastructure, researchers are finally able to train complex deep learning models on very large text datasets, in order to solve business problems such as question answering, sentence comparison, or text summarization.
Today Camunda announced it has raised $100 million in a round led by Insight Partners. The raise included money from Highland Europe, which led the company’s $28 million round in 2018.
Google today announced the release of Model Search, an open source platform designed to help researchers develop machine learning models efficiently and automatically. Instead of focusing on a specific domain, Google says that Model Search is domain-agnostic, making it capable of finding a model architecture that fits a dataset and problem while minimizing coding time and compute resources.
Application performance monitoring startup Sentry today announced it has secured $60 million in series D financing for a post-money valuation of $1 billion. Sentry says the funds will fuel product development and go-to-market functions, as well as hiring across the company’s San Francisco, Toronto, and Vienna offices.
Easy, Open-Source AutoML in Python with EvalML.
In recent years, reinforcement learning (RL) has garnered much attention in the field of machine learning. The approach does not require labelled data and has yielded remarkable successes on a wide variety of specific tasks. RL unfortunately continues to struggle with issues such as sample efficiency, generalization, and transfer learning. To address these drawbacks, researchers have been exploring meta-reinforcement learning (meta-RL), in which learning strategies can quickly adapt to novel tasks by using experience gained on a large set of tasks that have a shared structure.
Facebook AI has released a massive speech recognition database and training tool called Multilingual LibriSpeech (MLS) as an open-source data set. MLS combines more than 50,000 hours of audio in eight languages from public domain audiobooks with pre-trained language models and other data useful for automatic speech recognition development.
By now, chances are you’ve read the famous paper about hidden technical debt by Sculley et al. from 2015. As a field, we have accepted that the actual share of Machine Learning is only a fraction of the work going into successful ML projects. The resulting complexity, especially in the transition to “live” environments, leads to a large number of failed ML projects that never reach production.
Needless to say, we are living in the digital era. Nowadays our life is not as difficult as it was before the age of technological advancements. We can connect to the world in seconds. Social media and the Internet help us a lot. All social media and messaging platforms are free. But nothing is truly free. If you are not paying for it, you’re not the customer; you’re the product being sold.
A battle for control over machine learning operations (MLOps) is beginning in earnest as organizations embrace feature store repositories to build AI models more efficiently. A feature store is at its core a data warehouse through which developers of AI models can share and reuse the artifacts that make up an AI model, as well as entire AI models that might need to be modified or further extended. In concept, feature store repositories play a role similar to the one a Git repository plays in enabling developers to build applications more efficiently by sharing and reusing code.
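The sharing-and-reuse idea behind a feature store can be sketched in a few lines of plain Python. This is a toy in-memory version under stated assumptions, not the API of Feast, Tecton or any other product: the `FeatureStore` class, its method names and the example features are all hypothetical.

```python
class FeatureStore:
    """Hypothetical in-memory store: feature name -> entity id -> value."""

    def __init__(self):
        self._definitions = {}  # feature name -> function of a raw record
        self._values = {}       # feature name -> {entity id: computed value}

    def register(self, name, transform):
        """Define a feature once so every model computes it the same way."""
        if name in self._definitions:
            raise ValueError(f"feature already registered: {name}")
        self._definitions[name] = transform
        self._values[name] = {}

    def ingest(self, entity_id, raw_record):
        """Compute and store every registered feature for one entity."""
        for name, transform in self._definitions.items():
            self._values[name][entity_id] = transform(raw_record)

    def get_features(self, entity_id, names):
        """Return the stored values for the requested features, in order."""
        return [self._values[name][entity_id] for name in names]


store = FeatureStore()
store.register("order_count", lambda r: len(r["orders"]))
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))
store.ingest("user_1", {"orders": [20.0, 40.0]})

# Two different (hypothetical) models reuse the same stored features.
churn_inputs = store.get_features("user_1", ["order_count", "avg_order_value"])
ltv_inputs = store.get_features("user_1", ["avg_order_value"])
print(churn_inputs, ltv_inputs)
```

Each feature is defined in one place and computed once, so two models that need `avg_order_value` cannot drift apart in how they calculate it.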