The Machine Learning soccer pundit

In December, the 2020 season of Major League Soccer ended after a tough year. Although I have a distant relationship with the sport nowadays, the subject comes naturally to me as a Brazilian. Since returning to live in the USA, I have become interested in the growth of soccer in the country, and I see great potential in it both as a sport and as a business. Reading about the finals, I realized that I knew little about the players, so I decided to explore a little and learn more about the league, its teams, fans, and players.


Dremio raises $135 million to help companies rapidly analyze data

Dremio, a startup offering tools to help streamline and curate data, today announced that it raised $135 million in series D funding at a post-money valuation of $1 billion. The company says it’ll use the funds, which come nine months after a $70 million round, to invest in cloud data lake technologies that could benefit businesses looking to connect, analyze, and process data while accelerating database queries. Specifically, Dremio plans to expand its engineering centers of excellence and grow its customer-facing organizations to keep pace with new customer acquisitions.
Due to its scalability, low cost, and simplicity of management, cloud data lake storage has become the destination of choice for storing high volumes of data. According to a recent Allied Market Research report, the global data warehousing market size was valued at $18.61 billion in 2017, growing at a compound annual growth rate of 8.2% from 2018 to 2025. However, to audit that data, it has to be moved and copied into proprietary data warehouses, a process that can become costly, complex, and inflexible.

MapR veterans Jacques Nadeau and Tomer Shiran founded Santa Clara, California-based Dremio in 2015 to solve this challenge. CEO Billy Bosworth tells VentureBeat that Shiran, a former product manager at Microsoft who’s held engineering and research roles at IBM and HP, saw the rise of public clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform as an opportunity to reinvent big data technology and develop a cloud data lake engine, enabling companies with large storage volumes to rapidly analyze their data.
“Dremio customers are running millions of queries per day for high concurrency BI with tools like Tableau and Power BI, ad-hoc data processing, and mission-critical dashboards. This is made possible by fundamentally simplifying the workflow for data engineers who are already centralizing data from many sources into cloud stores like AWS S3 and Microsoft ADLS,” Bosworth said in an email interview with VentureBeat. “With Dremio, that data does not need to be further moved or copied into data warehouses for analytics; instead, the full data set is available directly in native cloud storage.”
Dremio offers a virtualization toolkit that bridges the gaps among relational databases, Hadoop, NoSQL, Elasticsearch, and other data stores. It presents itself to business intelligence software as if it were a primary data source that can be queried via SQL. (SQL is a domain-specific language designed for querying and managing data held in a relational database management system.) The startup’s eponymous platform maintains a catalog of sources, physical and virtual datasets, and datasets’ lineage, making it easier to search and find datasets and see how data are being transformed.

Above: A few of the data sources Dremio’s platform supports.
Image Credit: Dremio

Dremio is available in an open source Community edition as well as a commercial Enterprise edition. It runs in the cloud via Kubernetes or in a Hadoop cluster, and subscription pricing scales based on the number of nodes to which Dremio is deployed.
Joining capabilities native to Dremio let data in a data lake be combined with data held in other stores, including Oracle, SQL Server, and PostgreSQL databases. Dremio also automatically detects schemas and supports cloud data lakes in Amazon S3 and other cloud storage providers, leveraging the Apache Arrow columnar in-memory data format to speed up performance by 1,000 times, the company claims.
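To make the idea of joining across stores concrete, here is a minimal sketch in Python. It does not use Dremio or any Dremio API. As a stand-in, it uses the standard library's `sqlite3` module and attaches two separate in-memory databases to one connection, so that a single SQL statement can join a table in one source with a table in another. The table names (`page_views`, `customers`), the `crm` alias, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Create two separate in-memory databases that stand in for two data sources."""
    # "main" plays the role of the data lake (hypothetical event data).
    conn = sqlite3.connect(":memory:")
    # "crm" plays the role of a separate relational store (hypothetical customer data).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE main.page_views (customer_id INTEGER, views INTEGER)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")

    conn.executemany(
        "INSERT INTO main.page_views VALUES (?, ?)",
        [(1, 10), (2, 4), (1, 5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def views_per_customer(conn: sqlite3.Connection) -> list:
    """Join the two sources in one SQL query and total the views per customer."""
    query = """
        SELECT c.name, SUM(v.views) AS total_views
        FROM main.page_views AS v
        JOIN crm.customers AS c ON c.id = v.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(views_per_customer(build_sources()))  # [('Acme', 15), ('Globex', 4)]
```

The point of the sketch is only the shape of the query: the caller writes one SQL statement and does not copy either table into the other store first. How a real engine plans and executes such a join across remote systems is outside what this example shows.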
Thanks to features like automatic failover, Dremio can automatically select new nodes in the event of node and instance cluster failures. The platform’s dynamic access, moreover, delivers programmatic security controls through integration with Kerberos, LDAP, and other centralized providers.
On the AI side of the equation, Dremio taps machine learning to recommend datasets to users and adapt catalogs in response to changes in schema and execution. It also algorithmically caches and indexes metadata as needed, in real time.
Asked whether the pandemic has affected business, Bosworth said it hadn’t, pointing to Dremio’s 60% growth in headcount since March. Other than a delayed sales cycle when the startup’s customers transitioned to working from home, Dremio weathered the storm well, growing its customer base to 100 companies — a majority of which are from the Forbes Global 2000 — with over 75,000 users.
“Data analytics has always been important to our customers. This year, it has become more imperative than ever as we navigate this pandemic,” Bosworth said. “Dremio was already a distributed company, so we did not experience any loss of productivity.”
Dremio’s series D round announced today was led by Sapphire Ventures and included participation from existing Dremio investors Insight Partners, Lightspeed Ventures, Norwest Venture Partners, Redpoint Ventures, and Cisco Investments. As of today, the company has about 160 employees — a number it expects will double by the end of 2021 — and has raised $247 million in venture capital.


Mythbusting the analytics journey

This isn’t your typical recruiting story. I wasn’t actively looking for a new job and Netflix was the only place I applied. I didn’t know anyone who worked there and just submitted my resume through the Jobs page 🤷🏼‍♀️ . I wasn’t even entirely sure what the right role fit would be and originally applied for a different position, before being redirected to the Analytics Engineer role. So if you find yourself in a similar situation, don’t be discouraged!


SingleStore raises $80 million to accelerate data analytics with relational databases

Relational database startup SingleStore (previously MemSQL) closed an $80 million funding round today, bringing its total raised to $238 million. The San Francisco-based company plans to use the funds to increase its market presence; expand its engineering team in Portugal, Ukraine, and the U.S.; and grow its customer base internationally.


ActivTrak raises $50 million to measure remote workers’ productivity

ActivTrak, an Austin, Texas-based cloud productivity monitoring software provider, today raised $50 million in a series B round from Sapphire Ventures. The fresh capital will be used to scale ActivTrak’s go-to-market activities across sales, marketing, and channels and expand the company’s capabilities using AI-driven analytics, according to CEO Rita Selvaggi.


Algorithmia announces Insights for ML model performance monitoring

Seattle-based Algorithmia has announced Insights, a solution for monitoring the performance of machine learning models. Algorithmia specialises in artificial intelligence operations and management. The company is backed by Google LLC and focuses on simplifying AI projects for enterprises just getting started. Diego Oppenheimer, CEO of Algorithmia, says: “Organisations have specific needs when it comes to…


How to create an authentic Data Science Project for your Portfolio

As an aspiring data scientist, building interesting portfolio projects is key to showcasing your skills. When I learned coding and data science as a business student through online courses, I disliked that the datasets were either made up of fake data or had been solved many times before, like Boston House Prices or the Titanic dataset on Kaggle.


Key aspects of Machine Learning operations, explained

Machine Learning Operations

Until 2015, even professional programmers didn’t consider machine learning to have real potential and benefits. However, with innovation in AI and the build-up of computing capabilities, autonomous MLOps platforms began to develop rapidly and became an integral part of computer systems development.
