“Above the Trend Line” – Your Industry Rumor Central for 9/24/2021

Above the Trend Line

Variety of short time-critical news items grouped by category such as M&A activity, people movements, funding news, industry partnerships, customer wins, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

A Guide for Optimizing your Data Science Workflow

It is a good idea to use pipx to install flake8 and mypy on your system. This way, you install them once and reuse them across projects. You can point to the pipx install location using the following settings in the user settings.json file (backslashes in Windows paths must be escaped in JSON),

    {
        "python.linting.flake8Enabled": true,
        "python.linting.flake8Path": "C:\\Users\\username\\.local\\pipx\\venvs\\flake8\\Scripts\\flake8.exe",
        "python.linting.mypyEnabled": true,
        "python.linting.mypyPath": "C:\\Users\\username\\.local\\pipx\\venvs\\mypy\\Scripts\\mypy.exe"
    }

With linting enabled through mypy, flake8, and pylance, you can catch many bugs early, even during prototyping.

Formatting

Black

An auto-formatter helps maintain code formatting standards when working in a team. Have you had teammates debate code formatting over a PR? Black is a Python tool that automates your code formatting using a set of predefined rules. It is an opinionated auto-formatting library. Black is particularly helpful during development, as it breaks down complex statements into a format that is easy to read. The reformatting is deterministic, so users with the same settings get exactly the same formatting, no matter the OS, IDE, or platform the formatting is run on. You can use pipx to set up black on your machine.

Before black formatting,

    def very_important_function(template: str, *variables, file: os.PathLike, engine: str, header: bool = True, debug: bool = False):
        """Applies `variables` to the `template` and writes to `file`."""
        with open(file, 'w') as f:
            ...

After black formatting,

    def very_important_function(
        template: str,
        *variables,
        file: os.PathLike,
        engine: str,
        header: bool = True,
        debug: bool = False,
    ):
        """Applies `variables` to the `template` and writes to `file`."""
        with open(file, "w") as f:
            ...

isort

Black doesn't sort your imports. Users often add imports to a project in no particular order. isort (import sort) brings order to this chaos by imposing a hierarchy on imports.
It arranges the imports so that Python standard library imports come first, followed by third-party library imports and then user-defined library imports. Within each category, the imports are further sorted alphabetically. This helps in finding an import quickly when there are many of them.

Before isort,

    from my_lib import Object
    import os
    from my_lib import Object3
    from my_lib import Object2
    import sys
    from third_party import lib15, lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8, lib9, lib10, lib11, lib12, lib13, lib14
    import sys
    from __future__ import absolute_import
    from third_party import lib3
    print("Hey")
    print("yo")

After isort,

    from __future__ import absolute_import

    # Python Standard library
    import os
    import sys

    # Third party library
    from third_party import (lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8,
                             lib9, lib10, lib11, lib12, lib13, lib14, lib15)

    # User defined library/modules
    from my_lib import Object, Object2, Object3

    print("Hey")
    print("yo")

Use this setting in your setup.cfg file to configure isort to work with black. Black by default is configured for an 88-character line. Make sure that flake8 and isort are configured to exactly the same setting.

    [flake8]
    max-line-length = 88

    [isort]
    line_length = 88

Table of Contents:
· Dotenv
· Pre-commit
· Touch typing
· VIM
· VS Code extensions

This is a crucial step, since this is where abstractions such as functions, classes, and modules are designed. Data scientists can learn a lot from a Python developer's workflow during this step.

Dotenv

Let's say you have the following file structure,

    dream_ds_project
    --dev                # Jupyter notebook folder
        --notebook1.py
        --notebook2.py
    --src                # Source code folder
        --module1.py
        --module2.py
    --.env               # Environment variable file
    --setup.cfg          # Configuration file for python tools

And you start by prototyping a function test_prototype in dev/notebook1.py. You can then move that function to src/module1.py.
Now, when you have to import this function, set PYTHONPATH in the .env file located in the root folder, like this.

    # Set pythonpath to a relative path. In this case it sets where .env
    # file is present as the root path
    PYTHONPATH=.

Now you can import test_prototype in dev/notebook1.py as

    from src.module1 import test_prototype

.env is a special file. You can use it to store sensitive information such as passwords, keys, etc. It should not be part of git commits and should be kept private. Keep two .env files, one for production and one for development.

The production .env file could look like this,

    MONGODB_USER=prod_user
    MONGODB_PWD=prod_pwd
    MONGODB_SERVER=prod.server.com

Whereas the development .env file could look like this,

    MONGODB_USER=dev_user
    MONGODB_PWD=dev_pwd
    MONGODB_SERVER=dev.server.com

You can load these variables into your environment using the python-dotenv library. Inside your Python code, you can access these variables like this,

    from dotenv import load_dotenv
    import os

    # Call the function to read and load the .env file into local env
    load_dotenv()
    print(os.getenv("MONGODB_SERVER"))
    # prints prod.server.com with the prod .env file
    # prints dev.server.com with the dev .env file

This keeps the code identical for prod and dev; you only swap the .env file based on the environment in which the code is running.

Pre-commit

Pre-commit helps in verifying your git commits. It helps in maintaining a clean git commit history and provides a mechanism for running user-defined validations before each commit. It has a strong ecosystem and has a plugin for most of the common commit validations that you can think of.

Builtin-hooks: Some of my favorite builtin pre-commit hooks are,
· detect-aws-credentials & detect-private-key, which make sure no sensitive information is accidentally included in the commits.
· check-added-large-files, which makes sure commits do not include files that exceed a size limit (1MB in this setup), controlled using the maxkb argument.
  I found this very useful because code files are rarely larger than 1MB, and this prevents accidental commits of large data files in a data science workflow.
· check-ast, which makes sure that the code is syntactically valid Python code.

Install pre-commit,

    poetry add pre-commit --dev

Create a .pre-commit-config.yaml file and add this,

    repos:
    - repo: https://github.com/pre-commit/pre-commit-hooks
      rev: v3.2.0
      hooks:
      - id: detect-aws-credentials
      - id: detect-private-key
      - id: check-added-large-files
        args: ['--maxkb=1000']
      - id: check-ast

Plugins: On top of built-in hooks, pre-commit offers support for plugins as well. Some of my favorite plugins are,
· Black makes sure that the formatting of all the committed files follows black conventions.
· Mypy validates that the static type check has no errors.
· Flake8 ensures the coding standards are observed.
· pytest makes sure all the tests are passing before committing. This is particularly useful for small projects, where you do not have a CI/CD setup and testing can be done locally.

    - repo: https://github.com/psf/black
      rev: 20.8b1
      hooks:
      - id: black
        args: ['--check']
    - repo: https://github.com/pycqa/isort
      rev: '5.6.3'
      hooks:
      - id: isort
        args: ['--profile', 'black', '--check-only']
    - repo: https://github.com/pre-commit/mirrors-mypy
      rev: v0.800
      hooks:
      - id: mypy
    - repo: https://gitlab.com/pycqa/flake8
      rev: '3.8.3'
      hooks:
      - id: flake8
        args: ['--config=setup.cfg']
    - repo: local
      hooks:
      - id: pytest-check
        name: pytest-check
        entry: pytest
        language: system
        pass_filenames: false
        always_run: true

With the check-only arguments used above (--check for black, --check-only for isort), the hooks only read the files and validate the commit; they do not reformat or otherwise write to the files.
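To make the size check described above concrete, here is a minimal sketch of the idea behind check-added-large-files. It is not the hook's actual implementation: the function name find_large_files and the demo file contents are assumptions made for illustration, and the real hook inspects files staged in git rather than an arbitrary list of paths.

```python
import tempfile
from pathlib import Path


def find_large_files(paths, max_kb=1000):
    """Return (path, size_kb) pairs for files larger than max_kb kilobytes.

    Hypothetical helper for illustration only; it is not part of pre-commit.
    """
    too_large = []
    for path in paths:
        size_kb = Path(path).stat().st_size / 1024
        if size_kb > max_kb:
            too_large.append((str(path), size_kb))
    return too_large


# Demo with two temporary files: a small script and an oversized data file.
with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "module1.py"
    small.write_text("print('hello')\n")

    large = Path(tmp) / "all_data.json"
    large.write_bytes(b"0" * (2 * 1024 * 1024))  # 2 MB of filler bytes

    flagged = find_large_files([small, large], max_kb=1000)
    flagged_names = [Path(p).name for p, _ in flagged]

    for p, size_kb in flagged:
        print(f"{Path(p).name} ({size_kb:.0f} KB) exceeds 1000 KB.")
```

Only the data file is flagged; the small source file passes, which is the behavior that makes this kind of check a cheap guard against committing datasets by accident.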
In case of a validation error, pre-commit cancels the commit, and you can go back and fix the error before committing again.

A sample pre-commit failure because of committing a large data file,

    dream_ds_project > git commit -m "precommit example - failure"
    Detect AWS Credentials...................................................Passed
    Detect Private Key.......................................................Passed
    Check for added large files..............................................Failed
    - hook id: check-added-large-files
    - exit code: 1

    all_data.json (18317 KB) exceeds 1000 KB.

    Check python ast.....................................(no files to check)Skipped
    black................................................(no files to check)Skipped
    mypy.................................................(no files to check)Skipped
    flake8...............................................(no files to check)Skipped

A sample pre-commit success,

    dream_ds_project > git commit -m "precommit example -- success"
    Detect AWS Credentials...................................................Passed
    Detect Private Key.......................................................Passed
    Check for added large files..............................................Passed
    Check python ast.........................................................Passed
    black....................................................................Passed
    mypy.....................................................................Passed
    flake8...................................................................Passed
    [master] precommit example -- success
    7 files changed, 54 insertions(+), 33 deletions(-)

Touch typing

Touch typing is an essential productivity skill that generalizes to any computer task. It is vital if you spend a considerable amount of time in front of a computer. By practicing for just a few minutes every day, you can reach a significant typing speed.

Touch typing is vital for programmers, since programming involves typing many special characters and you don't want to concentrate on your keyboard when you are thinking about logic. Keybr is a fantastic website for training touch typing.
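Returning to the Dotenv section, the following is a minimal, standard-library-only sketch of what loading a .env file amounts to. It is a simplified stand-in and not python-dotenv's implementation: the function load_env_file and its override parameter are hypothetical names introduced here, and the sketch ignores features such as multi-line values and variable expansion.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path, override=False):
    """Read KEY=VALUE lines from `path` into os.environ and return them.

    Hypothetical helper for illustration; blank lines, comments, and lines
    without "=" are skipped. Existing variables are kept unless override=True.
    """
    parsed = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop optional surrounding quotes
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


# Demo: write the development .env from the article to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# Development settings\n"
        "MONGODB_USER=dev_user\n"
        "MONGODB_PWD=dev_pwd\n"
        "MONGODB_SERVER=dev.server.com\n"
    )
    settings = load_env_file(env_path, override=True)

print(os.getenv("MONGODB_SERVER"))  # dev.server.com
```

Swapping in a file with the production values would change the printed server without touching the code, which is the point of keeping configuration in .env files.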

Understanding the role and attributes of Data Access Governance in Data Science & Analytics

Data access governance

Data scientists and business analysts need not only to find answers to their questions by querying data in various repositories, but also to transform that data in order to build sophisticated analyses and models. Read and write operations are at the heart of the data science process and are essential to helping them make quick and highly informed decisions. Data access governance is also an imperative capability for data infrastructure teams that are tasked with democratizing data while complying with privacy and industry regulations.

Relyance emerges from stealth to spot risky code

Relyance AI

Relyance, a San Francisco, California-based startup developing a real-time codebase analysis platform, today emerged from stealth with $30 million raised across seed and series A rounds from Unusual Ventures and Menlo Ventures. Co-CEOs Leila R. Golchehreh and Abhi Sharma say the funding will be used to expand the company’s engineering and sales teams as well as accelerate Relyance’s go-to-market strategy.

Data science hasn’t fixed its huge gender pay gap

AI Diversity

While 64% of employees in data science, AI, and machine learning took part in training or obtained new certifications over the past year, the average change in compensation was $9,252 — an increase of about 2.25% annually. That’s one of the findings in O’Reilly’s 2021 Data/AI Salary Survey, which took a look at job satisfaction and salaries in data science fields experiencing a shortage of qualified employees. The results suggest that data and AI professionals are among the most driven employees when it comes to upskilling — and have a clear desire to learn.

Best of arXiv.org for AI, Machine Learning, and Deep Learning – August 2021

In this recurring monthly feature, we filter recent research papers appearing on the arXiv.org preprint server for compelling subjects relating to AI, machine learning and deep learning – from disciplines including statistics, mathematics and computer science – and provide you with a useful “best of” list for the past month.

Agile Data Science: What does it mean? How to manage Data Science Projects


“What does agile data science mean?” you might be asking. In one word: agile! Agile is a methodology that has been embraced by many industries, including data science. It’s time to get agile with your data science projects and start increasing efficiency and decreasing costs. This blog post talks about what agile data science is, how it can help you manage your projects better, and tips around how it can be used in the context of your company’s culture.

Artificial Intelligence in Air Quality Control

Air quality

The earth's atmosphere is composed of a mixture of gases. Air is one of the most essential constituents that serve to preserve all life forms on earth. The air we breathe contains about 21% oxygen, which is utilized by the human body. Since we continuously breathe air for survival, it becomes critical to maintain the balance and quality of the air around us. However, with the pollution surrounding us, it becomes difficult to breathe in the natural air available.

Overcoming AI’s transparency paradox

Transparency paradox

AI has a well-documented but poorly understood transparency problem. 51% of business executives report that AI transparency and ethics are important for their business, and not surprisingly, 41% of senior executives state that they have suspended the deployment of an AI tool because of a potential ethical issue. In order to fully understand why AI transparency is such a challenging issue, we first ought to reconcile with some common misconceptions and their realities within AI transparency to gain a better view of the problem and the best way to address transparency within the context of the current ML tools in the market. 

A maturity model for AI

AI maturity model

Determining the position of your company in its AI journey

Jannik Klauke

As digitization is becoming the norm in most organizations, the next era of transformation, namely the integration of AI into day-to-day business, has arrived.

In a 2019 survey of more than 1800 executives, 30% had already implemented an AI strategy that aligns with their corporate goals, and 63% of respondents reported revenue increases resulting from AI adoption.[1]

To adopt AI successfully, the first thing to do is to determine the current level of AI maturity in the company. Only if you know where you currently stand, and what is possible, can you derive what actions you have to take to get there, and which steps and phases to pass along the way. This answers questions such as:

· What is the position of the company in the application of intelligent technologies and AI capabilities?
· How will the application of AI be useful in the future (short, medium and long term)?
· Which products, services and internal processes are affected by AI?
· And most importantly, how do we start to determine the degree of maturity?

To answer the last question, we have developed the following model to help determine the level of AI maturity in a company. It can also be applied on a more granular basis within the company, such as at a departmental or divisional level. AI maturity is classified into five distinct maturity levels: aware, ad-hoc, opportunistic, integrated, or transformative. In addition, we have identified six dimensions that help managers derive concrete measures to determine their AI maturity level, then increase it and successfully lead the AI front of tomorrow.

Six maturity dimensions

Figure: six dimensions of AI maturity (image by author)

Dimension 1: Data

Data is the foundation for the successful application and scaling of AI technology in a company.
Due to many common issues, such as overwhelming data accumulation, data quality and frequency of data input, data is the most significant operational challenge for many organizations instead of being treated as an important strategic asset. Access to high-quality organizational data is required to realize the full potential of AI solutions. When identifying AI maturity, a holistic view of the organizational data is essential.

Dimension 2: Use Cases

The ability to identify and develop potential value-adding use cases for artificial intelligence is crucial. To do that, it is important to be aware of the limitations and possibilities of AI and to transfer that knowledge to practical business problems. Processes for identifying and prioritizing AI use cases based on feasibility and success criteria are important in the scaling phases, when various departments are utilizing AI for their business problems.

Dimension 3: Team and Skills

A successful build-up of AI capabilities in a company is highly dependent on internal AI know-how and related skills. It is thus important to have the appropriate AI talent available before implementing AI. Measures for building maturity in this dimension are wide-ranging, from meaningful job advertisements to suitable training concepts for upskilling internal employees.

Dimension 4: Infrastructure

IT infrastructure forms the technical foundation for the development of AI applications. AI experts need to have the right tools to develop their applications and the ability to make them accessible throughout the organization. Cloud technology adoption is also becoming an important indicator in determining the level of AI maturity.
Not only can AI projects be deployed in the cloud; many large cloud providers also offer AI services that can be leveraged.

Dimension 5: Governance

Various factors play a role in establishing comprehensive governance of AI projects. Risks involving AI must be identified, and compliance with regulations as well as internal policies must be monitored in the solutions. After KPIs and metrics have been agreed upon, frequent reports should be made to ensure successful steering of AI initiatives. Other factors that are important in this context are AI ethics and explainability of AI.

Dimension 6: Organization

The benefits that AI teams generate often depend on how well they are integrated at the organizational level and whether the organizational conditions are adapted to them. The adaptation of internal processes to an agile, AI-oriented approach and a change in the mindset of the culture also contribute to the initiative's success.

Solution: AI Maturity

We at STATWORX GmbH have developed a model to capture the current status of AI maturity in a company or department. Based on practical experience and current research, we defined five levels of AI maturity:

Figure: five levels of AI maturity in a company (image by author)

Level 0: Awareness

Companies in the state of awareness have not implemented AI yet. The company is aware of the existence of AI, and its potential may be known. Nonetheless, the points of contact with AI go no further than that. Analysis is done manually, without any intelligent tools or data storage methods at all.

Level 1: Ad-hoc

In this stage, AI adoption is inactive and not considered in the corporate strategy. Knowledge and awareness of AI technology are scarce, and data is mainly stored in fragmented systems. Planning and decision making are rarely data-based, and training initiatives are simple.
Many organizations are at this level or transitioning to the next level of AI maturity.

Level 2: Opportunistic

At Level 2 of the AI maturity model, companies have identified that AI is an important future topic and have taken the first steps to explore the potential of AI. A central platform for AI in the organization does not exist at this stage. Tools and know-how are only available at an operational level to a certain extent. A few stakeholders push for AI initiatives and, in some cases, AI models have made it to near-production stages at a departmental level.

Level 3: Integration

At this level, AI is applied in most areas of the company and has been integrated into existing products, services or processes. AI has become a standard technology within the organization, and rules for governance of AI models are defined and followed. AI also serves as a basis for decision-making and is centrally managed.

Level 4: Transformative

In the last stage of AI maturity, AI is part of the business model and is firmly embedded into the organization and corporate strategy. At this stage, the company may market its own AI-based products, and AI initiatives have been successfully implemented and interconnected within most divisions. AI competencies and training are widespread and have been systematically built up and cultivated. Data is given a high priority and is seen as both raw material and product. AI is also fully exploited in compliance with regulatory and ethical standards. Teams have central access to data, and AI solutions are managed in the organization's own AI platform.

Increasing AI Maturity

Once the organization's AI maturity level has been identified, the next important question arises: how can the AI maturity be increased? Take the following three steps and implement the outcome to boost your AI maturity.

Figure: increasing AI maturity (image by author)

Conclusion

An AI strategy sets the stage for a systematic implementation of AI within the organization.
All measures to increase the level of AI maturity are directed at the dimensions of the organization's holistic AI strategy and have to be aligned with the general strategic direction of the organization to avoid conflicts of interest. By formulating an AI strategy, the increase of the maturity level can be accelerated significantly.

You should use this model to determine your level of AI maturity, either for the entire organization or for individual departments. When increasing the AI maturity level, steps should be taken to reduce AI weaknesses, and concretely defined steps should serve as an action plan.

Next Steps

I hope that this brief introduction to the fundamental dimensions of an AI strategy was helpful to you. Let me know your thoughts and ideas!

If you want to learn more about the five levels of maturity and how to advance in your AI journey, you can find our whitepaper on the topic here.

References:

[1]: https://www.mckinsey.com/featured-insights/artificial-intelligence/global-ai-survey-ai-proves-its-worth-but-few-scale-impact#

[Report] 300 Data Science Leaders share what’s holding their Teams back

Holding data science back

The world’s most sophisticated companies overwhelmingly count on data science as a key driver for their long-term success. But according to a new survey of 300 data science executives at companies with more than $1 billion in annual revenue, flawed investments in people, processes, and tools are causing failure to scale data science.

“Human in the loop” is a popular way to mitigate the risks of AI. That approach might be doomed

Human and robot

I’ve written in the past about how fuzzy the line between “good” and “evil” data science and artificial intelligence (AI) can be. Ethical issues arise with AI that are neither clear cut nor easy to navigate. One of the popular ways to mitigate the risks of AI is to pursue a “human in the loop” strategy where people still retain ultimate decision authority for major decisions. In this blog, I’ll explain why that approach may be doomed for failure as a primary tool for stopping “evil” AI from being deployed.
