Duke Energy used computer vision and robots to cut costs by $74M [with Video]

Duke Energy’s AI journey began because the utility company had a business problem to solve, Duke Energy chief information officer Bonnie Titone told VentureBeat’s head of AI content strategy Hari Sivaraman at the Transform 2021 virtual conference on Thursday.

15 free & open-source data resources for your next data science project

Resources and information

A consolidated list of free datasets, organized by category, for beginners as well as professionals

Since the demand for data scientists grew during the pandemic, many beginners have entered the field of data science. Most of them ask the same questions: Where can I find datasets for machine learning and deep learning projects? Where can I get free datasets for data science? So here I am writing down some useful information for every beginner, starting from the very basics. I hope this article will also help advanced data science professionals who were not familiar with these resources earlier.

Data! Data! Data! Isn't it everywhere? It is, but is it in a ready-to-use form? Absolutely not. Before I take you through the list of resources that provide free datasets for almost every field, the first thing to understand is that to apply data science skills, you need a dataset in hand, and in a ready-to-use form.

How do you find the best dataset for a specific machine learning project? Where should you look? If you got stuck, spent a lot of time searching, and ended up frustrated, this short, crisp guide is for you.

If you have ever worked on your own project from scratch, you are probably familiar with the obstacles you can run into during data collection. Data collection is the first and most important step of any data science project.

There are three techniques to get a dataset:

So, below I list the top 15 platforms where you can get datasets to start your data science journey.

A complete list of dataset resources:

1. Kaggle → I am sure many of you are familiar with this platform, as it is very popular among data science people for lots of reasons. I put it at the top because I use it most of the time.
It is indeed helpful for the data science community, as it hosts interesting datasets covering almost every domain: health, finance, banking, education, and more. If you want data that is already prepared in a structured form, make Kaggle your first choice. Its open, accessible data formats are not only well supported on the platform but also easier for more people to work with, regardless of their tools. You can find a variety of file types as well, such as CSV, JSON, SQLite, and BigQuery.

2. UCI ML repo → The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the machine learning community uses for the empirical analysis of machine learning algorithms. It currently offers 588 open source datasets as a service to the machine learning community, many of them well suited to machine learning projects. You can browse all the UCI datasets through their searchable interface. During my master's, I could not find a free dataset for one of my projects. One of my faculty members suggested this platform, and it worked out well: I found a good, large UCI dataset to start my project. I put it second in the list because the UCI datasets are well organized. You can even search by the type of task you are attempting, such as regression, classification, or NLP.

3. Quandl → This resource provides free datasets to the data science community. It is designed for professionals and delivers financial, economic, and alternative data to people worldwide. It mainly covers two types of data: time series and tables. You can find interesting datasets for finance and economics here, many of them well suited to machine learning projects. All you need to do is enter keywords of interest in the search bar and choose from the listed datasets. To use a particular dataset, visit its Usage tab.
Since I cannot cover everything here, please visit this documentation to fully understand how you can use the platform's APIs to meet your data requirements. Besides the APIs, you can also pull any financial data you need directly into your Python IDE using their Python library; to learn how, please refer to this documentation.

4. Data.gov → This site is maintained by the U.S. government, which alone decides what to publish there as public, free datasets for researchers and data science people like you and me. Here you can find free datasets arranged by category, such as agriculture, climate, energy, ocean, local government, maritime, and older adults' health. If any of these categories interests you, visit this platform to get free datasets for machine learning projects. Just visit the site and go to the Data tab at the top, which lists all the datasets. You can also search by keywords of interest. While exploring this site, I found some very good resources as well, so don't forget to go through them.

5. Data.gov.in → This site is maintained by the Indian government. It releases all sorts of data in almost every domain, such as education, finance, and healthcare, so that researchers like us can use it to build useful projects. You can, of course, use these free datasets for data science projects, and you can also find image datasets for machine learning and deep learning projects here. Some of the datasets shared on this website are associated with projects built for organizations such as DRDO and ISRO.
Such free datasets are therefore highly sensitive, and it matters that they are used correctly.

Accessing this site is very simple:

- Enter keywords and search, e.g., Education.
- Click on the relevant search result.
- This takes you to the catalog containing that dataset.
- Go through the pages in the catalog to find the right dataset.
- Download the dataset in the required format.
- The site will ask some basic questions about your intended use; answer them and save the data.

6. World Bank data → This site works closely with the World Bank's regions and Global Practices to provide high-quality statistical data, and it maintains several macroeconomic, financial, and sector databases. It does not compromise on the quality or quantity of its data in any respect, as the goal of a world without poverty is essential to the World Bank. If this option excites you as well, please refer to this documentation on how to download datasets from the site. It will help you understand the site better, since the datasets are distributed across different sections such as DataBank, Microdata, and Data Catalog.

7. GroupLens → GroupLens is a research lab in the department of computer science at the University of Minnesota. It has developed end-to-end projects such as MovieLens, local geographic information systems, digital libraries, Cyclopath, and BookLens. It also gives access to some free datasets acquired from its research and surveys. If any of these projects interests you, visit the Datasets tab at the top of the site to see what is available for use and what is not.

8. RBI → The Reserve Bank of India has published some free data. If you want to analyze money market operations, payment flows, or the use of banking services, this site is a must for finding the right dataset for your next data science project. The datasets are organized by collection frequency: daily, weekly, monthly, etc.
This makes the site really helpful for time series projects.

9. GitHub repo for public datasets → Awesome Public Datasets is a repository I found useful on GitHub. It contains some high-quality free datasets and is very well organized by domain. Do visit this repo, and let me know whether you find it useful.

10. FiveThirtyEight → This site publishes interactive articles and graphics on topics ranging from politics to sports, economics, culture, and science & health. Its analytical stories draw on a variety of open-source datasets. You can access the free datasets from this link; all you have to do is download the data you are interested in. Another option is their GitHub repo, which provides the datasets as well as the code behind the visualizations and interactive stories.

11. Data.world → This site is very handy not just for data science people but also for non-technical people who just want insights, such as journalists and business people who need clear, accurate, fast answers to business questions. Its data catalog organizes free datasets well, making them easy to discover, govern, and access. This is the link to the Finance directory of the open datasets on data.world; you may choose any other category you like.

12. Google Dataset Search → Unlike the other sites, this is a search engine built specifically for finding datasets. Google Dataset Search works like a regular Google search, based on the keywords you provide, but it matches the keywords against the dataset's description rather than its content. If a dataset is publicly available, there is a good chance you will find it by entering specific keywords. At the time of its launch, Google Dataset Search had almost 25 million datasets from across the globe.

13. OpenML → OpenML is an open data science platform built for machine learning research. The platform is neat and clean, with all sections well organized. You can find free datasets from a variety of domains, such as healthcare, education, climate change, politics, and sports. Each dataset has its own page on the site and can be downloaded in multiple formats, such as CSV, JSON, and XML: just open the dataset of your choice and pick the format in the top-right corner. You can also use the platform to run your machine learning tasks, learn from other users' tasks, and share the models you build so that others can use them too.

14. BuzzFeed News → BuzzFeed News is an American news website that features analytical stories. It has open-sourced its datasets, libraries, tools, data analyses, and some helpful guides on its GitHub repo.

15. National Centers for Environmental Information → This is your best bet if you are looking for data related to weather and environmental conditions. NCEI is the largest repository of environmental data in the world. It offers data on climate and weather conditions across the United States, oceanic data, meteorological data, geophysical data, atmospheric information, and more.

Conclusion

The consolidated list above starts with very well-known resources and moves on to some underrated ones that are not very popular in the data science community. I tried my best to include the best possible resources because I know the struggle of not getting the right data. Also, scraping data every time is very time-consuming and tedious work.
Almost all the listed data aggregator sites host open datasets. If you are planning to work on a data science project, I hope this list helps you with the first step: getting the right data. If it helps you in any way, don't forget to comment and let me and others know. If you like the list and think it will help other people as well, please applaud the article so that it can reach those who need it.

Cheers and best wishes to all of you!
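The resources in this list hand out data in several shapes: Kaggle offers CSV, JSON and SQLite files, while many classic UCI datasets (Iris, for example) come as headerless comma-separated .data files, with the column descriptions kept in a separate .names file. As a minimal sketch, assuming the files are already downloaded, the hypothetical helper load_records below turns each of these formats into the same list-of-dicts shape using only the Python standard library. The function name, the supported extensions and the column_names and table parameters are illustrative assumptions, not part of any of these platforms' tools.

```python
import csv
import json
import sqlite3
from pathlib import Path


def load_records(path, column_names=None, table=None):
    """Load a downloaded dataset file into a list of dicts (hypothetical helper).

    .csv          -> first row is treated as the header (values stay strings)
    .json         -> file must contain a JSON array of objects
    .data         -> UCI-style headerless CSV; column_names is required
    .sqlite / .db -> all rows of the given table; table is required
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data

    if suffix == ".data":
        if column_names is None:
            raise ValueError(".data files have no header row; pass column_names")
        records = []
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if not row:  # skip blank lines, e.g. at the end of the file
                    continue
                if len(row) != len(column_names):
                    raise ValueError(f"expected {len(column_names)} fields, got {len(row)}")
                records.append(dict(zip(column_names, row)))
        return records

    if suffix in (".sqlite", ".db"):
        if table is None:
            raise ValueError("pass the name of the table to read")
        conn = sqlite3.connect(str(path))
        try:
            conn.row_factory = sqlite3.Row  # rows can then be turned into dicts
            # The table name is quoted but not sanitized: only use trusted names.
            rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        finally:
            conn.close()
        return [dict(row) for row in rows]

    raise ValueError(f"unsupported file type: {suffix}")
```

For example, load_records("iris.data", column_names=["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]) would return one dict per flower. Returning plain dicts keeps the sketch dependency-free; in practice, pandas readers such as read_csv, read_json and read_sql cover the same ground with typed columns.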

Mimecast’s new AI tools protect against the sneakiest phishing attacks

Email security provider Mimecast this week launched Mimecast CyberGraph, an AI-driven add-on to Mimecast Secure Email Gateway (SEG) that sniffs out sophisticated and hard-to-detect phishing and impersonation threats, the company said.

[Book Review] The Future of Violence: Robots and Germs, Hackers and Drones- Confronting the New Age of Threat

Being futuristic is often stressful, and the book The Future of Violence makes it easier to see why. The book raises fascinating yet thoughtful questions about the growth and easy accessibility of emerging technologies that pose an immense threat and could eventually change our preconceived notion of violence. Written by Benjamin Wittes, a senior fellow at the Brookings Institution, and Gabriella Blum, a professor at Harvard Law School, the book raises fundamental questions about the proliferation of modern biotechnology, networked computers, robotics, and similar technologies, which the authors collectively term ‘technologies of mass empowerment’.

[Paper Summary] DeepMind introduces its Supermodel AI ‘Perceiver’: a Neural Network Model that can process all types of input

DeepMind recently introduced a state-of-the-art deep learning model called Perceiver in a new paper. It adapts the Transformer so that it can consume all types of input, from audio to images, and perform tasks such as image recognition, for which specialized kinds of neural networks are usually developed. The approach is inspired by the way the human brain perceives multi-modal input.
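A key idea behind how Perceiver adapts the Transformer is that a small, fixed-size latent array attends to a large, flattened input array through cross-attention, so attention is never computed between every pair of input elements. Below is a toy, pure-Python sketch of that single step under illustrative assumptions (one attention head, random weights, made-up sizes, and hypothetical helper names such as latent_cross_attention). It is not DeepMind's code and omits the position encodings, latent self-attention layers and other components of the real model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def latent_cross_attention(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention from a small latent array to a long input array.

    latents: N x D (N small and fixed), inputs: M x C (M may be large).
    w_q: D x d, w_k: C x d, w_v: C x d. Returns an N x d array whose size does
    not depend on M, so this step costs on the order of N * M, not M * M.
    """
    q = matmul(latents, w_q)  # queries come from the latents
    k = matmul(inputs, w_k)   # keys and values come from the inputs
    v = matmul(inputs, w_v)
    d = len(w_q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        scores = [scale * sum(x * y for x, y in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v)) for c in range(d)])
    return out


def random_matrix(rows, cols, rng):
    """Matrix of standard-normal entries, used here as stand-in weights and data."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    latents = random_matrix(4, 8, rng)  # N=4 latent vectors of width D=8
    w_q = random_matrix(8, 6, rng)
    w_k = random_matrix(3, 6, rng)      # every input element has C=3 features
    w_v = random_matrix(3, 6, rng)
    image_like = random_matrix(32 * 32, 3, rng)  # e.g. flattened pixels
    audio_like = random_matrix(2000, 3, rng)     # e.g. audio samples with extra features
    for name, x in [("image-like", image_like), ("audio-like", audio_like)]:
        z = latent_cross_attention(latents, x, w_q, w_k, w_v)
        print(name, len(x), "inputs ->", len(z), "x", len(z[0]), "latent output")
```

Whether the flattened input has 1,024 pixel-like rows or 2,000 audio-like rows, the output is always 4 x 6. A fixed-size latent output like this is part of what lets the same downstream layers handle different modalities, while the cost of this step grows with N x M rather than M x M.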

CPU vs GPU in Machine Learning Algorithms: Which is Better?


Machine learning algorithms are developed and deployed on both CPUs and GPUs. Each has its own distinct properties, and neither is universally better than the other. However, it’s critical to understand which one to use based on your needs, such as speed, cost, and power consumption.

[Video Highlights] A path into Data Science

Are you interested in getting ahead in data science? On this TalkPython podcast episode, you’ll meet Sanyam Bhutani, who studied computer science but found that his education didn’t prepare him for a data science-focused job. That’s when he started down his own path of self-education and advancement. Now he’s working at an AI startup and ranking high on Kaggle.

Natural language understanding tough for neural networks


One of the dominant trends of artificial intelligence in the past decade has been to solve problems by creating ever-larger deep learning models. And nowhere is this trend more evident than in natural language processing, one of the most challenging areas of AI. In recent years, researchers have shown that adding parameters to neural networks improves their performance on language tasks. However, the fundamental problem of understanding language—the iceberg lying under words and sentences—remains unsolved.

Edge computing for oil and gas companies

The Oil and Gas (O&G) industry is currently undergoing a transition as the world moves towards clean, renewable energy, coupled with uncertainty around global demand. Eventually, this will force organizations to reduce their operational and capital expenditure. Over the years, the oil and gas industry has relied heavily on its own telecommunication networks. Communication with vessels or sites runs over satellite links that are expensive and difficult to manage remotely. Operational complications often arise from high data latency, which can have a large impact on an O&G plant.

Top books to read on Data Science this month

With data science making it onto the list of the most promising careers, a lot of people have started to show interest in the field. To stay ahead in the race, sound knowledge of data science is key, and there are numerous ways to gain it. Here are some of the top books on data science that can help you stand apart from the rest.

[Paper Summary] Skoltech Researchers present a Machine Learning Framework involving Convolutional Neural Networks

Skoltech researchers and their partners in the U.S. have created a neural network that can help tweak semiconductor crystals to achieve superior properties for electronics. This new direction of development opens up possibilities for next-generation chips and solar cells. The study is published in the journal npj Computational Materials.
