[Book Review] Analytics of Life by Mert Damlapinar

Analytics of Life: Making Sense of Data Analytics, Machine Learning & Artificial Intelligence is a work of non-fiction in the business and technology sub-genres, written by Mert Damlapinar. As the title and subtitle suggest, the work concerns the growing field and practice of data analytics.

Read More

[Podcast] Natalie Monbiot of Hour One on Virtual Humans, Automated Video Production, and more

Natalie Monbiot

Natalie Monbiot is head of strategy and business development for Hour One. She is working with companies across a number of industries as varied as real estate, eLearning, automotive, and consumer brands and is shaping Hour One’s go-to-market and growth strategies.

Read More

[Book] Five Reasoning Methods to rule them all

The five reasoning methods are also called the five tribes. They help in the search for the Master Algorithm. Each of the five tribes has a different technique and strategy for solving problems, which results in unique algorithms. If we succeed in combining these algorithms, it will (theoretically) lead us to the master algorithm. The tribes are defined by the Portuguese author Pedro Domingos in his book The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World.

Read More

[Podcast] Yakir Buskilla CEO of CoCoHub talks Chatbots and Virtual Humans

Yakir Buskilla - CEO Cocohub

Buskilla took over as CEO after several years at Nielsen, where he oversaw the company’s R&D operation in Israel. Earlier, he co-founded Semanix, which focused on improving patent search technology. He has a background in machine learning, big data, cloud hosting, and back-end software development, and he began his career as a software engineer.

Read More

Best of arXiv.org for AI, Machine Learning, and Deep Learning – June 2021

In this recurring monthly feature, we filter recent research papers appearing on the arXiv.org preprint server for compelling subjects relating to AI, machine learning and deep learning – from disciplines including statistics, mathematics and computer science – and provide you with a useful “best of” list for the past month. Researchers from all over the world contribute to this repository as a prelude to the peer review process for publication in traditional journals.

Read More

Top 10 Natural Language Processing (NLP) Tools in 2021

Natural Language Processing (NLP) is one of the fastest-growing subsets of AI. It applies linguistics and computer science to make human language understandable to machines. New advancements arrive every year: new NLP tools keep emerging, and older ones are updated with more developed features. Before going through the top 10 NLP tools, it is worth mentioning that all of them were either recently released or recently upgraded with new features. The tools named below are free and open source.

Read More

15 free & open-source data resources for your next data science project

A consolidated list of free datasets, organized by category, for beginners as well as professionals

There are many beginners in the field of data science now that demand for data scientists has risen during the pandemic. Most of the time they ask questions like: Where can I find datasets for machine learning or deep learning projects? Where can I get free datasets for data science?

So here I am writing something useful for every beginner, starting from the very basics. I hope this article will be helpful to beginners as well as to advanced data science professionals who were not familiar with these resources earlier.

Data! Data! Data!

Data is everywhere, but is it in ready-to-use form? Absolutely not. Before I take you through the list of resources that provide free datasets for almost every field, the first thing to understand is that to apply data science skills you need a dataset in hand, and in a ready-to-use form.

How do you find the best datasets for a specific machine learning project? Where should you look? If you got stuck, spent a lot of time searching, and ended up frustrated, this short guide will walk you through some useful resources.

If you have ever worked on your own project from scratch, you will be familiar with the obstacles of data collection. Data collection is the first and most important step in getting started on a data science project.

There are three techniques to get a dataset.

Below are the top 15 platforms where you can get datasets to start your journey in data science.

A complete list of dataset resources:

1. Kaggle → I am sure many of you are familiar with this platform, as it is famous among data science people for lots of reasons. I put it at the top because I use it most of the time. It is helpful for the data science community because it has interesting datasets covering almost every area, such as health, finance, banking, and education. If you want already prepared data in a structured form, make it your first choice. Open, accessible data formats are not only better supported on the platform, they are also easier for more people to work with regardless of their tools. You can find a variety of file types as well, like CSV, JSON, SQLite, and BigQuery.

2. UCI ML repo → The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that the machine learning community uses for the empirical analysis of machine learning algorithms. It currently maintains 588 open-source datasets as a service to the machine learning community. You can view all the UCI datasets through a searchable interface. While doing my master’s, I was unable to find any free datasets for one of my projects. One of my faculty members suggested this platform, and it worked out well: I found a good, large UCI dataset to start my project. I put it in second place because the UCI datasets are well organized. You can even search by the type of task you are attempting, such as regression, classification, or NLP.

3. Quandl → A resource designed for professionals that delivers financial, economic, and alternative data to people worldwide, with free datasets for the data science community. It covers two main types of data: time series and tables. You can find interesting finance and economics datasets here, and in these areas there are good datasets for machine learning projects. All you need to do is enter keywords of interest in the search bar and choose from the listed datasets. To use a particular dataset, visit its usage tab. Since I cannot cover everything here, please see the site's documentation to understand how to use its APIs for your data requirements. Besides the APIs, you can also pull financial data directly into your Python IDE using the Python library; refer to its documentation to see how.

4. Data.gov → This site is maintained by the U.S. government, which decides what to publish as public, free datasets for researchers and data science people like you and me. The free datasets are arranged by categories such as agriculture, climate, energy, ocean, local government, maritime, and older adults' health. If any of these categories interest you, you can visit this platform to get free datasets for machine learning projects. Just visit the site and go to the Data tab at the top, which lists all the datasets. You can also search by keyword. While exploring this site I found very good resources as well, so don’t forget to go through them.

5. Data.gov.in → This site is maintained by the Indian government. It releases data in almost every domain, such as education, finance, and healthcare, so that researchers like us can use it to develop useful projects. You can use these free datasets for data science projects, and you can also find image datasets for machine learning and deep learning projects here. Datasets shared on this website have been used in projects built for organizations such as DRDO and ISRO, so their correct usage is a sensitive matter. Accessing the site is very simple:
a) Enter keywords and search, e.g., Education.
b) Click on the relevant search result.
c) This takes you to the catalog containing that dataset.
d) Go through the pages in the catalog to find the right dataset.
e) Extract the dataset in the required format.
f) The site will ask some basic questions about the purpose of use; answer them and save the data.

6. World bank data → This site works closely with the Bank's regions and global practices to provide high-quality statistical data, and it maintains several macro, financial, and sector databases. It does not compromise on the quality or quantity of data, as the goal of a world without poverty is essential to the organization. If this option interests you, please refer to the site's documentation on how to download a dataset. The documentation will help you understand the site better, as the datasets are distributed across different sections such as DataBank, Microdata, and Data Catalog.

7. GroupLens → GroupLens is a research lab in the Department of Computer Science at the University of Minnesota. It has developed end-to-end data science projects such as MovieLens, local geographic information systems, digital libraries, Cyclopath, and BookLens. It also gives access to some of the free datasets it has acquired from research and surveys. If any of the listed projects interest you, visit the datasets tab at the top of the site to see what is available for use.

8. RBI → The Reserve Bank of India has made some data freely available. If you want to analyze money market operations, payment flows, or the use of banking, this site is a must for finding the right dataset for your next data science project. The datasets are organized by collection frequency: daily, weekly, monthly, and so on. This site is especially helpful for time series projects.

9. Github repo for public datasets → The Awesome Public Datasets repository is one I found useful on GitHub. It contains high-quality free datasets and is very well organized by domain. Do visit this repo, and don’t forget to tell me whether you find it useful.

10. FiveThirtyEight → A site that publishes interactive articles and graphics on topics ranging from politics to sports, economics, culture, and science and health. It provides analytical stories drawn from a variety of open-source datasets. You can access the free datasets on the site's data page; all you have to do is download the data you are interested in. Another option is their GitHub repo, which holds the datasets as well as the code behind the visualizations and interactive stories.

11. Data.world → This site is handy not just for data science people but also for non-technical people, such as journalists and business people, who want clear, accurate, fast answers to a business question. Its data catalog organizes free datasets well for discovery, governance, and easy access. The Finance directory of open datasets on data.world is one starting point; you may choose any other category.

12. Google Dataset Search → Unlike the other sites, this is a search engine built for finding free datasets. Google Dataset Search works like regular Google search, based on the keywords provided, but it matches the keywords against the description of a dataset rather than its content. If a dataset is publicly available, there is a good chance of finding it by entering specific keywords. At the time of launch, Google Dataset Search covered almost 25 million free datasets from across the globe.

13. Open ML → OpenML is an open data science platform meant for machine learning research. The platform is neat and clean, with all sections well organized. You can find free datasets from a variety of domains such as healthcare, education, climate change, politics, and sports. Every dataset has its own page on the site and can be downloaded in multiple formats such as CSV, JSON, and XML. Just visit the dataset of your choice and select the required format in the top right corner. You can also use the platform to run your own machine learning tasks, learn from other people's tasks, and share the models you build so that others can use them.

14. BuzzFeed News → An American news website that features analytical stories. It has open-sourced the datasets, libraries, and tools it uses, along with data, analyses, and some guides, on its GitHub repo.

15. National Centers for Environmental Information → This is the best bet if you are looking for data related to weather and environmental conditions. NCEI is the largest repository of environmental data in the world. It has datasets on climate and weather conditions across the United States, as well as oceanic, meteorological, geophysical, and atmospheric data.

Conclusion

The list above runs from very well-known resources to some underrated ones that are not very popular in the data science community. I tried my best to include the best resources possible because I know the struggle of not getting the right data. Scraping every time is also time-consuming and tedious work. Almost all the listed data aggregator sites host open datasets.

If you are planning to work on a data science project, I hope this list will help you with the first step of getting the right data. If it helps you in any way, don’t forget to comment and let me and others know. If you like the list and think it will help other people as well, please applaud the article so that it can reach those who need it.

Cheers and best wishes to all of you!
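
Once a dataset has been downloaded in CSV form from one of the platforms above, a first check that it really is in "ready-to-use form" might look like the sketch below. It uses only the Python standard library. The sample rows, the column names, and the summarize_csv helper are all made up for illustration and do not come from any of the listed sites.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
# The values are invented purely for illustration.
SAMPLE_CSV = (
    "station,day,temp_c\n"
    "A,1,20.5\n"
    "A,2,\n"
    "B,1,18.0\n"
)


def summarize_csv(text):
    """Return the row count, column names, and missing-value count per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    missing = {}
    for row in rows:
        for name, value in row.items():
            # Treat absent fields and empty or whitespace-only strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[name] = missing.get(name, 0) + 1
    return {"rows": len(rows), "columns": reader.fieldnames, "missing": missing}


summary = summarize_csv(SAMPLE_CSV)
print(summary)
```

For a real file, the same helper could be given the contents of the file read from disk. A column with many missing values is a sign that the dataset still needs cleaning before use.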

Read More

[Book Review] The Future of Violence: Robots and Germs, Hackers and Drones- Confronting the New Age of Threat

Being futuristic is often stressful, and the book The Future of Violence makes it easier to see why that is so. The book raises fascinating and thoughtful questions about the growth and easy accessibility of emerging technologies that pose an immense threat and could eventually change our preconceived notion of violence. Written by Benjamin Wittes, a senior fellow at the Brookings Institution, and Gabriella Blum, a professor at Harvard Law School, the book raises fundamental questions about the proliferation of modern biotechnology, networked computers, robotics and so forth, which the authors collectively term ‘technologies of mass empowerment’.

Read More

[Paper Summary] DeepMind introduces its Supermodel AI ‘Perceiver’: a Neural Network Model that could process all types of input

DeepMind recently released a state-of-the-art deep learning model called Perceiver in a new paper. It adapts the Transformer so that it can consume all types of input, from audio to images, and perform different tasks, such as image recognition, for which specialized kinds of neural networks are generally developed. It works very similarly to how the human brain perceives multi-modal input.

Read More

Top books to read on Data Science this month

With data science making it onto the list of the most promising careers, a lot of people have started to show interest in it. To stay ahead in the race, sound knowledge of data science is key, and there are numerous ways to gain that knowledge. Here are some of the top books to read on data science that will help you stand apart from the rest.

Read More

[Paper Summary] IBM Researchers propose a Quantum Kernel Algorithm

Many quantum machine learning algorithms have been believed to provide exponential speed-ups over classical machine learning (ML) approaches, based on the assumption that classical data can be provided to the algorithm in the form of quantum states. Yet, no studies show whether a method exists that can efficiently provide data in this manner.

Read More

[Podcast] Why AI & ML Engineers should incorporate Value Sensitive Design into their Models

Today, many people realize that the shifting paradigms of AI, automation, and digital transformation will disrupt numerous human-involved processes, but few ponder how those disruptions will affect ethics and equity and principles of justice. Fewer still contemplate how to address these technoethical challenges, and what framework should be applied in doing so. Enter Steven Umbrello, Managing Director at the Institute for Ethics and Emerging Technologies.

Read More

[Paper Summary] Researchers from Facebook AI Research and UIUC Propose ‘MaskFormer’, A Mask Classification Model

In recent years, semantic segmentation has become an important tool for computer vision. One approach is per-pixel classification, where the goal is to partition images into regions of different categories using deep learning techniques such as Fully Convolutional Networks (FCNs). Mask classification is an alternative that separates the image partitioning and classification aspects of segmentation. Instead of classifying each pixel, mask-based methods predict a set of binary masks, each associated with a single class.

Read More

[Paper Summary] Stanford’s AI Researchers introduce QA-GNN Model that jointly reasons with Language Models and Knowledge Graphs

In this research paper, published at NAACL 2021, researchers found that combining language models (LMs) and knowledge graphs (KGs) makes it possible to answer questions more effectively. Existing systems that use LMs and KGs tend to be noisy, and the interactions between the QA context and the KG are not modeled.

Read More

[Guide] Google Open-Sources a First-Of-Its-Kind, General-Purpose Transpiler for fully Homomorphic Encryption (FHE)

Google has released a set of coding utilities for fully homomorphic encryption (FHE). The open-source set of libraries and tools enables computational operations to be performed on encrypted data without first decrypting it, resulting in increased security and privacy. Secure multi-party computing and homomorphic encryption are well-known technologies. Rather than rewriting the foundations of these technologies, the project focuses on improving them and making them suitable for broader deployment.
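
To give a feel for what "computing on encrypted data without first decrypting it" means, here is a minimal toy sketch of the Paillier cryptosystem in Python. This is not Google's transpiler and not fully homomorphic encryption: Paillier supports only addition of encrypted values. The tiny hard-coded primes and all function names are assumptions chosen for illustration, and the parameters are far too small to be secure.

```python
import math
import random


def keygen(p=293, q=433):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # standard simple choice of generator
    mu = pow(lam, -1, n)     # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:   # r must be invertible mod n
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
c_sum = add_encrypted(pub, encrypt(pub, 17), encrypt(pub, 25))
total = decrypt(pub, priv, c_sum)
print(total)
```

The party that calls add_encrypted never sees 17 or 25, only ciphertexts, yet the key holder decrypts their sum. Fully homomorphic schemes extend this idea to support both addition and multiplication, and therefore arbitrary computation, on encrypted values.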

Read More

[Book Review] Why neural networks aren’t fit for natural language understanding

Linguistics for the Age of AI, a book by two scientists at Rensselaer Polytechnic Institute, discusses the shortcomings of current approaches to natural language understanding (NLU) and explores future pathways for developing intelligent agents that can interact with humans without causing frustration or making dumb mistakes.

Read More

[Paper Summary] A new study from Cambridge, Twitter, UCLA proposes CW Networks (CWNs) with better Expressive Power than GNNs

A recent study from a multi-institutional research team introduces CW Networks (CWNs), a message-passing mechanism that produces state-of-the-art results across a variety of molecular datasets while delivering greater expressivity than commonly used graph neural networks (GNNs).

Read More

[Video Highlights] Running effective Machine Learning Teams

Niko Laskaris, Data Scientist and Head of Strategic Projects at MLOps solutions provider Comet, hosts a compelling webinar showing you how to: a) create value for your business with AI, b) introduce ML to your business and scope your initiatives appropriately, and c) manage the ML lifecycle with visibility and the ability to reproduce models.

Read More

[Paper Summary] Cornell and Harvard University Researchers develop Correlation Convolutional Neural Networks (CCNN): To determine which Correlations are most important

A team of researchers from Cornell and Harvard University introduces a novel approach to parse quantum matter and make crucial data distinctions. The proposed technique will enable researchers to decipher the most perplexing phenomena in the subatomic realm.

Read More