AI Terminology

Don't worry, everyone is confused initially.

Algorithm

A set of rules that a machine can follow to learn how to do a task.

Artificial intelligence

This refers to the general concept of machines acting in a way that simulates or mimics human intelligence. AI can have a variety of features, such as human-like communication or decision making.

Autonomous

A machine is described as autonomous if it can perform its task or tasks without needing human intervention.

Big data

Datasets that are too large or complex to be handled by traditional data processing applications.

Chatbot

A chatbot is a program designed to communicate with people through text or voice commands in a way that mimics human-to-human conversation.

Data mining

The process of analyzing datasets to discover new patterns, which can then be used to improve a model.

Data science

Drawing from statistics, computer science and information science, this interdisciplinary field aims to use a variety of scientific methods, processes and systems to solve problems involving data.

Dataset

A collection of related data points, usually with a uniform order and tags.

Deep learning

A subset of machine learning that uses multi-layered neural networks, loosely inspired by the human brain, to learn directly from the way data is structured, rather than from an algorithm that’s programmed to do one specific thing.

General AI

AI that could successfully do any intellectual task that can be done by any human being. This is sometimes referred to as strong AI, although they aren’t entirely equivalent terms.

Label

A part of training data that identifies the desired output for that particular piece of data.

Machine learning

This subset of AI focuses on developing algorithms that let machines learn from data and adapt in response to new data, without being explicitly programmed by a human for each task.

Model

A broad term referring to the product of AI training, created by running a machine learning algorithm on training data.

Neural network

Also called a neural net, a neural network is a computer system loosely modeled on the human brain: layers of simple, connected units (artificial neurons) pass signals to one another. Although researchers are still working on creating a machine model of the human brain, existing neural networks can perform many tasks involving speech, vision and board game strategy.
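
As a rough illustration of the idea, the sketch below wires three artificial neurons into a tiny two-layer network that computes XOR. The weights are set by hand for this example rather than learned, and the function names (`step`, `neuron`, `xor_network`) are made up for illustration.

```python
def step(x):
    """Activation function: the neuron 'fires' (1) only for positive input."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(a, b):
    # Hidden layer: two neurons that look at the same inputs.
    h_or = neuron([a, b], [1, 1], -0.5)    # fires if a OR b is 1
    h_and = neuron([a, b], [1, 1], -1.5)   # fires only if a AND b are 1
    # Output layer: fires for "OR but not AND", which is XOR.
    return neuron([h_or, h_and], [1, -2], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

In a real neural network the weights and biases are not chosen by hand; they are adjusted automatically during training.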

Natural language generation (NLG)

This refers to the process by which a machine turns structured data into text or speech that humans can understand. Essentially, NLG is concerned with what a machine writes or says as the end part of the communication process.
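
A very simple way to picture this is template-based generation, sketched below. The record fields and the `describe_weather` function are invented for this example; modern NLG systems are usually far more flexible than a fixed template.

```python
def describe_weather(record):
    """Turn one structured record into a sentence using a fixed template."""
    return (
        f"In {record['city']}, it is {record['temp_c']} degrees Celsius "
        f"and {record['condition']}."
    )

# Structured data in, human-readable text out.
reading = {"city": "Lisbon", "temp_c": 21, "condition": "sunny"}
sentence = describe_weather(reading)
print(sentence)
```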

Natural language processing (NLP)

The umbrella term for any machine’s ability to perform conversational tasks, such as recognizing what is said to it, understanding the intended meaning and responding intelligibly.

Overfitting

Overfitting is a symptom of machine learning training in which a model only works on, or can only identify, the specific examples present in its training data. A working model should instead capture the general trends behind the data so that it also works on new examples.
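
The toy sketch below contrasts an extreme case of overfitting (a "model" that simply memorizes its training examples) with a model that learns a general rule. The task, data and function names are all made up for illustration.

```python
# Toy task: decide whether a number is "large" (50 or more).
train = [(12, "small"), (47, "small"), (53, "large"), (88, "large")]
new_examples = [(5, "small"), (49, "small"), (51, "large"), (95, "large")]

# Overfit "model": memorizes the training pairs and knows nothing else.
memorized = dict(train)

def memorizing_model(x):
    return memorized.get(x, "unknown")

# Generalizing model: learns a threshold halfway between the two classes.
largest_small = max(x for x, label in train if label == "small")
smallest_large = min(x for x, label in train if label == "large")
threshold = (largest_small + smallest_large) / 2

def threshold_model(x):
    return "large" if x >= threshold else "small"

def accuracy(model, examples):
    """Fraction of examples for which the model returns the right label."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)

print("memorizing:", accuracy(memorizing_model, train), accuracy(memorizing_model, new_examples))
print("threshold: ", accuracy(threshold_model, train), accuracy(threshold_model, new_examples))
```

Both models are perfect on the training data, but only the threshold model still works on examples it has never seen.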

Pattern recognition

The distinction between pattern recognition and machine learning is often blurry, but this field is basically concerned with finding trends and patterns in data.

Predictive analytics

By combining data mining and machine learning, this type of analytics is built to forecast what will happen within a given timeframe based on historical data and trends.
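
As a minimal sketch of forecasting from historical data, the example below fits a straight-line trend to four made-up monthly sales figures using ordinary least squares and extends it one month ahead. Real predictive analytics typically uses richer data and models; the numbers and variable names here are assumptions for illustration only.

```python
history = [100.0, 110.0, 120.0, 130.0]  # made-up sales for months 0..3

n = len(history)
months = list(range(n))
mean_x = sum(months) / n
mean_y = sum(history) / n

# Ordinary least-squares fit of a line: sales = intercept + slope * month
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, history))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

# Forecast the next month (month index n) by extending the trend line.
forecast = intercept + slope * n
print(f"trend: {slope:.1f} per month, forecast for month {n}: {forecast:.1f}")
```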

Python

A popular programming language used for general programming.

Reinforcement learning

A method of teaching AI that sets a goal without prescribing the steps to reach it, encouraging the model to test different scenarios rather than find a single answer. Based on feedback, usually rewards or penalties for the results of its actions, the model can then adjust what it does in the next scenario to get better results.
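
One common way to picture this is a two-choice "bandit" problem, sketched below with an epsilon-greedy strategy: the agent mostly picks the action it currently believes is best, but occasionally tries the other one. Here the feedback is a numeric reward from a simulated environment; the action names, probabilities and seed are all assumptions made for this example.

```python
import random

rng = random.Random(0)  # fixed seed so repeated runs behave the same way

# Hidden from the agent: how often each action pays off.
true_reward_probability = {"left": 0.2, "right": 0.8}

estimates = {"left": 0.0, "right": 0.0}  # agent's current value estimates
counts = {"left": 0, "right": 0}
epsilon = 0.1  # fraction of steps spent exploring at random

for step in range(2000):
    if rng.random() < epsilon:
        action = rng.choice(["left", "right"])      # explore
    else:
        action = max(estimates, key=estimates.get)  # exploit best estimate
    reward = 1.0 if rng.random() < true_reward_probability[action] else 0.0
    counts[action] += 1
    # Move the running average for this action toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates, counts)
```

After enough trials, the agent's estimate for the better-paying action should end up higher, so it chooses that action most of the time.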

Supervised learning

This is a type of machine learning where structured datasets, with inputs and labels, are used to train and develop an algorithm.
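
The sketch below shows the idea with one of the simplest supervised methods, a 1-nearest-neighbor classifier: a new input receives the label of the most similar training example. The animal measurements are invented for illustration.

```python
# Each training example pairs an input (height_cm, weight_kg) with a label.
training_data = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest_input, nearest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], features)
    )
    return nearest_label

prediction_small = predict((22.0, 4.5))
prediction_big = predict((58.0, 28.0))
print(prediction_small, prediction_big)
```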

Test data

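Data held back from training and used to check that a machine learning model is able to perform its assigned task. The model is shown only the inputs, without labels, and its outputs are then checked against the expected results.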

Training data

This term is used in two ways: broadly, for all of the data used during the process of training a machine learning algorithm, and more narrowly, for the specific dataset used for training rather than testing.

Unsupervised learning

This is a form of training where the algorithm is asked to make inferences from datasets that don’t contain labels. These inferences are what help it to learn.
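
Clustering is a typical example. The sketch below runs a bare-bones version of k-means with two clusters on a handful of made-up, unlabeled numbers; the starting centers (the smallest and largest value) are a simplification chosen to keep the example deterministic.

```python
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # unlabeled data points

# Simple deterministic starting guess for the two cluster centers.
centers = [min(values), max(values)]

for _ in range(10):
    # Assignment step: put each value in the cluster with the nearest center.
    clusters = [[], []]
    for v in values:
        nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
        clusters[nearest].append(v)
    # Update step: move each center to the mean of its cluster.
    centers = [sum(c) / len(c) for c in clusters]

print(centers)
print(clusters)
```

No labels were supplied, yet the algorithm separates the low values from the high ones on its own.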

Validation data

Structured like training data, with inputs and labels, this data is used to test a recently trained model against new data and to analyze its performance, with a particular focus on checking for overfitting.
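
To show how training, validation and test data relate in practice, the sketch below shuffles one made-up labeled dataset and splits it three ways. The 70/15/15 proportions and the fixed seed are assumptions for illustration, not a rule.

```python
import random

# 20 made-up (input, label) pairs.
examples = [(x, 2 * x) for x in range(20)]

rng = random.Random(0)   # fixed seed so the split is repeatable
shuffled = examples[:]   # copy, so the original order is left untouched
rng.shuffle(shuffled)

n_train = len(shuffled) * 70 // 100  # 70% for training
n_val = len(shuffled) * 15 // 100    # 15% for validation

training_data = shuffled[:n_train]
validation_data = shuffled[n_train:n_train + n_val]
test_data = shuffled[n_train + n_val:]  # the remainder is kept for testing

print(len(training_data), len(validation_data), len(test_data))
```

Keeping the three sets separate matters because a model that is only ever checked against data it was trained on can hide overfitting.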