In machine learning, features and labels are two essential components of datasets that enable models to learn patterns and make predictions.
Features are the input variables or attributes that describe each data point in your dataset. Think of features as the characteristics or properties you use to make a prediction. For example, if you're building a model to predict house prices, features might include the number of bedrooms, square footage, location, age of the property, and number of bathrooms. Features are sometimes called predictors, independent variables, or input variables. In Azure Machine Learning, features form the columns of your training data that the algorithm analyzes to identify patterns.
Labels are the output variables or target values that you want your model to predict. The label represents the answer or outcome you're trying to determine. In the house price example, the label would be the actual sale price of the house. Labels are also known as target variables, dependent variables, or output variables. During training, the model learns the relationship between features and labels so it can later predict labels for new, unseen data.
In supervised learning scenarios on Azure, your training dataset must contain both features and labels. The model examines how features correlate with labels to build its predictive capability. For instance, Azure Machine Learning Designer allows you to specify which columns serve as features and which column is your label when configuring training modules.
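As a minimal sketch of that column selection, the plain-Python snippet below separates each row of a small housing table into its features and its label. The column names, the values, and the helper split_features_and_label are all hypothetical and chosen only for illustration; they are not part of any Azure API.

```python
# Hypothetical housing records; column names and values are made up.
rows = [
    {"bedrooms": 3, "square_feet": 1500, "age_years": 20, "sale_price": 300000},
    {"bedrooms": 4, "square_feet": 2200, "age_years": 5, "sale_price": 450000},
    {"bedrooms": 2, "square_feet": 900, "age_years": 40, "sale_price": 180000},
]

LABEL_COLUMN = "sale_price"  # the value we want the model to predict


def split_features_and_label(row, label_column):
    """Return (features, label) for one row without modifying the row."""
    features = {name: value for name, value in row.items() if name != label_column}
    return features, row[label_column]


pairs = [split_features_and_label(row, LABEL_COLUMN) for row in rows]
features = [f for f, _ in pairs]  # inputs: every column except the label
labels = [y for _, y in pairs]    # outputs: the known answers

print(features[0])
print(labels)
```

Every column other than the chosen label column ends up as a feature here. In a real dataset you would usually also drop columns that carry no predictive signal, such as row identifiers.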
In unsupervised learning, datasets typically contain only features since the goal is to discover hidden patterns or groupings rather than predict specific outcomes.
Understanding the distinction between features and labels is crucial for preparing data correctly in Azure Machine Learning services. Selecting relevant features and labeling your data accurately have a significant impact on model performance and prediction accuracy.
Features and Labels in Machine Learning Datasets
Why It Is Important
Understanding features and labels is fundamental to machine learning because they form the core components of any supervised learning model. Without grasping these concepts, you cannot effectively prepare data, train models, or interpret results. This knowledge is essential for the AI-900 exam, which tests your foundational understanding of how machine learning algorithms learn from data.
What Are Features and Labels?
Features are the input variables or attributes used to make predictions. They are the characteristics or properties of the data that the model uses to learn patterns. For example, in a house price prediction model, features might include square footage, number of bedrooms, location, and age of the house.
Labels are the output variables or target values that the model is trying to predict. In supervised learning, labels are the known answers in your training data. Using the house price example, the label would be the actual sale price of each house.
How It Works
1. Data Collection: Gather a dataset containing multiple features and corresponding labels
2. Training Phase: The algorithm analyzes the relationships between features and labels to identify patterns
3. Model Creation: Based on learned patterns, the model develops rules to map features to labels
4. Prediction: When new data with only features is provided, the model predicts the appropriate label
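The four steps above can be sketched in a few lines of plain Python. This example assumes a single feature (square footage) and fits a straight line by ordinary least squares. That algorithm is chosen here only because it is short; it is not the method any particular Azure service uses. The data values and the function names train and predict are hypothetical.

```python
# Step 1: data collection. Each example pairs one feature (square footage)
# with its label (sale price). Values are made up for illustration.
square_feet = [1000, 1500, 2000, 2500]
sale_price = [200000, 300000, 400000, 500000]


def train(features, labels):
    """Steps 2-3: fit label = slope * feature + intercept by least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    # Sum of products of deviations, and sum of squared feature deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    sxx = sum((x - mean_x) ** 2 for x in features)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, feature):
    """Step 4: apply the learned rule to a feature value that has no label."""
    slope, intercept = model
    return slope * feature + intercept


model = train(square_feet, sale_price)
print(predict(model, 1800))
```

The "model" here is just the pair (slope, intercept): the rule that maps a feature value to a predicted label. Real models follow the same pattern with more features and more parameters.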
Examples by ML Type:
Classification: Features = email text content; Label = spam or not spam
Regression: Features = temperature, humidity, wind speed; Label = energy consumption value
Exam Tips: Answering Questions on Features and Labels
• Remember: Features are inputs, labels are outputs
• Key distinction: Labels are only present in supervised learning scenarios. Unsupervised learning works with features alone
• Common trick questions: The exam may describe a scenario and ask you to identify which columns are features and which is the label. Always identify what is being predicted; that is the label
• Multiple features, one label: Most models use multiple features to predict a single label
• Terminology awareness: Labels may also be called target variables, dependent variables, or outcomes. Features may be called independent variables, predictors, or attributes
• Scenario-based questions: When given a dataset description, identify the column that represents what you want to predict. That column is the label, and the remaining columns used for prediction are the features
• Training vs inference: During training, both features and labels are required. During inference or prediction, only features are provided
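To make the training vs inference distinction concrete, here is a small illustrative sketch of a spam classifier. It assumes two hypothetical numeric features per email (link count and exclamation-mark count) and uses a 1-nearest-neighbour rule, picked only for brevity. The data and the function name predict_label are made up for this example.

```python
# Training data: every example has features AND a label.
# Features are (link_count, exclamation_count); values are made up.
training_features = [(0, 0), (1, 0), (7, 5), (9, 8)]
training_labels = ["not spam", "not spam", "spam", "spam"]


def predict_label(new_features, known_features, known_labels):
    """Return the label of the closest training example (1-nearest neighbour)."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(
        range(len(known_features)),
        key=lambda i: squared_distance(new_features, known_features[i]),
    )
    return known_labels[nearest]


# Inference: only features are supplied; the model produces the label.
new_email = (8, 6)
predicted = predict_label(new_email, training_features, training_labels)
print(predicted)
```

Note that new_email carries no label at all. The label is what the call returns, which mirrors the rule that labels are required for training but not for prediction.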