# 150+ Python/Pyspark Pandas DataFrame Practice Exercises with Solutions Beginner to Expert Level

As a data analyst, working with tabular data is a fundamental part of your role. Pandas, a popular data manipulation library in Python, offers a powerful tool called the DataFrame to handle and analyze structured data. In this comprehensive guide, we will cover a wide range of exercises that will help you master DataFrame operations using Pandas, including some examples in PySpark.

## 1. Creating a Simple DataFrame

Let's start by creating a simple DataFrame from scratch. We'll use Pandas to create a DataFrame from a dictionary of data.

``````
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
``````

Output:

``````
Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
3    David   28        Chicago
``````

### Explanation:

In the example above, we created a DataFrame with three columns: 'Name', 'Age', and 'City'. Each column contains data for different individuals.

## 2. Viewing Data

You can use various methods to view and inspect your DataFrame.

``````
# Display the first few rows of the DataFrame
print(df.head())

# Display basic statistics about the DataFrame
print(df.describe())

# Display column names
print(df.columns)
``````

Output:

``````
Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
3    David   28        Chicago

Age
count   4.000000
mean   26.250000
std     3.304038
min    22.000000
25%    24.750000
50%    26.500000
75%    28.000000
max    30.000000

Index(['Name', 'Age', 'City'], dtype='object')
``````

### Explanation:

You can use the `head()` method to view the first few rows of the DataFrame, the `describe()` method to display basic statistics, and the `columns` attribute to display column names.
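
Two more quick checks are often useful alongside these; a minimal self-contained sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
})

# Number of rows and columns as a (rows, columns) tuple
print(df.shape)  # (4, 3)

# Last two rows of the DataFrame
print(df.tail(2))

# Data type of each column
print(df.dtypes)
```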

## 3. Selecting Columns

You can select specific columns from a DataFrame using the column names.

``````
# Select the 'Name' and 'Age' columns
selected_columns = df[['Name', 'Age']]
print(selected_columns)
``````

Output:

``````
Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
3    David   28
``````

### Explanation:

In the example above, we selected only the 'Name' and 'Age' columns from the DataFrame.

## 4. Filtering Data

You can filter rows based on conditions.

``````
# Filter individuals above the age of 25
filtered_data = df[df['Age'] > 25]
print(filtered_data)
``````

Output:

``````
Name  Age           City
1    Bob   30  San Francisco
3  David   28        Chicago
``````

### Explanation:

The example above filters the DataFrame to include only individuals above the age of 25.
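
Multiple conditions can be combined with `&` (and), `|` (or), and `~` (not); each condition needs its own parentheses. A small sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
})

# Age over 23 AND city is Chicago: only David qualifies
both = df[(df['Age'] > 23) & (df['City'] == 'Chicago')]
print(both)

# Age under 23 OR city is New York: Alice and Charlie qualify
either = df[(df['Age'] < 23) | (df['City'] == 'New York')]
print(either)
```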

## 5. Sorting Data

You can sort the DataFrame based on a specific column.

``````
# Sort the DataFrame by 'Age' in ascending order
sorted_data = df.sort_values(by='Age')
print(sorted_data)
``````

Output:

``````
      Name  Age           City
2  Charlie   22    Los Angeles
0    Alice   25       New York
3    David   28        Chicago
1      Bob   30  San Francisco
``````

### Explanation:

The example above sorts the DataFrame based on the 'Age' column in ascending order.
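
`sort_values` also accepts a list of columns, with a matching list of sort directions. A sketch using slightly modified data (David moved to New York, an assumption made here so that two rows share a city):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York']
})

# Sort by 'City' ascending, then by 'Age' descending within each city
sorted_multi = df.sort_values(by=['City', 'Age'], ascending=[True, False])
print(sorted_multi)
```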

## 6. Aggregating Data

You can perform aggregation functions like sum, mean, max, and min on DataFrame columns.

``````
# Calculate the mean age
mean_age = df['Age'].mean()
print("Mean Age:", mean_age)

# Calculate the maximum age
max_age = df['Age'].max()
print("Max Age:", max_age)
``````

Output:

``````
Mean Age: 26.25
Max Age: 30
``````

### Explanation:

In the example above, we calculated the mean and maximum age from the 'Age' column.
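
Several statistics can also be computed in a single call with `agg`; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22, 28]})

# Compute mean, max, and min in one call
stats = df['Age'].agg(['mean', 'max', 'min'])
print(stats)
```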

## 7. Data Transformation: Adding a New Column

You can add a new column to the DataFrame.

``````
# Add a new column 'Salary' with random salary values
import random
df['Salary'] = [random.randint(40000, 90000) for _ in range(len(df))]
print(df)
``````

Output:

``````
      Name  Age           City  Salary
0    Alice   25       New York   78500
1      Bob   30  San Francisco   62000
2  Charlie   22    Los Angeles   42000
3    David   28        Chicago   74600
``````

### Explanation:

In this example, a new column 'Salary' is added to the DataFrame, and random salary values between 40000 and 90000 are assigned to each row.
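
Because `random.randint` produces different values on each run, the output above will vary. For a reproducible variant, one option (a substitution, not what the exercise uses) is NumPy's seeded random generator:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# A seeded generator produces the same "random" salaries on every run
rng = np.random.default_rng(42)
df['Salary'] = rng.integers(40000, 90001, size=len(df))
print(df)
```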

## 8. Data Transformation: Removing a Column

You can remove a column from the DataFrame using the `drop` method.

``````
# Remove the 'City' column
df_without_city = df.drop('City', axis=1)
print(df_without_city)
``````

Output:

``````
      Name  Age  Salary
0    Alice   25   78500
1      Bob   30   62000
2  Charlie   22   42000
3    David   28   74600
``````

### Explanation:

The 'City' column is removed from the DataFrame using the `drop` method with `axis=1`.

## 9. Filtering Data: Select Rows Based on Condition

You can filter the DataFrame to select rows that meet certain conditions.

``````
# Select rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)
``````

Output:

``````
      Name  Age           City  Salary
1      Bob   30  San Francisco   62000
3    David   28        Chicago   74600
``````

### Explanation:

The `filtered_df` contains rows where the 'Age' column value is greater than 25.

## 10. Aggregation: Calculating Mean Salary

You can calculate the mean salary of the employees.

``````
# Calculate the mean salary
mean_salary = df['Salary'].mean()
print("Mean Salary:", mean_salary)
``````

Output:

``````
Mean Salary: 64275.0
``````

### Explanation:

The mean salary of the employees is calculated using the `mean` function on the 'Salary' column.

## 11. Grouping and Aggregation: Calculate Maximum Age per City

You can group the data by a specific column and calculate aggregation functions within each group.

``````
# Group by 'City' and calculate maximum age
max_age_per_city = df.groupby('City')['Age'].max()
print(max_age_per_city)
``````

Output:

``````
City
Chicago          28
Los Angeles      22
New York         25
San Francisco    30
Name: Age, dtype: int64
``````

### Explanation:

The `groupby` function is used to group the data by the 'City' column, and then the `max` function is applied to calculate the maximum age within each group.
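
`groupby` can also apply several aggregations at once via `agg`; a self-contained sketch with two rows per city:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago'],
    'Age': [25, 30, 22, 28]
})

# Several aggregations per group in one pass
summary = df.groupby('City')['Age'].agg(['min', 'max', 'mean'])
print(summary)
```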

## 12. Joining DataFrames: Merge Employee and Department Data

You can merge two DataFrames based on a common column.

``````
# Sample Department DataFrame
department_data = {'City': ['New York', 'San Francisco'],
                   'Department': ['HR', 'Finance']}
department_df = pd.DataFrame(department_data)

# Merge Employee and Department DataFrames
merged_df = pd.merge(df, department_df, on='City')
print(merged_df)
``````

Output:

``````
    Name  Age           City  Salary Department
0  Alice   25       New York   78500         HR
1    Bob   30  San Francisco   62000    Finance
``````

### Explanation:

The `merge` function is used to combine the Employee DataFrame with the Department DataFrame based on the common 'City' column. The default inner join keeps only matching cities, so rows whose city has no entry in the Department DataFrame are dropped.

## 13. Filtering Data: Select Employees with Salary Greater Than 70000

You can filter rows based on certain conditions.

``````
# Select employees with salary greater than 70000
high_salary_employees = df[df['Salary'] > 70000]
print(high_salary_employees)
``````

Output:

``````
    Name  Age      City  Salary
0  Alice   25  New York   78500
3  David   28   Chicago   74600
``````

### Explanation:

The DataFrame is filtered using a condition `df['Salary'] > 70000` to select only those employees whose salary is greater than 70000.

## 14. Sorting Data: Sort Employees by Age in Descending Order

You can sort the DataFrame based on one or more columns.

``````
# Sort employees by age in descending order
sorted_by_age_desc = df.sort_values(by='Age', ascending=False)
print(sorted_by_age_desc)
``````

Output:

``````
      Name  Age           City  Salary
1      Bob   30  San Francisco   62000
3    David   28        Chicago   74600
0    Alice   25       New York   78500
2  Charlie   22    Los Angeles   42000
``````

### Explanation:

The `sort_values` function is used to sort the DataFrame based on the 'Age' column in descending order.

## 15. Grouping and Aggregating Data: Calculate Average Salary by City

You can group data based on a column and then perform aggregation functions.

``````
# Group by city and calculate average salary
avg_salary_by_city = df.groupby('City')['Salary'].mean()
print(avg_salary_by_city)
``````

Output:

``````
City
Chicago          74600.0
Los Angeles      42000.0
New York         78500.0
San Francisco    62000.0
Name: Salary, dtype: float64
``````

### Explanation:

The `groupby` function is used to group the data by the 'City' column, and then the `mean` function is applied to the 'Salary' column to calculate the average salary for each city.

## 16. Merging DataFrames: Merge Employee and Department DataFrames

You can merge two DataFrames based on a common column.

``````
# Create a Department DataFrame
department_data = {'DepartmentID': [1, 2, 3],
                   'DepartmentName': ['HR', 'Finance', 'IT']}
departments = pd.DataFrame(department_data)

# Assign a department to each employee, then merge on the shared column
df['DepartmentID'] = [1, 1, 2, 3]
merged_df = pd.merge(df, departments, on='DepartmentID')
print(merged_df)
``````

Output:

``````
      Name  Age           City  Salary  DepartmentID DepartmentName
0    Alice   25       New York   78500             1             HR
1      Bob   30  San Francisco   62000             1             HR
2  Charlie   22    Los Angeles   42000             2        Finance
3    David   28        Chicago   74600             3             IT
``````

### Explanation:

The `merge` function is used to combine the Employee DataFrame and the Department DataFrame based on the 'DepartmentID' column.

## 17. Sorting Data: Sort Employees by Salary in Descending Order

You can sort the DataFrame based on one or more columns.

``````
# Sort employees by salary in descending order
sorted_df = df.sort_values(by='Salary', ascending=False)
print(sorted_df)
``````

Output:

``````
      Name  Age           City  Salary  DepartmentID
0    Alice   25       New York   78500             1
3    David   28        Chicago   74600             3
1      Bob   30  San Francisco   62000             1
2  Charlie   22    Los Angeles   42000             2
``````

### Explanation:

The `sort_values` function is used to sort the DataFrame by the 'Salary' column in descending order.

## 18. Dropping Columns: Remove the DepartmentID Column

You can drop unnecessary columns from the DataFrame.

``````
# Drop the DepartmentID column
df_without_dept = df.drop(columns='DepartmentID')
print(df_without_dept)
``````

Output:

``````
      Name  Age           City  Salary
0    Alice   25       New York   78500
1      Bob   30  San Francisco   62000
2  Charlie   22    Los Angeles   42000
3    David   28        Chicago   74600
``````

### Explanation:

The `drop` function is used to remove the 'DepartmentID' column from the DataFrame.

## 19. Filtering Data: Get Employees with Salary Above 70000

You can filter rows based on a condition.

``````
# Filter employees with salary above 70000
high_salary_df = df[df['Salary'] > 70000]
print(high_salary_df)
``````

Output:

``````
    Name  Age      City  Salary  DepartmentID
0  Alice   25  New York   78500             1
3  David   28   Chicago   74600             3
``````

### Explanation:

We use boolean indexing to filter rows where the 'Salary' column is greater than 70000.

## 20. Grouping Data: Calculate Average Salary by City

You can group data based on one or more columns and perform aggregate functions.

``````
# Group by city and calculate average salary
average_salary_by_city = df.groupby('City')['Salary'].mean()
print(average_salary_by_city)
``````

Output:

``````
City
Chicago          74600.0
Los Angeles      42000.0
New York         78500.0
San Francisco    62000.0
Name: Salary, dtype: float64
``````

### Explanation:

We use the `groupby` function to group the data by the 'City' column and then calculate the mean of the 'Salary' column for each group.

## 21. Renaming Columns: Rename DepartmentID to DeptID

You can rename columns in a DataFrame using the `rename` method.

``````
# Rename DepartmentID column to DeptID
df.rename(columns={'DepartmentID': 'DeptID'}, inplace=True)
print(df)
``````

Output:

``````
      Name  Age           City  Salary  DeptID
0    Alice   25       New York   78500       1
1      Bob   30  San Francisco   62000       1
2  Charlie   22    Los Angeles   42000       2
3    David   28        Chicago   74600       3
``````

### Explanation:

We use the `rename` method and provide a dictionary to specify the old column name as the key and the new column name as the value. The `inplace=True` argument makes the changes in-place.

## 22. Merging DataFrames: Merge Employee and Department Data

You can merge two DataFrames using the `merge` function.

``````
# Create department DataFrame
department_data = {'DeptID': [1, 2], 'DepartmentName': ['HR', 'Finance']}
department_df = pd.DataFrame(department_data)

# Merge employee and department DataFrames
merged_df = pd.merge(df, department_df, on='DeptID')
print(merged_df)
``````

Output:

``````
      Name  Age           City  Salary  DeptID DepartmentName
0    Alice   25       New York   78500       1             HR
1      Bob   30  San Francisco   62000       1             HR
2  Charlie   22    Los Angeles   42000       2        Finance
``````

### Explanation:

We create a new DataFrame `department_df` to represent the department information. Then, we use the `merge` function to merge the `df` DataFrame with the `department_df` DataFrame based on the 'DeptID' column. Rows whose DeptID has no entry in `department_df` are dropped by the default inner join.

## 23. Grouping and Aggregation: Calculate Average Salary by Department

You can use the `groupby` method to group the DataFrame by a specific column and then apply aggregation functions.

``````
# Group by DepartmentName and calculate average salary
average_salary_by_department = merged_df.groupby('DepartmentName')['Salary'].mean()
print(average_salary_by_department)
``````

Output:

``````
DepartmentName
Finance    42000.0
HR         70250.0
Name: Salary, dtype: float64
``````

### Explanation:

We use the `groupby` method to group the merged DataFrame by the 'DepartmentName' column. Then, we calculate the average salary for each department using the `mean` function on the 'Salary' column within each group.

## 24. Pivot Table: Create a Pivot Table of Average Salary by Department and Age

Pivot tables allow you to create multi-dimensional summaries of data.

``````
# Create a pivot table of average salary by DepartmentName and Age
pivot_table = merged_df.pivot_table(values='Salary', index='DepartmentName', columns='Age', aggfunc='mean')
print(pivot_table)
``````

Output:

``````
Age                  22       25       30
DepartmentName
Finance         42000.0      NaN      NaN
HR                  NaN  78500.0  62000.0
``````

### Explanation:

We use the `pivot_table` method to create a pivot table that displays the average salary for each combination of 'DepartmentName' and 'Age'. The `aggfunc='mean'` argument specifies that the aggregation function should be the mean.
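
The NaN cells mark department/age combinations with no data. If zeros are preferred, `pivot_table` accepts a `fill_value` parameter; a self-contained sketch on data shaped like the merged DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({
    'DepartmentName': ['HR', 'Finance', 'Finance'],
    'Age': [25, 28, 24],
    'Salary': [78500, 62000, 73000]
})

# fill_value replaces the NaN cells for missing combinations
pivot = df.pivot_table(values='Salary', index='DepartmentName',
                       columns='Age', aggfunc='mean', fill_value=0)
print(pivot)
```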

## 25. Selecting Rows Based on Conditions

You can filter rows from a DataFrame based on certain conditions using boolean indexing. Note that from this exercise onward, the sample data changes: the outputs assume a `merged_df` built from an employee table with an 'EmployeeID' column and employees John, Jane, and Alice.

``````
# Select rows where 'Age' is greater than 25
selected_rows = merged_df[merged_df['Age'] > 25]
print(selected_rows)
``````

Output:

``````
EmployeeID  Name  Age DepartmentName  Salary
1           2  Jane   28             HR   78500
``````

### Explanation:

We use boolean indexing to filter rows where the 'Age' column is greater than 25.

## 26. Sorting DataFrame by Columns

You can sort a DataFrame based on one or more columns using the `sort_values` function.

``````
# Sort DataFrame by 'Salary' column in descending order
sorted_df = merged_df.sort_values(by='Salary', ascending=False)
print(sorted_df)
``````

Output:

``````
EmployeeID   Name  Age DepartmentName  Salary
0           1   John   25        Finance   80000
1           2   Jane   28             HR   78500
2           3  Alice   24        Finance   72000
``````

### Explanation:

We use the `sort_values` function to sort the DataFrame based on the 'Salary' column in descending order.

## 27. Grouping Data

You can group data based on one or more columns using the `groupby` function.

``````
# Group data by 'DepartmentName' and calculate the average salary
grouped_data = merged_df.groupby('DepartmentName')['Salary'].mean()
print(grouped_data)
``````

Output:

``````
DepartmentName
Finance    76000.0
HR         78500.0
Name: Salary, dtype: float64
``````

### Explanation:

We use the `groupby` function to group the data by the 'DepartmentName' column and then calculate the average salary for each group.

## 28. Merging DataFrames

You can merge two DataFrames using the `merge` function.

``````
# Create a new DataFrame with department-wise average salary
department_avg_salary = merged_df.groupby('DepartmentName')['Salary'].mean().reset_index()

# Merge the original DataFrame with the department-wise average salary DataFrame
merged_with_avg_salary = pd.merge(merged_df, department_avg_salary, on='DepartmentName', suffixes=('', '_avg'))
print(merged_with_avg_salary)
``````

Output:

``````
EmployeeID   Name  Age DepartmentName  Salary  Salary_avg
0           1   John   25        Finance   80000     76000.0
1           3  Alice   24        Finance   72000     76000.0
2           2   Jane   28             HR   78500     78500.0
``````

### Explanation:

We first calculate the average salary for each department using the `groupby` function and create a new DataFrame. Then, we use the `merge` function to combine the original DataFrame with the department-wise average salary DataFrame based on the 'DepartmentName' column.

## 29. Pivoting Data

You can pivot data using the `pivot_table` function.

``````
# Create a pivot table to display average salary for each department and age
pivot_table = merged_df.pivot_table(index='DepartmentName', columns='Age', values='Salary', aggfunc='mean')
print(pivot_table)
``````

Output:

``````
Age                  24      25      28
DepartmentName
Finance          72000.0  80000.0     NaN
HR                   NaN     NaN  78500.0
``````

### Explanation:

We use the `pivot_table` function to create a pivot table that displays the average salary for each department and age combination.

## 30. Working with Missing Data

Missing data can be handled using various functions.

``````
# Check for missing values in the DataFrame
missing_values = merged_df.isnull().sum()
print(missing_values)
``````

Output:

``````
EmployeeID         0
Name               0
Age                0
DepartmentName     0
Salary             0
dtype: int64
``````

### Explanation:

The `isnull()` function checks for missing values in the DataFrame and returns a boolean DataFrame. The `sum()` function then calculates the total number of missing values for each column.

## 31. Handling Missing Data

Missing data can be filled using the `fillna` function.

``````
# Fill missing values in the 'Age' column with the mean age
merged_df['Age'] = merged_df['Age'].fillna(merged_df['Age'].mean())
print(merged_df)
``````

Output:

``````
EmployeeID   Name  Age DepartmentName  Salary
0           1   John   25        Finance   80000
1           2   Jane   28             HR   78500
2           3  Alice   24        Finance   72000
``````

### Explanation:

The `fillna()` function fills missing values in the 'Age' column with the mean age of the dataset. Assigning the result back to the column is the form recommended by recent pandas versions; calling `fillna` with `inplace=True` on a single column is deprecated.

## 32. Exporting Data to CSV

DataFrames can be exported to CSV files using the `to_csv` function.

``````
# Export the DataFrame to a CSV file
merged_df.to_csv('employee_data.csv', index=False)
``````

Output:

A CSV file named 'employee_data.csv' will be created in the working directory.

### Explanation:

The `to_csv()` function is used to export the DataFrame to a CSV file. The `index=False` parameter prevents the index column from being exported.
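
A quick way to verify the export is to read the file back with `read_csv` and compare; a sketch using a temporary directory (the two-column frame here is illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane'], 'Salary': [80000, 78500]})

# Write to a temporary file, then read it back
path = os.path.join(tempfile.mkdtemp(), 'employee_data.csv')
df.to_csv(path, index=False)
restored = pd.read_csv(path)
print(restored.equals(df))  # True
```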

## 33. Exporting Data to Excel

DataFrames can be exported to Excel files using the `to_excel` function.

``````
# Export the DataFrame to an Excel file
merged_df.to_excel('employee_data.xlsx', index=False)
``````

Output:

An Excel file named 'employee_data.xlsx' will be created in the working directory.

### Explanation:

The `to_excel()` function is used to export the DataFrame to an Excel file. The `index=False` parameter prevents the index column from being exported.

## 34. Merging DataFrames

DataFrames can be merged using the `merge` function.

``````
# Merge two DataFrames based on a common column
# (employee_df and department_df are assumed to be defined as in earlier exercises)
merged_data = pd.merge(employee_df, department_df, on='DepartmentID')
print(merged_data)
``````

Output:

``````
EmployeeID   Name  Age  DepartmentID  Salary DepartmentName
0           1   John   25             1   80000        Finance
1           3  Alice   24             1   72000        Finance
2           2   Jane   28             2   78500             HR
``````

### Explanation:

The `merge()` function is used to merge two DataFrames based on a common column, in this case, 'DepartmentID'. The resulting DataFrame contains columns from both original DataFrames.

## 35. Grouping and Aggregating Data

Data can be grouped and aggregated using the `groupby` function.

``````
# Group data by department and calculate average salary
average_salary = merged_data.groupby('DepartmentName')['Salary'].mean()
print(average_salary)
``````

Output:

``````
DepartmentName
Finance    76000.0
HR         78500.0
Name: Salary, dtype: float64
``````

### Explanation:

The `groupby()` function is used to group the data by the 'DepartmentName' column. The `mean()` function calculates the average salary for each department.

## 36. Pivot Tables

Pivot tables can be created using the `pivot_table` function.

``````
# Create a pivot table to display average salary by department and age
pivot_table = merged_data.pivot_table(values='Salary', index='DepartmentName', columns='Age', aggfunc='mean')
print(pivot_table)
``````

Output:

``````
Age                24      25      28
DepartmentName
Finance        72000.0  80000.0     NaN
HR                  NaN     NaN  78500.0
``````

### Explanation:

The `pivot_table()` function creates a pivot table that displays the average salary by department and age. The `values` parameter specifies the column to aggregate, the `index` parameter specifies the rows (DepartmentName), the `columns` parameter specifies the columns (Age), and the `aggfunc` parameter specifies the aggregation function to use.

## 37. Handling Missing Data

Missing data can be handled using functions like `fillna` and `dropna`.

``````
# Fill missing values with a specific value
df_filled = df.fillna(value=0)

# Drop rows with missing values
df_dropped = df.dropna()
``````

### Explanation:

The `fillna()` function is used to fill missing values in the DataFrame with a specified value, such as 0 in this case. The `dropna()` function is used to remove rows with missing values from the DataFrame.
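
A self-contained sketch showing both on a column that actually contains a missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Salary': [80000, np.nan, 72000]
})

# fillna keeps every row, replacing the missing salary with 0
filled = df.fillna(value=0)

# dropna removes the row that had the missing salary
dropped = df.dropna()

print(len(filled), len(dropped))  # 3 2
```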

## 38. Sorting Data

DataFrames can be sorted using the `sort_values` function.

``````
# Sort DataFrame by 'Salary' in ascending order
sorted_df = df.sort_values(by='Salary')
print(sorted_df)
``````

Output:

``````
   EmployeeID   Name  Age  DepartmentID  Salary
2           3  Alice   24             1   72000
1           2   Jane   28             2   78500
0           1   John   25             1   80000
``````

### Explanation:

The `sort_values()` function is used to sort the DataFrame by a specified column, in this case, 'Salary'. The resulting DataFrame is sorted in ascending order by default.

## 39. Merging DataFrames with Different Columns

DataFrames with different columns can be merged using the `merge` function with the `how` parameter.

``````
# Merge two DataFrames with different columns using an outer join
# (df1 with an 'ID' column and df2 with an 'EmployeeID' column are assumed to be defined)
merged_data = pd.merge(df1, df2, how='outer', left_on='ID', right_on='EmployeeID')
print(merged_data)
``````

Output:

``````
   ID   Name  Age  EmployeeID Department
0   1   John   25         1.0    Finance
1   2   Jane   28         2.0         HR
2   3  Alice   24         3.0    Finance
3   4    Bob   30         NaN        NaN
``````

### Explanation:

The `merge()` function can handle merging DataFrames with different columns using different types of joins. In this example, an outer join keeps every row from both DataFrames; Bob has no matching EmployeeID in `df2`, so his 'EmployeeID' and 'Department' values are NaN.

## 40. Applying Functions to Columns

You can apply custom functions to columns using the `apply` function.

``````
# Define a custom function
def double_salary(salary):
    return salary * 2

# Apply the custom function to the 'Salary' column
df['Doubled Salary'] = df['Salary'].apply(double_salary)
print(df)
``````

Output:

``````
EmployeeID   Name  Age  DepartmentID  Salary  Doubled Salary
0           1   John   25             1   80000          160000
1           2   Jane   28             2   78500          157000
2           3  Alice   24             1   72000          144000
``````

### Explanation:

The `apply()` function is used to apply a custom function to each element of a column. In this example, a custom function `double_salary` is defined to double the salary of each employee, and the function is applied to the 'Salary' column using `df['Salary'].apply(double_salary)`. The result is a new column `'Doubled Salary'` containing the doubled salary values.
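
For simple arithmetic like this, a vectorized expression gives the same result without `apply` and is generally faster; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Salary': [80000, 78500, 72000]})

# apply() with a custom function
doubled_apply = df['Salary'].apply(lambda s: s * 2)

# Vectorized equivalent: operate on the whole column at once
doubled_vector = df['Salary'] * 2

print(doubled_apply.equals(doubled_vector))  # True
```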

## 41. Creating Pivot Tables

Pivot tables can be created using the `pivot_table` function.

``````
# Create a pivot table with 'Department' as columns and 'Age' as values
pivot_table = df.pivot_table(values='Age', columns='Department', aggfunc='mean')
print(pivot_table)
``````

Output:

``````
Department  Finance    HR
Age            24.5  28.0
``````

### Explanation:

The `pivot_table()` function is used to create a pivot table from a DataFrame. In this example, the pivot table has 'Department' as columns and 'Age' as values, with the aggregation function `'mean'` to calculate the average age for each department.

## 42. Grouping and Aggregating Data

Data can be grouped and aggregated using the `groupby` function.

``````
# Group data by 'Department' and calculate the average age and salary
grouped_data = df.groupby('Department').agg({'Age': 'mean', 'Salary': 'mean'})
print(grouped_data)
``````

Output:

``````
Age   Salary
Department
Finance      24.5  76000.0
HR           28.0  78500.0
``````

### Explanation:

The `groupby()` function is used to group the data based on a specified column, in this case, 'Department'. The `agg()` function is then used to apply aggregation functions to the grouped data. In this example, the average age and salary for each department are calculated.

## 43. Merging DataFrames

DataFrames can be merged using the `merge` function.

``````
# Create two DataFrames
df1 = pd.DataFrame({'EmployeeID': [1, 2, 3],
                    'Name': ['John', 'Jane', 'Alice'],
                    'DepartmentID': [1, 2, 1]})
df2 = pd.DataFrame({'DepartmentID': [1, 2],
                    'DepartmentName': ['Finance', 'HR']})

# Merge the DataFrames based on 'DepartmentID'
merged_df = pd.merge(df1, df2, on='DepartmentID')
print(merged_df)
``````

Output:

``````
   EmployeeID   Name  DepartmentID DepartmentName
0           1   John             1        Finance
1           2   Jane             2             HR
2           3  Alice             1        Finance
``````

### Explanation:

The `merge()` function is used to merge two DataFrames based on a common column, in this case, 'DepartmentID'. The result is a new DataFrame containing the combined data from both DataFrames.

## 44. Handling Missing Values

Missing values can be handled using functions like `dropna` and `fillna`.

``````
# Drop rows with any missing values
cleaned_df = df.dropna()
print(cleaned_df)

# Fill missing values with a specific value
filled_df = df.fillna(value=0)
print(filled_df)
``````

Output:

``````
EmployeeID   Name  Age  DepartmentID  Salary
0           1   John   25             1   80000
1           2   Jane   28             2   78500
2           3  Alice   24             1   72000

EmployeeID   Name  Age  DepartmentID  Salary
0           1   John   25             1   80000
1           2   Jane   28             2   78500
2           3  Alice   24             1   72000
``````

### Explanation:

The `dropna()` function is used to remove rows with any missing values, while the `fillna()` function is used to fill missing values with a specified value, in this case, 0.

## 45. Changing Column Data Types

Column data types can be changed using the `astype` function.

``````
# Change the data type of 'Salary' column to float
df['Salary'] = df['Salary'].astype(float)
print(df.dtypes)
``````

Output:

``````
EmployeeID        int64
Name             object
Age               int64
DepartmentID      int64
Salary          float64
dtype: object
``````

### Explanation:

The `astype()` function is used to change the data type of a column. In this example, the data type of the 'Salary' column is changed from integer to float.
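
When a column holds strings that cannot all be converted, `astype(float)` raises an error; `pd.to_numeric` with `errors='coerce'` instead turns unparseable entries into NaN. A small sketch (the 'n/a' entry is illustrative):

```python
import pandas as pd

raw = pd.Series(['80000', '78500', 'n/a'])

# astype(float) would raise on 'n/a'; coerce turns it into NaN instead
salaries = pd.to_numeric(raw, errors='coerce')
print(salaries)
```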

## 46. Grouping and Aggregating Data

Data can be grouped and aggregated using the `groupby` function.

``````
# Group data by 'DepartmentID' and calculate the average salary
grouped_df = df.groupby('DepartmentID')['Salary'].mean()
print(grouped_df)
``````

Output:

``````
DepartmentID
1    76000.0
2    78500.0
Name: Salary, dtype: float64
``````

### Explanation:

The `groupby()` function is used to group data based on a specified column, in this case, 'DepartmentID'. The `mean()` function is then applied to calculate the average salary for each department.

## 47. Pivoting Data

Data can be pivoted using the `pivot_table` function.

``````
# Pivot the data to show average salary by department and age
pivot_df = df.pivot_table(values='Salary', index='DepartmentID', columns='Age', aggfunc='mean')
print(pivot_df)
``````

Output:

``````
Age                  24      25      28
DepartmentID
1              72000.0  80000.0     NaN
2                  NaN     NaN  78500.0
``````

### Explanation:

The `pivot_table()` function is used to create a pivot table that displays the average salary by department and age. The `values` parameter specifies the column to aggregate, the `index` parameter specifies the rows, the `columns` parameter specifies the columns, and the `aggfunc` parameter specifies the aggregation function to use.

## 48. Exporting Data to CSV

Data can be exported to a CSV file using the `to_csv` function.

``````
# Export DataFrame to a CSV file
df.to_csv('employee_data.csv', index=False)
``````

### Explanation:

The `to_csv()` function is used to export a DataFrame to a CSV file. The `index` parameter is set to `False` to exclude the index column from the exported CSV file.

## 49. Exporting Data to Excel

Data can be exported to an Excel file using the `to_excel` function.

``````
# Export DataFrame to an Excel file
df.to_excel('employee_data.xlsx', index=False)
``````

### Explanation:

The `to_excel()` function is used to export a DataFrame to an Excel file. The `index` parameter is set to `False` to exclude the index column from the exported Excel file.

## 50. Joining DataFrames

DataFrames can be joined using the `merge` function.

``````
# Join two DataFrames based on a common column
result_df = pd.merge(df1, df2, on='EmployeeID')
print(result_df)
``````

Output:

``````
   EmployeeID Name_x  Salary_x Name_y  Salary_y
0           1   John     60000  Alice     55000
1           2   Mary     75000    Bob     60000
``````

### Explanation:

The `merge()` function is used to combine two DataFrames based on a common column, in this case, 'EmployeeID'. The result is a new DataFrame containing the combined data from both DataFrames.

## 51. Merging DataFrames

DataFrames can be merged using the `merge` function with different types of joins.

``````
# Merge DataFrames with different types of joins
inner_join_df = pd.merge(df1, df2, on='EmployeeID', how='inner')
left_join_df = pd.merge(df1, df2, on='EmployeeID', how='left')
right_join_df = pd.merge(df1, df2, on='EmployeeID', how='right')
outer_join_df = pd.merge(df1, df2, on='EmployeeID', how='outer')
``````

### Explanation:

The `merge()` function can perform different types of joins based on the `how` parameter. The available options are `'inner'` (intersection of keys), `'left'` (keys from the left DataFrame), `'right'` (keys from the right DataFrame), and `'outer'` (union of keys).
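To make the difference concrete, here is a small sketch with made-up `df1`/`df2` frames (the names and salaries are invented for illustration):

``````
import pandas as pd

# Hypothetical data: df2 lacks employee 1, df1 lacks employee 4
df1 = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['John', 'Mary', 'Sam']})
df2 = pd.DataFrame({'EmployeeID': [2, 3, 4], 'Salary': [75000, 68000, 90000]})

inner = pd.merge(df1, df2, on='EmployeeID', how='inner')  # keys 2 and 3 only
left = pd.merge(df1, df2, on='EmployeeID', how='left')    # all df1 keys; Salary is NaN for key 1
outer = pd.merge(df1, df2, on='EmployeeID', how='outer')  # all four keys
``````

Here the inner join keeps two rows, the left join three (with one missing salary), and the outer join four.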

## 54. Handling Missing Data

Missing data can be handled using the `fillna` function.

``````
# Fill missing values in one column with a specific value
df['Salary'] = df['Salary'].fillna(0)
``````

### Explanation:

The `fillna()` function replaces missing values in a column with a specified value. Assigning the result back to `df['Salary']` is preferred over calling `fillna(..., inplace=True)` on the column, which triggers chained-assignment warnings in recent pandas versions.

## 55. Grouping and Aggregating Data

Data can be grouped and aggregated using the `groupby` function.

``````
# Grouping and aggregating data
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)
``````

Output:

``````
Department
HR         65000.0
IT         70000.0
Sales      60000.0
Name: Salary, dtype: float64
``````

### Explanation:

The `groupby()` function is used to group the data by a specific column (in this case, 'Department'). The `mean()` function is then applied to the 'Salary' column to calculate the average salary for each department.

## 56. Pivot Tables

Pivot tables can be created using the `pivot_table` function.

``````
# Creating a pivot table
pivot_table = df.pivot_table(index='Department', values='Salary', aggfunc='mean')
print(pivot_table)
``````

Output:

``````
             Salary
Department
HR          65000.0
IT          70000.0
Sales       60000.0
``````

### Explanation:

The `pivot_table()` function is used to create a pivot table that summarizes and aggregates data based on specified columns. In this example, a pivot table is created with the 'Department' column as the index and the 'Salary' column values are aggregated using the `mean()` function.

## 57. Creating a Bar Plot

Bar plots can be created using the `plot` function.

``````
import matplotlib.pyplot as plt

# Creating a bar plot
df['Salary'].plot(kind='bar')
plt.xlabel('Employee')
plt.ylabel('Salary')
plt.title('Employee Salaries')
plt.show()
``````

Output:

An interactive bar plot will be displayed.

### Explanation:

The `plot()` function can be used to create various types of plots, including bar plots. The `kind` parameter is set to `'bar'` to indicate that a bar plot should be created. Additional labels and a title are added to the plot using the `xlabel()`, `ylabel()`, and `title()` functions. Finally, the `show()` function is used to display the plot.

## 58. Creating a Histogram

Histograms can be created using the `hist` function.

``````
# Creating a histogram
df['Age'].hist(bins=10)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
``````

Output:

An interactive histogram will be displayed.

### Explanation:

The `hist()` function is used to create a histogram plot. The `bins` parameter determines the number of bins or intervals in the histogram. Additional labels and a title are added to the plot using the `xlabel()`, `ylabel()`, and `title()` functions. Finally, the `show()` function is used to display the plot.

## 59. Creating a Box Plot

Box plots can be created using the `boxplot` function.

``````
# Creating a box plot
df.boxplot(column='Salary', by='Department')
plt.xlabel('Department')
plt.ylabel('Salary')
plt.title('Salary Distribution by Department')
plt.suptitle('')
plt.show()
``````

Output:

An interactive box plot will be displayed.

### Explanation:

The `boxplot()` function creates a box plot that visualizes the distribution of a numerical variable ('Salary') across categories ('Department'). The `column` parameter specifies the column to plot, and the `by` parameter specifies the grouping variable. Labels and a title are added with the `xlabel()`, `ylabel()`, and `title()` functions. Passing an empty string to `suptitle()` suppresses the automatic supertitle that pandas adds above grouped box plots. Finally, the `show()` function is used to display the plot.

## 60. Creating a Scatter Plot

Scatter plots can be created using the `scatter` function.

``````
# Creating a scatter plot
df.plot.scatter(x='Age', y='Salary')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Age vs Salary')
plt.show()
``````

Output:

An interactive scatter plot will be displayed.

### Explanation:

The `plot.scatter()` function is used to create a scatter plot that visualizes the relationship between two numerical variables ('Age' and 'Salary'). The `x` parameter specifies the x-axis variable, and the `y` parameter specifies the y-axis variable. Labels and a title are added to the plot using the `xlabel()`, `ylabel()`, and `title()` functions. Finally, the `show()` function is used to display the plot.

## 61. Filtering Rows with Multiple Conditions

You can filter rows in a DataFrame based on multiple conditions using the `&` (AND) and `|` (OR) operators.

``````
# Filtering rows with multiple conditions
filtered_df = df[(df['Age'] >= 30) & (df['Salary'] >= 50000)]
``````

### Explanation:

The example demonstrates how to filter rows in a DataFrame based on multiple conditions. In this case, we are filtering for rows where the 'Age' is greater than or equal to 30 and the 'Salary' is greater than or equal to 50000. The `&` operator is used to perform an element-wise AND operation on the conditions.
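A minimal sketch with invented data shows how `&` and `|` differ (the column names mirror the snippet above):

``````
import pandas as pd

df = pd.DataFrame({'Name': ['Ann', 'Ben', 'Cal'],
                   'Age': [35, 28, 41],
                   'Salary': [52000, 61000, 48000]})

both = df[(df['Age'] >= 30) & (df['Salary'] >= 50000)]    # AND: only Ann satisfies both
either = df[(df['Age'] >= 30) | (df['Salary'] >= 50000)]  # OR: all three satisfy at least one
``````

Each condition must be wrapped in parentheses, because `&` and `|` bind more tightly than the comparison operators.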

## 62. Grouping Data and Calculating Aggregates

You can use the `groupby` function to group data by one or more columns and then apply aggregate functions to the grouped data.

``````
# Grouping data and calculating aggregates
grouped_df = df.groupby('Department')['Salary'].mean()
``````

### Explanation:

The `groupby()` function is used to group the data by a specific column ('Department' in this case). Then, the `['Salary'].mean()` expression calculates the mean salary for each department. The result is a Series with department names as the index and the corresponding mean salaries as values.

## 63. Merging DataFrames

You can merge two DataFrames based on a common column using the `merge` function.

``````
# Merging DataFrames
merged_df = pd.merge(df1, df2, on='EmployeeID')
``````

### Explanation:

The `merge()` function is used to merge two DataFrames ('df1' and 'df2') based on a common column ('EmployeeID' in this case). The result is a new DataFrame containing the combined data from both original DataFrames.

## 64. Handling Missing Data

Missing data can be handled using the `fillna` function or by dropping rows with missing values using the `dropna` function.

``````
# Handling missing data
df.fillna(value=0, inplace=True)
``````

### Explanation:

The `fillna()` function is used to fill missing values in the DataFrame with a specified value (in this case, 0). The `inplace=True` parameter updates the DataFrame in place with the filled values.

## 65. Pivoting DataFrames

You can pivot a DataFrame using the `pivot` function to reshape the data based on column values.

``````
# Pivoting a DataFrame
pivot_df = df.pivot(index='Date', columns='Product', values='Sales')
``````

### Explanation:

The `pivot()` function is used to reshape the DataFrame. In this example, the DataFrame is pivoted based on the 'Date' and 'Product' columns, and the 'Sales' column values are used as the values for the pivoted DataFrame.

## 66. Melting DataFrames

Melting a DataFrame can help convert it from a wide format to a long format using the `melt` function.

``````
# Melting a DataFrame
melted_df = pd.melt(df, id_vars=['Date'], value_vars=['Product_A', 'Product_B'])
``````

### Explanation:

The `melt()` function is used to transform the DataFrame from wide format to long format. The 'Date' column is kept as the identifier variable, and the 'Product_A' and 'Product_B' columns are melted into a single column called 'variable', and their corresponding values are in the 'value' column.
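With a tiny invented sales table, the wide-to-long transformation looks like this:

``````
import pandas as pd

# Hypothetical wide-format data: one column per product
df = pd.DataFrame({'Date': ['2024-01', '2024-02'],
                   'Product_A': [100, 110],
                   'Product_B': [90, 95]})

melted_df = pd.melt(df, id_vars=['Date'], value_vars=['Product_A', 'Product_B'])
# Result: four rows with columns 'Date', 'variable', 'value'
``````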

## 67. Reshaping DataFrames with Stack and Unstack

You can use the `stack` and `unstack` functions to reshape DataFrames by stacking and unstacking levels of the index or columns.

``````
# Stacking and unstacking DataFrames
stacked_df = df.stack()
unstacked_df = df.unstack()
``````

### Explanation:

The `stack()` function is used to stack the specified level(s) of columns to produce a Series with a MultiIndex. The `unstack()` function is used to unstack the specified level(s) of the index to produce a DataFrame with reshaped columns.
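A short sketch with a made-up regional table shows the round trip:

``````
import pandas as pd

# Hypothetical quarterly figures per region
df = pd.DataFrame({'Q1': [10, 20], 'Q2': [30, 40]}, index=['North', 'South'])

stacked = df.stack()          # Series with a (region, quarter) MultiIndex
restored = stacked.unstack()  # back to the original wide layout
``````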

## 68. Creating Pivot Tables

Pivot tables can be created using the `pivot_table` function to summarize and analyze data.

``````
# Creating a pivot table
pivot_table_df = df.pivot_table(index='Department', values='Salary', aggfunc='mean')
``````

### Explanation:

The `pivot_table()` function is used to create a pivot table. In this example, the pivot table is based on the 'Department' column, and the 'Salary' column values are aggregated using the mean function.

## 69. Grouping Data in a DataFrame

You can group data in a DataFrame using the `groupby` function to perform aggregate operations on grouped data.

``````
# Grouping data and calculating mean
grouped_df = df.groupby('Category')['Price'].mean()
``````

### Explanation:

The `groupby()` function is used to group data based on a column ('Category' in this case). The `mean()` function is then applied to the 'Price' column within each group to calculate the average price for each category.

## 70. Merging DataFrames

DataFrames can be merged using the `merge` function to combine data from different sources based on common columns.

``````
# Merging DataFrames
merged_df = pd.merge(df1, df2, on='common_column')
``````

### Explanation:

The `merge()` function is used to combine data from two DataFrames based on a common column ('common_column' in this case). The resulting DataFrame contains columns from both original DataFrames, aligned based on the matching values in the common column.

## 71. Concatenating DataFrames

DataFrames can be concatenated using the `concat` function to combine them vertically or horizontally.

``````
# Concatenating DataFrames vertically
concatenated_df = pd.concat([df1, df2])

# Concatenating DataFrames horizontally
concatenated_df = pd.concat([df1, df2], axis=1)
``````

### Explanation:

The `concat()` function is used to concatenate DataFrames either vertically (default) or horizontally (if `axis=1` is specified). This is useful when you want to combine data from different sources into a single DataFrame.

## 72. Handling Missing Data

Missing data can be handled using functions like `dropna`, `fillna`, and `interpolate`.

``````
# Dropping rows with missing values
cleaned_df = df.dropna()

# Filling missing values with a specific value
filled_df = df.fillna(value)

# Interpolating missing values
interpolated_df = df.interpolate()
``````

### Explanation:

Missing data can be handled using various methods. The `dropna()` function removes rows with missing values, the `fillna()` function fills missing values with a specified value, and the `interpolate()` function fills missing values using interpolation methods.
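The three strategies give quite different results on the same gappy series; a sketch with invented numbers:

``````
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()      # removes the two missing entries
filled = s.fillna(0)      # replaces them with 0
interp = s.interpolate()  # fills them linearly: 2.0 and 4.0
``````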

## 73. Reshaping DataFrames

DataFrames can be reshaped using functions like `pivot`, `melt`, and `stack/unstack`.

``````
# Pivot table
pivot_table = df.pivot_table(index='Index', columns='Column', values='Value')

# Melt DataFrame
melted_df = pd.melt(df, id_vars=['ID'], value_vars=['Var1', 'Var2'])

# Stack and unstack
stacked_df = df.stack()
unstacked_df = df.unstack()
``````

### Explanation:

DataFrames can be reshaped to change the layout of the data. The `pivot_table()` function creates a pivot table based on the provided columns, the `melt()` function transforms wide data into long format, and the `stack()` and `unstack()` functions move levels between the row index and the columns.

## 74. Aggregating Data in Groups

DataFrames can be grouped and aggregated using functions like `groupby` and `agg`.

``````
# Grouping and aggregating data
grouped = df.groupby('Category')['Value'].agg(['mean', 'sum'])
``````

### Explanation:

The `groupby()` function is used to group data based on a column ('Category' in this case), and the `agg()` function is then used to perform aggregate operations (e.g., mean, sum) on the grouped data.
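Passing a list to `agg()` yields one column per statistic; a minimal sketch with invented values:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B'], 'Value': [10, 20, 30]})

stats = df.groupby('Category')['Value'].agg(['mean', 'sum'])
# stats has one row per category and one column per aggregate
``````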

## 75. Applying Functions to Columns

You can apply functions to DataFrame columns using `apply` or `applymap`.

``````
# Applying a function to a column
df['NewColumn'] = df['Column'].apply(function)

# Applying a function element-wise to all columns
transformed_df = df.applymap(function)
``````

### Explanation:

The `apply()` function can be used to apply a function to a specific column. The `applymap()` function applies a function element-wise to all columns in the DataFrame; note that in pandas 2.1 and later, `applymap()` has been renamed to `DataFrame.map()`.

## 76. Using Lambda Functions

Lambda functions can be used for concise operations within DataFrames.

``````
# Applying a lambda function
df['NewColumn'] = df['Column'].apply(lambda x: x * 2)
``````

### Explanation:

Lambda functions provide a concise way to define small operations directly within a function call. In this case, the lambda function is applied to each element of the 'Column' and the result is assigned to the 'NewColumn'.

## 77. Handling Missing Data

Dealing with missing data is a common task in data analysis. Pandas provides various functions to handle missing values.

``````
# Check for missing values
missing_values = df.isnull().sum()

# Drop rows with missing values
cleaned_df = df.dropna()

# Fill missing values with a specific value
df_filled = df.fillna(value)
``````

### Explanation:

The `isnull()` function is used to identify missing values in the DataFrame. The `dropna()` function is used to remove rows containing missing values, and the `fillna()` function is used to fill missing values with a specified value.

## 78. Removing Duplicates

Removing duplicate rows is essential to ensure data accuracy and consistency.

``````
# Removing duplicates based on all columns
deduplicated_df = df.drop_duplicates()

# Removing duplicates based on specific columns
deduplicated_specific_df = df.drop_duplicates(subset=['Column1', 'Column2'])
``````

### Explanation:

The `drop_duplicates()` function removes duplicate rows from the DataFrame. You can specify columns using the `subset` parameter to consider only certain columns for duplicate removal.
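The difference between full-row and subset deduplication, using invented data:

``````
import pandas as pd

df = pd.DataFrame({'Column1': ['a', 'a', 'b'],
                   'Column2': [1, 1, 2],
                   'Other':   [10, 99, 30]})

full = df.drop_duplicates()                                    # no row is fully identical: 3 rows remain
by_subset = df.drop_duplicates(subset=['Column1', 'Column2'])  # rows 0 and 1 collide: 2 rows remain
``````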

## 79. Sorting DataFrames

DataFrames can be sorted using the `sort_values` function.

``````
# Sorting by a single column
sorted_df = df.sort_values(by='Column')

# Sorting by multiple columns
sorted_multi_df = df.sort_values(by=['Column1', 'Column2'], ascending=[True, False])
``````

### Explanation:

The `sort_values()` function is used to sort the DataFrame based on one or more columns. The `by` parameter specifies the columns to sort by, and the `ascending` parameter determines whether the sorting is in ascending or descending order.
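Mixed sort directions are easiest to see on a small invented frame:

``````
import pandas as pd

df = pd.DataFrame({'Column1': ['b', 'a', 'a'], 'Column2': [1, 2, 5]})

ordered = df.sort_values(by=['Column1', 'Column2'], ascending=[True, False])
# 'a' rows come first, and within 'a' the larger Column2 value leads
``````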

## 80. Exporting DataFrames

DataFrames can be exported to various file formats using functions like `to_csv`, `to_excel`, and `to_sql`.

``````
# Export to CSV
df.to_csv('output.csv', index=False)

# Export to Excel
df.to_excel('output.xlsx', index=False)

# Export to SQL database
df.to_sql('table_name', connection_object, if_exists='replace')
``````

### Explanation:

DataFrames can be exported to various file formats using functions like `to_csv()` for CSV files, `to_excel()` for Excel files, and `to_sql()` to store data in a SQL database. The `index` parameter specifies whether to include the index in the exported file.

## 81. Grouping and Aggregating Data

Grouping data allows you to perform aggregate operations on specific subsets of data.

``````
# Grouping by a single column and calculating mean
grouped_mean = df.groupby('Category')['Value'].mean()

# Grouping by multiple columns and calculating sum
grouped_sum = df.groupby(['Category1', 'Category2'])['Value'].sum()
``````

### Explanation:

The `groupby()` function is used to group the DataFrame based on one or more columns. Aggregate functions like `mean()`, `sum()`, `count()`, etc., can then be applied to the grouped data to calculate summary statistics.

## 82. Reshaping Data

DataFrames can be reshaped using functions like `melt` and `pivot`.

``````
# Melting the DataFrame
melted_df = pd.melt(df, id_vars=['ID'], value_vars=['Value1', 'Value2'])

# Creating a pivot table
pivot_table = df.pivot_table(index='Category', columns='Date', values='Value', aggfunc='sum')
``````

### Explanation:

The `melt()` function is used to transform the DataFrame from wide format to long format. The `pivot_table()` function is used to create a pivot table, aggregating data based on specified rows, columns, and values.

## 83. Combining DataFrames

DataFrames can be combined using functions like `concat`, `merge`, and `join`.

``````
# Concatenating DataFrames vertically
concatenated_df = pd.concat([df1, df2], axis=0)

# Merging DataFrames based on a common column
merged_df = pd.merge(df1, df2, on='ID', how='inner')

# Joining DataFrames based on index
joined_df = df1.join(df2, how='outer')
``````

### Explanation:

The `concat()` function is used to concatenate DataFrames vertically or horizontally. The `merge()` function is used to merge DataFrames based on common columns, and the `join()` function is used to join DataFrames based on index.

## 84. Time Series Analysis

Pandas provides functionality for working with time series data.

``````
# Converting to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Resampling time series data (resample needs a datetime index or the `on` parameter)
resampled_df = df.resample('W', on='Date').sum()

# Rolling mean calculation
rolling_mean = df['Value'].rolling(window=7).mean()
``````

### Explanation:

Pandas allows you to work with time series data by converting date columns to datetime format, resampling data at different frequencies, and calculating rolling statistics like moving averages.
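A runnable sketch over ten invented daily values (here the datetime index is built directly, so `resample()` can be called without the `on=` parameter):

``````
import pandas as pd

idx = pd.date_range('2024-01-01', periods=10, freq='D')  # Mon Jan 1 through Wed Jan 10
df = pd.DataFrame({'Value': range(10)}, index=idx)

weekly = df.resample('W').sum()                 # two weekly buckets: values 0..6 and 7..9
rolling = df['Value'].rolling(window=3).mean()  # 3-day moving average, NaN for the first two days
``````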

## 85. Visualizing Data

Data visualization is crucial for understanding patterns and trends in data.

``````
import matplotlib.pyplot as plt
import seaborn as sns

# Line plot
plt.plot(df['Date'], df['Value'])
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Value over Time')
plt.show()

# Scatter plot
sns.scatterplot(x='X', y='Y', data=df)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot')
plt.show()
``````

### Explanation:

Matplotlib and Seaborn libraries are commonly used for data visualization in Python. You can create various types of plots, including line plots and scatter plots, to visualize relationships and trends in your data.

## 86. Handling Missing Data

Dealing with missing data is essential for accurate analysis.

``````
# Checking for missing values
missing_values = df.isnull().sum()

# Dropping rows with missing values
df_cleaned = df.dropna()

# Filling missing values with a specific value
df_filled = df.fillna(0)
``````

### Explanation:

The `isnull()` function is used to identify missing values in a DataFrame. You can then use `dropna()` to remove rows or columns with missing values, and `fillna()` to replace missing values with a specific value.

## 87. Data Transformation

You can perform various data transformation operations to prepare data for analysis.

``````
# Applying a function to a column
df['Transformed_Column'] = df['Value'].apply(lambda x: x * 2)

# Applying a function element-wise
df_transformed = df.applymap(lambda x: x.upper() if isinstance(x, str) else x)

# Binning data into categories
df['Category'] = pd.cut(df['Value'], bins=[0, 10, 20, 30], labels=['Low', 'Medium', 'High'])
``````

### Explanation:

Data transformation involves modifying, adding, or removing columns in a DataFrame to create new features or prepare data for analysis. You can use functions like `apply()` and `applymap()` to transform data based on custom functions.

## 88. Working with Categorical Data

Categorical data requires special handling to encode and analyze properly.

``````
# Encoding categorical variables
encoded_df = pd.get_dummies(df, columns=['Category'], prefix=['Cat'], drop_first=True)

# Mapping categories to numerical values
category_mapping = {'Low': 0, 'Medium': 1, 'High': 2}
df['Category'] = df['Category'].map(category_mapping)
``````

### Explanation:

Categorical data needs to be transformed into numerical format for analysis. You can use one-hot encoding with `get_dummies()` to create binary columns for each category, or use `map()` to map categories to specific numerical values.
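A sketch with three invented labels; `drop_first=True` drops the alphabetically first dummy ('High' here), since its value is implied by the remaining columns:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['Low', 'High', 'Medium']})

encoded = pd.get_dummies(df, columns=['Category'], prefix=['Cat'], drop_first=True)
# Remaining columns: Cat_Low and Cat_Medium
``````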

## 89. Data Aggregation and Pivot Tables

Aggregating data and creating pivot tables helps summarize information.

``````
# Creating a pivot table
pivot_table = df.pivot_table(index='Category', columns='Month', values='Value', aggfunc='sum')

# Grouping and aggregating data
grouped = df.groupby('Category')['Value'].agg(['sum', 'mean', 'max'])
``````

### Explanation:

Pivot tables allow you to create multidimensional summaries of data. You can also use the `groupby()` function to group data based on specific columns and then apply aggregate functions to calculate summary statistics.

## 90. Exporting Data

After analysis, you might need to export your DataFrame to different formats.

``````
# Exporting to CSV
df.to_csv('output.csv', index=False)

# Exporting to Excel
df.to_excel('output.xlsx', index=False)

# Exporting to JSON
df.to_json('output.json', orient='records')
``````

### Explanation:

Pandas provides methods to export DataFrames to various file formats, including CSV, Excel, and JSON. You can use the `to_csv()`, `to_excel()`, and `to_json()` functions to save your data.

## 91. Merging DataFrames

Combining data from multiple DataFrames can be useful for analysis.

``````
# Inner join
merged_inner = pd.merge(df1, df2, on='ID', how='inner')

# Left join
merged_left = pd.merge(df1, df2, on='ID', how='left')

# Concatenating DataFrames
concatenated = pd.concat([df1, df2], axis=0)
``````

### Explanation:

You can merge DataFrames using different types of joins (inner, outer, left, right) with the `merge()` function. Use `concat()` to concatenate DataFrames along a specified axis.

## 92. Time Series Analysis

Pandas supports time series analysis and manipulation.

``````
# Converting a column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Resampling time series data
df_resampled = df.resample('D', on='Date').sum()

# Shifting time series data
df_shifted = df['Value'].shift(1)
``````

### Explanation:

For time series analysis, it's crucial to convert time-related columns to datetime format using `pd.to_datetime()`. You can resample time series data to a different frequency and apply aggregation functions using `resample()`. Shifting data can help in calculating differences between consecutive time periods.

## 93. Plotting Data

Pandas provides built-in methods for data visualization.

``````
# Line plot
df.plot(x='Date', y='Value', kind='line', title='Line Plot')

# Bar plot
df.plot(x='Category', y='Value', kind='bar', title='Bar Plot')

# Histogram
df['Value'].plot(kind='hist', title='Histogram')
``````

### Explanation:

Pandas provides easy-to-use methods for creating various types of plots directly from DataFrames. You can create line plots, bar plots, histograms, and more using the `plot()` function.

## 94. Advanced Indexing and Selection

Pandas offers advanced indexing and selection capabilities.

``````
# Indexing using boolean conditions
filtered_data = df[df['Value'] > 10]

# Indexing using loc and iloc
selected_data = df.loc[df['Category'] == 'High', 'Value']

# Multi-level indexing
multi_indexed = df.set_index(['Category', 'Date'])
``````

### Explanation:

You can use boolean conditions to filter rows that meet specific criteria. The `loc` and `iloc` indexers allow you to select data by label or integer-based location, respectively. Multi-level indexing lets you create hierarchical index structures.
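The selection styles side by side, on a small invented frame:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['High', 'Low', 'High'],
                   'Value': [15, 5, 25]})

high_values = df.loc[df['Category'] == 'High', 'Value']  # label-based: rows where Category is 'High'
first_row = df.iloc[0]                                   # position-based: the first row
indexed = df.set_index('Category')                       # 'Category' becomes the index
``````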

## 95. Handling Duplicate Data

Duplicate data can affect analysis accuracy, so it's important to handle it.

``````
# Checking for duplicates
duplicate_rows = df.duplicated()

# Dropping duplicates
df_deduplicated = df.drop_duplicates()

# Keeping the last occurrence instead of the first (the default)
df_last_occurrence = df.drop_duplicates(keep='last')
``````

### Explanation:

Use the `duplicated()` function to identify duplicate rows in a DataFrame. You can then remove them with `drop_duplicates()`; the `keep` parameter controls which occurrence survives (`'first'` is the default, `'last'` keeps the last, and `keep=False` drops every copy).

## 96. Handling Missing Data

Missing data can be problematic for analysis, so it's important to handle it properly.

``````
# Checking for missing values
missing_values = df.isnull()

# Dropping rows with missing values
df_no_missing = df.dropna()

# Filling missing values with a specific value
df_filled = df.fillna(0)
``````

### Explanation:

The `isnull()` function helps you identify missing values in your DataFrame. You can use `dropna()` to remove rows with missing values or `fillna()` to replace missing values with a specific value.

## 97. Aggregating Data

You can perform aggregation operations to summarize data in various ways.

``````
# Grouping data and calculating mean
grouped_mean = df.groupby('Category')['Value'].mean()

# Grouping data and calculating sum
grouped_sum = df.groupby('Category')['Value'].sum()

# Pivot tables
pivot_table = df.pivot_table(index='Category', columns='Date', values='Value', aggfunc='mean')
``````

### Explanation:

Use the `groupby()` function to group data based on specific columns and perform aggregation functions such as mean, sum, count, etc. Pivot tables allow you to create a table summarizing data based on multiple dimensions.

## 98. Reshaping Data

You can reshape data to fit different formats using Pandas.

``````
# Melting data from wide to long format
melted = pd.melt(df, id_vars=['Category'], value_vars=['Jan', 'Feb', 'Mar'])

# Pivoting data from long to wide format
pivoted = melted.pivot_table(index='Category', columns='variable', values='value')
``````

### Explanation:

The `melt()` function helps you reshape data from wide to long format, where each row represents a unique combination of variables. The `pivot_table()` function can then be used to reshape the long format data back to wide format.

## 99. Working with Text Data

Pandas supports text data manipulation and analysis.

``````
# Extracting the first word of each string
df['First Name'] = df['Full Name'].str.split().str[0]

# Counting words in each row
word_count = df['Text Column'].str.split().apply(len)

# Finding and replacing text
df['Text Column'] = df['Text Column'].str.replace('old', 'new')
``````

### Explanation:

Pandas provides methods for working with text data within columns. You can use `str.split()` to split text into substrings, `apply()` to perform operations on each element, and `str.replace()` to find and replace specific text within columns.

## 100. Exporting Data

Exporting data is essential for sharing analysis results.

``````
# Export to CSV
df.to_csv('data.csv', index=False)

# Export to Excel
df.to_excel('data.xlsx', index=False)

# Export to JSON
df.to_json('data.json', orient='records')
``````

### Explanation:

Pandas allows you to export DataFrames to various file formats, including CSV, Excel, and JSON. Use the respective `to_*` functions and specify the file name. Set `index=False` to exclude the index column from the export.

## 101. Working with DateTime Data

Pandas provides tools to work with datetime data efficiently.

``````
# Converting strings to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Extracting year, month, day
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Calculating time differences
df['TimeDiff'] = df['End Time'] - df['Start Time']
``````

### Explanation:

Pandas provides the `to_datetime()` function to convert strings to datetime objects. You can use `dt.year`, `dt.month`, and `dt.day` to extract date components. Calculating time differences becomes straightforward by subtracting datetime columns.
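A runnable sketch with two invented dates:

``````
import pandas as pd

df = pd.DataFrame({'Date': ['2024-03-15', '2024-07-04']})
df['Date'] = pd.to_datetime(df['Date'])

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
diff = df['Date'].iloc[1] - df['Date'].iloc[0]  # a Timedelta of 111 days
``````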

## 102. Merging DataFrames

Combining data from multiple DataFrames can provide valuable insights.

``````
# Merging based on common column
merged_df = pd.merge(df1, df2, on='ID')

# Merging with different column names
merged_df = pd.merge(df1, df2, left_on='ID1', right_on='ID2')

# Merging on multiple columns
merged_df = pd.merge(df1, df2, on=['ID', 'Date'])
``````

### Explanation:

Pandas offers the `merge()` function to combine DataFrames based on shared columns. You can specify the column to merge on using the `on` parameter or different columns using `left_on` and `right_on`. Merging on multiple columns is also possible by passing a list of column names.

## 103. Combining DataFrames

Concatenating DataFrames is useful for combining data vertically or horizontally.

``````
# Concatenating vertically
concatenated_df = pd.concat([df1, df2])

# Concatenating horizontally
concatenated_df = pd.concat([df1, df2], axis=1)
``````

### Explanation:

The `concat()` function allows you to concatenate DataFrames vertically (along rows) or horizontally (along columns). Use `axis=0` for vertical concatenation and `axis=1` for horizontal concatenation.

## 104. Applying Functions to Columns

Applying functions to DataFrame columns can transform or manipulate data.

``````
# Applying a function element-wise
df['New Column'] = df['Column'].apply(lambda x: x * 2)

# Applying a function to multiple columns
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].applymap(lambda x: x.strip())
``````

### Explanation:

You can use the `apply()` function to apply a function element-wise to a column. To apply a function to multiple columns, use `applymap()`. The example demonstrates how to double the values in a column and strip whitespace from multiple columns.

## 105. Categorical Data

Converting data to categorical format can save memory and improve performance.

``````
# Converting to categorical
df['Category'] = df['Category'].astype('category')

# Displaying categories
categories = df['Category'].cat.categories

# Mapping categories to numerical values
df['Category Code'] = df['Category'].cat.codes
``````

### Explanation:

By converting categorical data to the `'category'` type, you can save memory and improve performance. Use the `cat.categories` property to display the unique categories and `cat.codes` to map them to numerical values.
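Categories and codes on an invented column (string categories sort alphabetically by default):

``````
import pandas as pd

df = pd.DataFrame({'Category': ['Low', 'High', 'Low', 'Medium']})
df['Category'] = df['Category'].astype('category')

cats = list(df['Category'].cat.categories)  # ['High', 'Low', 'Medium']
codes = df['Category'].cat.codes            # integer position of each row's category
``````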

## 106. Handling Missing Data

Dealing with missing data is essential for data analysis and modeling.

``````
# Checking for missing values
missing_values = df.isnull().sum()

# Dropping rows with missing values
df_cleaned = df.dropna()

# Filling missing values with a specific value
df_filled = df.fillna(value=0)
``````

### Explanation:

Use `isnull()` to identify missing values in a DataFrame. The `sum()` function calculates the number of missing values per column. You can drop rows with missing values using `dropna()` or fill missing values with a specific value using `fillna()`.

## 107. Aggregating Data

Aggregating data provides insights into summary statistics.

``````
# Calculating mean, median, and sum
mean_value = df['Column'].mean()
median_value = df['Column'].median()
sum_value = df['Column'].sum()

# Grouping and aggregating
grouped_data = df.groupby('Category')['Value'].sum()
``````

### Explanation:

Aggregating data helps analyze summary statistics. Use `mean()`, `median()`, and `sum()` to calculate these statistics. Grouping data using `groupby()` allows for aggregation based on specific columns.

## 108. Reshaping Data

Reshaping data allows for different representations of the same information.

``````
# Pivoting data
pivot_table = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')

# Melting data
melted_df = pd.melt(df, id_vars='Date', value_vars=['Col1', 'Col2'], var_name='Category', value_name='Value')
``````

### Explanation:

Pivoting reshapes data by creating a new table with columns based on unique values from another column. The `pivot_table()` function allows for customization of aggregation functions. Melting data converts wide-format data to long-format, making it more suitable for analysis.

## 109. Working with Text Data

Manipulating text data is common in data analysis.

``````
# Extracting substring
df['Substr'] = df['Text'].str[0:5]

# Splitting text into columns
df[['First Name', 'Last Name']] = df['Name'].str.split(expand=True)

# Counting occurrences of a substring
df['Count'] = df['Text'].str.count('pattern')
``````

### Explanation:

Text manipulation is possible using string methods like `str[0:5]` to extract a substring. The `str.split()` function splits text into separate columns. The `str.count()` function counts occurrences of a substring in a column.
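A runnable version of the three operations, with invented sample text:

``````
import pandas as pd

df = pd.DataFrame({'Name': ['Ada Lovelace', 'Alan Turing'],
                   'Text': ['pattern here, pattern there', 'no match']})

df['Substr'] = df['Text'].str[0:5]                     # first five characters
df[['First Name', 'Last Name']] = df['Name'].str.split(n=1, expand=True)
df['Count'] = df['Text'].str.count('pattern')          # 2 and 0
``````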

## 110. Exporting Data

Exporting data is essential for sharing analysis results.

``````
# Exporting to CSV
df.to_csv('data.csv', index=False)

# Exporting to Excel
df.to_excel('data.xlsx', index=False)

# Exporting to JSON
df.to_json('data.json', orient='records')
``````

### Explanation:

Use `to_csv()` to export data to a CSV file. The `to_excel()` function exports to an Excel file, and `to_json()` exports to a JSON file with various orientations, such as `'records'`.

## 111. Merging DataFrames

Merging data from multiple DataFrames can provide a comprehensive view of the data.

``````
# Inner join
merged_inner = pd.merge(df1, df2, on='common_column', how='inner')

# Left join
merged_left = pd.merge(df1, df2, on='common_column', how='left')

# Concatenating DataFrames
concatenated_df = pd.concat([df1, df2], axis=0)
``````

### Explanation:

Merging DataFrames is useful for combining related data. `pd.merge()` joins on the specified columns, with the join type (inner, left, right, or outer) chosen via the `how` parameter. The `pd.concat()` function concatenates DataFrames along the specified axis.
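A minimal sketch with two toy DataFrames sharing a key column, showing how row counts differ between the join types:

``````
import pandas as pd

df1 = pd.DataFrame({'common_column': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'common_column': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

merged_inner = pd.merge(df1, df2, on='common_column', how='inner')  # keys 2, 3
merged_left = pd.merge(df1, df2, on='common_column', how='left')    # keys 1, 2, 3
concatenated_df = pd.concat([df1, df2], axis=0)                     # 6 rows
``````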

## 112. Time Series Analysis

Working with time series data requires specialized techniques.

``````
# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Resampling data
daily_data = df.resample('D', on='Date').sum()

# Shifting data
df['Shifted'] = df['Value'].shift(1)
``````

### Explanation:

Time series analysis involves converting date columns to datetime format using `pd.to_datetime()`. Resampling data using `resample()` aggregates data over specified time intervals. Shifting data using `shift()` offsets data by a specified number of periods.
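A self-contained example with made-up dates, showing daily resampling and a one-period shift:

``````
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'Value': [1, 2, 3],
})
df['Date'] = pd.to_datetime(df['Date'])

daily_data = df.resample('D', on='Date').sum()   # sums Value per calendar day
df['Shifted'] = df['Value'].shift(1)             # previous row's value, NaN first
``````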

## 113. Working with Categorical Data

Categorical data can be encoded and analyzed effectively.

``````
# Encoding categorical data
df['Category'] = df['Category'].astype('category')
df['Category_encoded'] = df['Category'].cat.codes

# One-hot encoding
one_hot_encoded = pd.get_dummies(df['Category'], prefix='Category')
``````

### Explanation:

Encode categorical data using `astype('category')` and `cat.codes` to assign unique codes to categories. Use `pd.get_dummies()` for one-hot encoding, creating separate columns for each category.
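A concrete sketch with an invented color column; note that `astype('category')` sorts string categories lexically, so `green` gets code 0 and `red` gets code 1:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['red', 'green', 'red']})
df['Category'] = df['Category'].astype('category')
df['Category_encoded'] = df['Category'].cat.codes   # green=0, red=1

one_hot = pd.get_dummies(df['Category'], prefix='Category')
``````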

## 114. Data Visualization with Pandas

Data visualization helps in understanding patterns and trends.

``````
import matplotlib.pyplot as plt

# Line plot
df.plot(x='Date', y='Value', kind='line', title='Line Plot')

# Histogram
df['Value'].plot(kind='hist', bins=10, title='Histogram')

plt.show()
``````

### Explanation:

Data visualization libraries like Matplotlib can be used to create various plots. `df.plot()` generates line plots, and `plot(kind='hist')` creates histograms for numeric data.

## 115. Pivot Tables

Pivot tables help summarize and analyze data from multiple dimensions.

``````
# Creating a pivot table
pivot_table = df.pivot_table(values='Value', index='Category', columns='Date', aggfunc='sum')

# Handling missing values
pivot_table_filled = pivot_table.fillna(0)
``````

### Explanation:

Pivot tables summarize data across multiple dimensions. `pivot_table()` creates a pivot table from the specified values, index, columns, and aggregation function; index/column combinations with no underlying data appear as NaN, which `fillna()` replaces with a specific value.

## 116. Groupby and Aggregation

Grouping data and applying aggregation functions helps in obtaining insights.

``````
# Grouping data
grouped_data = df.groupby('Category')['Value'].sum()

# Multiple aggregations
aggregated_data = df.groupby('Category').agg({'Value': ['sum', 'mean']})
``````

### Explanation:

`groupby()` is used to group data based on specified columns. Aggregation functions like `sum()` and `mean()` can be applied to the grouped data. `agg()` allows performing multiple aggregations on different columns.
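To show what `agg()` returns, here is a runnable sketch with toy data; the result has a two-level column index of `('Value', 'sum')` and `('Value', 'mean')`:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['x', 'x', 'y'], 'Value': [10, 30, 5]})

grouped_data = df.groupby('Category')['Value'].sum()                 # x: 40, y: 5
aggregated_data = df.groupby('Category').agg({'Value': ['sum', 'mean']})
``````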

## 117. Working with Datetime Index

Datetime index provides flexibility in time-based analysis.

``````
# Setting datetime index
df.set_index('Date', inplace=True)

# Resampling with datetime index ('M' = month end; newer pandas prefers 'ME')
resampled_data = df.resample('M').sum()
``````

### Explanation:

Setting a datetime index using `set_index()` enables time-based analysis. `resample()` with a datetime index can be used to aggregate data over different time periods.

## 118. Handling Outliers

Detecting and handling outliers is crucial for accurate analysis.

``````
# Detecting outliers using z-score
import numpy as np
from scipy.stats import zscore

outliers = df[np.abs(zscore(df['Value'])) > 3]

# Removing outliers
df_no_outliers = df[np.abs(zscore(df['Value'])) < 3]
``````

### Explanation:

Outliers can be detected using the z-score method from the `scipy.stats` library. Values with z-scores greater than a threshold (e.g., 3) can be considered outliers. Removing outliers helps in obtaining more reliable analysis results.
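A self-contained sketch with synthetic data. Note that with very few points a single outlier cannot exceed a z-score of 3 (the maximum possible |z| grows with sample size), so the example uses 20 values:

``````
import numpy as np
import pandas as pd
from scipy.stats import zscore

# nineteen ordinary values and one extreme one
df = pd.DataFrame({'Value': [10] * 19 + [100]})

z = np.abs(zscore(df['Value']))
outliers = df[z > 3]          # the single extreme value
df_no_outliers = df[z < 3]    # the remaining nineteen rows
``````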

## 119. Exporting Data

Exporting DataFrame data to various formats is essential for sharing and collaboration.

``````
# Exporting to CSV
df.to_csv('data.csv', index=False)

# Exporting to Excel
df.to_excel('data.xlsx', index=False)

# Exporting to JSON
df.to_json('data.json', orient='records')
``````

### Explanation:

DataFrames can be exported to various formats like CSV, Excel, and JSON using `to_csv()`, `to_excel()`, and `to_json()` methods. Specifying `index=False` excludes the index column from the exported data.

## 120. Merging DataFrames

Merging DataFrames helps in combining data from different sources.

``````
# Inner merge
merged_df_inner = pd.merge(df1, df2, on='Key', how='inner')

# Left merge
merged_df_left = pd.merge(df1, df2, on='Key', how='left')

# Right merge
merged_df_right = pd.merge(df1, df2, on='Key', how='right')

# Outer merge
merged_df_outer = pd.merge(df1, df2, on='Key', how='outer')
``````

### Explanation:

Merging DataFrames using the `pd.merge()` function combines data based on a common key. Different types of merges such as inner, left, right, and outer can be performed based on the requirement.

## 121. Handling Duplicates

Identifying and removing duplicate rows from DataFrames.

``````
# Identifying duplicates
duplicate_rows = df[df.duplicated()]

# Removing duplicates
df_no_duplicates = df.drop_duplicates()
``````

### Explanation:

Duplicate rows can be identified using the `duplicated()` method, which marks every occurrence after the first. They can be removed with `drop_duplicates()`, which by default retains the first occurrence of each duplicated row (configurable via its `keep` parameter).
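A minimal runnable example with one duplicated row:

``````
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y']})

duplicate_rows = df[df.duplicated()]      # the second (1, 'x') row
df_no_duplicates = df.drop_duplicates()   # two unique rows remain
``````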

## 122. Handling Missing Values

Dealing with missing values is crucial for accurate analysis.

``````
# Checking for missing values
missing_values = df.isnull().sum()

# Dropping rows with missing values
df_no_missing = df.dropna()

# Filling missing values
df_filled = df.fillna(0)
``````

### Explanation:

Missing values can be identified using `isnull()`, and the sum of missing values in each column can be calculated using `sum()`. Rows with missing values can be dropped using `dropna()`, and missing values can be filled using `fillna()`.

## 123. String Operations

Performing string operations on DataFrame columns.

``````
# Changing case
df['Column'] = df['Column'].str.lower()

# Extracting substrings
df['Substring'] = df['Column'].str.extract(r'(\d{3})')
``````

### Explanation:

String operations can be performed using the `str` attribute of DataFrame columns. Changing case, extracting substrings, and applying regular expressions are some common string operations.
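A concrete sketch with invented text; rows where the regex finds no match get NaN in the extracted column:

``````
import pandas as pd

df = pd.DataFrame({'Column': ['Call 555 Now', 'NO DIGITS HERE']})

df['Column'] = df['Column'].str.lower()
df['Substring'] = df['Column'].str.extract(r'(\d{3})')  # '555' and NaN
``````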

## 124. Grouping and Aggregating Data

Grouping data by one or more columns and applying aggregation functions.

``````
# Grouping and summing
grouped_sum = df.groupby('Category')['Value'].sum()

# Grouping and calculating mean
grouped_mean = df.groupby('Category')['Value'].mean()
``````

### Explanation:

Grouping data allows you to perform aggregate calculations on subsets of the data. Common aggregation functions include sum, mean, count, and more. You can use the `groupby()` method to specify the grouping columns and then apply the desired aggregation function.

## 125. Pivoting and Reshaping Data

Pivoting and reshaping data to transform its structure.

``````
# Pivoting data
pivot_table = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')

# Melting data
melted_df = pd.melt(df, id_vars=['Date'], value_vars=['Category1', 'Category2'])
``````

### Explanation:

Pivoting reshapes data by turning the unique values of one column into new columns; the `pivot_table()` function is used for this purpose. Melting is the inverse: it converts wide-format data into long format by stacking multiple columns into a single column using the `melt()` function.

## 126. Time Series Analysis

Analyzing time series data using pandas.

``````
# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Setting date as index
df.set_index('Date', inplace=True)

# Resampling
monthly_data = df.resample('M').sum()
``````

### Explanation:

Time series data analysis involves working with date and time data. Converting date strings to datetime objects, setting the date column as the index, and resampling data (e.g., aggregating daily data into monthly data) are common operations in time series analysis.

## 127. Plotting Data

Visualizing data using pandas plotting capabilities.

``````
import matplotlib.pyplot as plt

# Line plot
df.plot(kind='line', x='Date', y='Value', title='Line Plot')

# Bar plot
df.plot(kind='bar', x='Category', y='Value', title='Bar Plot')
``````

### Explanation:

Pandas provides built-in plotting capabilities for visualizing data. Different types of plots, such as line plots, bar plots, histograms, and more, can be created using the `plot()` function. Matplotlib is commonly used as the backend for pandas plotting.

## 128. Handling Missing Data

Dealing with missing data in pandas DataFrames.

``````
# Checking for missing values
missing_values = df.isnull().sum()

# Dropping rows with missing values
df_cleaned = df.dropna()

# Filling missing values
df_filled = df.fillna(0)
``````

### Explanation:

Handling missing data is crucial in data analysis. You can check for missing values using the `isnull()` function and then use methods like `dropna()` to remove rows with missing values or `fillna()` to fill missing values with a specified value.

## 129. Merging DataFrames

Combining multiple DataFrames using merge and join operations.

``````
# Merging based on a common column
merged_df = pd.merge(df1, df2, on='ID')

# Inner join
inner_join_df = df1.merge(df2, on='ID', how='inner')

# Outer join
outer_join_df = df1.merge(df2, on='ID', how='outer')
``````

### Explanation:

Merging DataFrames involves combining them based on common columns. The `merge()` function can be used to perform different types of joins, such as inner join, outer join, left join, and right join. The `how` parameter specifies the type of join to perform.

## 130. Combining DataFrames

Concatenating multiple DataFrames vertically or horizontally.

``````
# Concatenating vertically
concatenated_df = pd.concat([df1, df2])

# Concatenating horizontally
concatenated_horizontal_df = pd.concat([df1, df2], axis=1)
``````

### Explanation:

Combining DataFrames involves stacking them vertically or horizontally. The `concat()` function is used for this purpose. When concatenating horizontally, the `axis` parameter should be set to 1.
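A small sketch with two toy DataFrames; note that vertical concatenation keeps each frame's original index unless `ignore_index=True` is passed:

``````
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

stacked = pd.concat([df1, df2])                       # 4 rows, index 0,1,0,1
side_by_side = pd.concat([df1, df2], axis=1)          # 2 rows, 2 columns
reindexed = pd.concat([df1, df2], ignore_index=True)  # fresh 0..3 index
``````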

## 131. Grouping and Aggregation

Performing group-wise analysis and aggregations on DataFrame columns.

``````
# Grouping by a column and calculating mean
grouped_df = df.groupby('Category')['Value'].mean()

# Applying multiple aggregations
aggregated_df = df.groupby('Category')['Value'].agg(['mean', 'sum', 'count'])
``````

### Explanation:

Grouping and aggregation are commonly used for summarizing data based on different categories. The `groupby()` function is used to group the DataFrame based on a specified column, and then aggregation functions like `mean()`, `sum()`, and `count()` can be applied to calculate statistics for each group.

## 132. Pivoting and Reshaping

Reshaping DataFrames using pivot tables and stacking/unstacking.

``````
# Creating a pivot table
pivot_table = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')

# Stacking and unstacking
stacked_df = pivot_table.stack()
unstacked_df = stacked_df.unstack()
``````

### Explanation:

`pivot_table()` creates a new DataFrame with the specified index, columns, and aggregated values. `stack()` then moves the column labels into the row index, producing a long-format result with a multi-level index, and `unstack()` reverses the operation, restoring the wide shape.
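A self-contained sketch with made-up data, round-tripping a pivot table through `stack()` and `unstack()`:

``````
import pandas as pd

df = pd.DataFrame({
    'Date': ['d1', 'd1', 'd2'],
    'Category': ['A', 'B', 'A'],
    'Value': [1, 2, 3],
})
pivot_table = df.pivot_table(index='Date', columns='Category',
                             values='Value', aggfunc='sum')

stacked = pivot_table.stack()   # long Series keyed by (Date, Category)
unstacked = stacked.unstack()   # back to the wide pivot-table shape
``````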

## 133. Time Series Analysis

Working with time series data and performing time-based operations.

``````
# Converting column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Setting datetime column as index
df.set_index('Date', inplace=True)

# Resampling time series data
resampled_df = df.resample('W').mean()
``````

### Explanation:

Time series data involves working with data that is indexed by time. You can convert a column to a datetime format using `pd.to_datetime()` and then set it as the index of the DataFrame using `set_index()`. Resampling allows you to aggregate data based on a specified time frequency (e.g., weekly) using the `resample()` function.

## 134. Working with Text Data

Performing text-based operations on DataFrame columns.

``````
# Converting column to uppercase
df['Name'] = df['Name'].str.upper()

# Extracting text using regular expressions
df['Digits'] = df['Text'].str.extract(r'(\d+)')

# Counting occurrences of a substring
df['Count'] = df['Text'].str.count('apple')
``````

### Explanation:

Text-based operations involve manipulating and extracting information from text data in DataFrame columns. You can convert text to uppercase using `str.upper()`, extract specific patterns using regular expressions and `str.extract()`, and count occurrences of substrings using `str.count()`.

## 135. Working with Categorical Data

Dealing with categorical data and performing categorical operations.

``````
# Converting column to categorical
df['Category'] = df['Category'].astype('category')

# Creating dummy variables
dummy_df = pd.get_dummies(df['Category'])

# Merging with original DataFrame
df = pd.concat([df, dummy_df], axis=1)
``````

### Explanation:

Categorical data represents data that belongs to a specific category. You can convert a column to a categorical type using `astype()` and create dummy variables using `pd.get_dummies()`. Dummy variables are used to represent categorical variables as binary columns, which can then be merged with the original DataFrame.
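A minimal runnable example of the encode-then-merge pattern with an invented category column:

``````
import pandas as pd

df = pd.DataFrame({'Category': ['a', 'b', 'a']})
df['Category'] = df['Category'].astype('category')

dummy_df = pd.get_dummies(df['Category'])   # one binary column per category
df = pd.concat([df, dummy_df], axis=1)      # original column plus dummies
``````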