From the course: Complete Guide to AI and Data Science for SQL Developers: From Beginner to Advanced


Checking the distribution of the variables

- [Instructor] Now that you've explored the summary statistics of your dataset in the last step, in this step you are going to visualize your data to gain a deeper understanding of its distribution. Data visualization is like putting on special glasses that allow you to see patterns and insights in your data. To do this, you'll be using the Python libraries Matplotlib and Seaborn. Let's dive right in. You see this code? When you run it, you create a histogram for each of your columns, and each histogram represents the distribution of a specific attribute. Here's an example. Take a look at the crime rate column. The distribution of crime rate appears to be highly skewed to the right, with a mean of 3.61 and a maximum value of 88.98. When you look at the histogram, you'll see that most houses have a crime rate below 20 and that fewer houses have higher crime rates. This means that the majority of houses in the…
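The on-screen code from the video isn't included in this transcript, so here is a minimal sketch of the same idea: one histogram per numeric column, laid out on a grid of subplots. It assumes your data is already in a pandas DataFrame; the function name `plot_histograms`, its `bins`/`ncols` parameters, and the synthetic `crime_rate`/`rooms` columns in the demo are hypothetical stand-ins, not the course's dataset. It uses Matplotlib's `Axes.hist` directly and adds each column's sample skewness to the panel title so a right skew like the one described for crime rate shows up as a number as well as a shape.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs anywhere; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histograms(df: pd.DataFrame, bins: int = 30, ncols: int = 3):
    """Draw one histogram per numeric column of df on a grid of subplots."""
    numeric_cols = df.select_dtypes(include="number").columns
    nrows = int(np.ceil(len(numeric_cols) / ncols))
    # squeeze=False keeps `axes` 2-D even when there is only one row
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()

    for ax, col in zip(flat_axes, numeric_cols):
        values = df[col].dropna()
        ax.hist(values, bins=bins, edgecolor="black")
        # Positive skew means a long right tail (e.g. a few very high crime rates)
        ax.set_title(f"{col} (skew={values.skew():.2f})")
        ax.set_xlabel(col)
        ax.set_ylabel("Count")

    # Hide any leftover empty panels in the last row
    for ax in flat_axes[len(numeric_cols):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes


if __name__ == "__main__":
    # Hypothetical synthetic data: an exponential column is right-skewed,
    # a normal column is roughly symmetric.
    rng = np.random.default_rng(0)
    demo = pd.DataFrame({
        "crime_rate": rng.exponential(scale=3.6, size=500),
        "rooms": rng.normal(loc=6.3, scale=0.7, size=500),
    })
    fig, _ = plot_histograms(demo)
    fig.savefig("histograms.png")
```

For a quicker look, pandas' `DataFrame.hist(bins=30)` produces a similar grid in one call, and Seaborn's `sns.histplot(data=df, x=col, ax=ax)` can replace `ax.hist` inside the loop if you prefer Seaborn's styling.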
