Dataset Features

Run a preliminary analysis before modelling and visualising your data. Each of the views below is computed automatically and is readily available when you select ‘explore dataset’. This acts as a pre-processing step: it confirms that the data is as you expect it to be (structured and formattable) and helps you spot mistakes in your data collection methods.

Exploratory Analysis

The correlation analysis gives a straightforward overview of the relationships between the dataset’s categories and variables, so that you can identify any outliers or unexpected results early on.
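As a rough illustration of what a correlation view summarises, the sketch below computes a Pearson correlation coefficient between two numeric columns. It assumes Pearson correlation; the measure the platform actually uses is not stated here, and the function name `pearson` and the sample columns are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample columns, for illustration only.
price = [10.0, 12.0, 14.0, 16.0]
units_sold = [40, 35, 30, 25]
r = pearson(price, units_sold)
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. In this sample, units sold fall steadily as price rises, so `r` is strongly negative.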

Column profiling provides a statistical overview of each column, including core statistics such as the count, min, max and standard deviation.

Distribution curves display how the values are spread out. This is also an opportunity to identify any incorrectly aggregated categories/values before progressing to the next stage of your analysis.
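To make the idea of a distribution concrete, here is a minimal sketch that groups a column's values into equal-width bins and counts how many fall into each. This is an assumption about how such a view could be built, not a description of the platform's own method; the `histogram` function and the sample values are hypothetical.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    if bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sales figures; the single large value stands apart from the rest.
sales = [1, 2, 2, 3, 3, 3, 4, 4, 10]
counts = histogram(sales, bins=3)
```

A bin that holds far fewer values than its neighbours, like the last one here, is the kind of pattern worth checking for a wrongly aggregated category or value.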

Advanced Explorer

To take your initial analysis a step further, the Advanced Explorer lets you break your data down into summary tables whose values are aggregations of values from a larger table, such as the sales performance of a particular product category from a business’s sales ledger. In short, Advanced Explorer enables you to summarise large amounts of data and compare the values in a compatible chart.

1. To begin creating a table, select the cog button at the top right of the table.
2. Select the variables and categories you wish to break down. Note: the system will automatically place each field in its corresponding box (e.g. price will be assigned to the values box). Subcategories expand when you click the ‘+’ next to the column title.
3. Select your chart or graph type using the top left button on the banner.
4. Export and download the table or chart via your preferred method using the arrow button on the top banner.

Note: The Advanced Explorer currently supports up to 100,000 rows of data.
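The kind of summary table the Advanced Explorer produces can be sketched as a simple pivot: group the rows by one field, spread a second field across the columns, and sum a value field. The sketch below is illustrative only. The ledger rows, the field names and the `pivot_sum` function are hypothetical, and rejecting inputs above 100,000 rows is an assumption about how the limit might be enforced.

```python
from collections import defaultdict

MAX_ROWS = 100_000  # row limit quoted in the note above

# Hypothetical sales ledger rows, for illustration only.
ledger = [
    {"category": "Shoes", "region": "North", "price": 50.0},
    {"category": "Shoes", "region": "South", "price": 70.0},
    {"category": "Shoes", "region": "North", "price": 30.0},
    {"category": "Hats", "region": "North", "price": 20.0},
    {"category": "Hats", "region": "South", "price": 15.0},
]


def pivot_sum(rows, row_field, col_field, value_field):
    """Sum `value_field` for every (row_field, col_field) combination."""
    if len(rows) > MAX_ROWS:
        # Assumed behaviour: refuse oversized inputs instead of truncating them.
        raise ValueError(f"expected at most {MAX_ROWS} rows, got {len(rows)}")
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_field]][row[col_field]] += row[value_field]
    # Convert the nested defaultdicts to plain dicts for easier reading.
    return {key: dict(cols) for key, cols in table.items()}


summary = pivot_sum(ledger, "category", "region", "price")
```

Here `summary["Shoes"]["North"]` holds the combined price of the two Shoes sales in the North region, which is the sort of aggregated value a chart would then compare across categories.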

Table View

In the Table view, you can observe the raw data as it currently sits within the dataset, identifying individual rows, columns, categories and column stats.

1. Identify key column stats by hovering the mouse over the category title at the top of the page. If the column contains string (non-numerical) data such as a product type or name, the breakdown shows the count (how many rows of data the column has) and the distinct count (how many unique values the column contains).

If the column contains numerical data such as prices or sales, the breakdown includes the minimum and maximum values, the mean, the count and the distinct count.
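The difference between the two breakdowns can be sketched as follows: every column gets a count and a distinct count, and numerical columns additionally get a minimum, maximum and mean. This is a minimal sketch rather than the platform's implementation; the `column_stats` function and the sample columns are hypothetical.

```python
from statistics import mean


def column_stats(values):
    """Return hover-style stats: count and distinct count, plus min/max/mean for numbers."""
    stats = {"count": len(values), "distinct_count": len(set(values))}
    # Treat the column as numerical only if every value is an int or float.
    is_numeric = bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        stats.update(min=min(values), max=max(values), mean=mean(values))
    return stats


# Hypothetical columns, for illustration only.
product_stats = column_stats(["Hat", "Shoe", "Hat", "Scarf"])
price_stats = column_stats([20, 50, 20, 30])
```

For the product column, the count is 4 and the distinct count is 3, because ‘Hat’ appears twice.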

2. Search the dataset using the simple boolean search bar at the top of the page. First, select the column you wish to search in the left-most box. Next, select the operator you would like (equals, contains etc.) and enter the string or value you want to find or exclude in the right-most box, then select search.
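The column, operator and value combination described above can be sketched as a small row filter. Only the two operators named in the text (equals and contains) are shown. The case-sensitive comparison, the `search` function and the sample rows are assumptions made for illustration.

```python
# Each operator takes a cell value and the search term and returns True on a match.
# Cells are compared as text, and matching is case-sensitive (an assumption).
OPERATORS = {
    "equals": lambda cell, term: str(cell) == term,
    "contains": lambda cell, term: term in str(cell),
}


def search(rows, column, operator, term):
    """Return the rows whose `column` value matches `term` under `operator`."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    match = OPERATORS[operator]
    return [row for row in rows if match(row[column], term)]


# Hypothetical dataset rows, for illustration only.
rows = [
    {"product": "Red Hat", "price": 20},
    {"product": "Blue Hat", "price": 25},
    {"product": "Red Scarf", "price": 15},
]
hats = search(rows, "product", "contains", "Hat")
scarf = search(rows, "product", "equals", "Red Scarf")
```

Searching the product column for values that contain ‘Hat’ returns the first two rows, while the equals operator returns only the row whose product is exactly ‘Red Scarf’.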