Review of cluster analysis methods and assessment of their applicability to consumer market segmentation

I work in the email marketing industry for a site called MailChimp.com. We help clients create newsletters for their advertising audience. Every time someone calls our work "postal stuffing", I feel an unpleasant chill in my heart.

Why? Because email addresses are no longer black boxes that you lob messages at like grenades. In email marketing (as with other forms of online contact, including tweets, Facebook posts, and Pinterest campaigns), a business learns how its audience engages at the individual level through click tracking, online orders, sharing on social networks, and so on. This data is not just noise. It characterizes your audience. But to the uninitiated, it might as well be Greek. Or Esperanto.

How do you take transaction data about your customers (users, subscribers, and so on) and use it to better understand your audience? When you deal with many people, it is difficult to study each customer individually, especially if they all interact with you differently. Even if you could in theory reach out to everyone personally, in practice this is hardly feasible.

You need to take your customer base and find a middle ground between random bombing and personalized marketing for each individual customer. One way to achieve this balance is to use clustering to segment your customers' market so that you can target different segments of your customer base with different targeted content, offerings, and more.

Cluster analysis takes a collection of objects and divides it into groups of similar objects. By working with these groups, identifying what their members have in common and what sets them apart, you can learn a lot about the messy data you have. This knowledge will help you make better decisions, and at a more detailed level than before.

In this context, clustering is called exploratory data mining, because these techniques help tease out relationships in huge datasets that cannot be spotted by eye. Discovering relationships within groups is useful in any industry: for recommending movies based on the habits of a target audience, for identifying crime hot spots in a city, or for justifying financial investments.

One of my favorite uses for clustering is image clustering: grouping image files that "look the same" to the computer. For example, on image hosting services like Flickr, users produce so much content that simple navigation becomes impossible. With clustering techniques, you can group similar images, letting the user navigate between these groups before drilling into the details.

Supervised or Unsupervised Machine Learning?

In exploratory data mining, by definition, you don't know ahead of time what kind of data you are looking for. You are a researcher. You can clearly explain when two customers look similar and when they look different, but you don't know the best way to segment your customer base. Therefore, "asking" the computer to segment the client base for you is called unsupervised machine learning, because you do not control anything — you do not dictate to the computer how to do its work.

In contrast, there is supervised machine learning, which is usually what is going on when artificial intelligence hits the front page. If I know that I want to divide customers into two groups, say "likely to buy" and "unlikely to buy", and I supply the computer with historical examples of such customers so that it can assign every new customer to one of these groups, then that is supervised.

If instead I say, "This is what I know about my customers, and this is how to tell whether they are different or the same. Tell me something interesting," then that is unsupervised.

This chapter discusses the simplest clustering method, called k-means, which dates back to the 1950s and has since become a staple of knowledge discovery in databases (KDD) across industries and government.

The k-means method is not the most mathematically rigorous of methods. It was born of practicality and common sense, like soul food. Soul food does not have the chic pedigree of French cuisine, but it often hits the spot. K-means cluster analysis, as you will soon see, is part math and part history (the history of past events at the company, if you apply the comparison to management learning methods). Its undoubted advantage is its intuitive simplicity.

Let's see how this method works with a simple example.

Girls dance with girls, boys scratch their heads

The goal of k-means clustering is to take a set of points in space and partition them into k groups (where k is any number you choose). Each group is identified by a point at its center, like a flag planted on the moon that signals, "Hey, this is the center of my group! Join if you are closer to this flag than to the others!" This group center (officially called the cluster centroid) is the "mean" in the name k-means.

Let's remember, for example, school dances. If you managed to erase the horror of this "entertainment" from your memory, I am very sorry for bringing back such painful memories.

The heroes of our example are the students of Macacne High School, who have come to a dance with the romantic title "Ball at the Bottom of the Sea" and are scattered around the assembly hall, as shown in Fig. 1. I even drew the parquet floor in Photoshop to make the scene easier to imagine.

Fig. 1. Macacne High School students scattered around the assembly hall

Here are some examples of songs that these young leaders of the free world will dance to awkwardly (in case you suddenly want music, for example, on Spotify):

  • Styx: Come Sail Away
  • Everything But the Girl: Missing
  • Ace of Base: All that She Wants
  • Soft Cell: Tainted Love
  • Montell Jordan: This is How We Do It
  • Eiffel 65: Blue

Now clustering by k-means depends on the number of clusters by which you want to divide the audience. Let's start with three clusters (we'll look at the choice of k later in this chapter). The algorithm places three flags on the floor of the assembly hall in some acceptable way, as shown in Fig. 2, where you see 3 start flags distributed across the floor and marked with black circles.

Fig. 2. Placement of initial cluster centers

In k-means clustering, dancers are tied to their nearest cluster center so that a demarcation line can be drawn between any two centers on the floor. Thus, if a dancer is on one side of the line, he belongs to one group, if on the other side, then to the other (as in Fig. 3).

Fig. 3. Lines mark the boundaries of clusters

Using these demarcation lines, divide the dancers into groups and color them accordingly, as in fig. 4. This diagram, dividing space into polygons determined by the proximity to a particular cluster center, is called a Voronoi diagram.

Fig. 4. Grouping by clusters marked with different background patterns in a Voronoi diagram

Let's take a look at our initial division. Something's wrong, isn't it? The space is divided in a rather strange way: the lower left group remains empty, while on the border of the upper right group, on the contrary, there are many people.

The k-means clustering algorithm moves cluster centers across the floor until it achieves the best result.

How to determine the "best result"? Each person present is at some distance from their cluster center. The shorter the average distance from the participants to the center of their group, the better the result.
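As a rough illustration of this measure, the sketch below computes the average distance from each dancer to the nearest flag in plain Python. The dancer coordinates and both sets of flag positions are made up for the example and are not taken from the figures.

```python
import math

def nearest_center(point, centers):
    """Index of the center closest to point (ties go to the first center)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def mean_distance(points, centers):
    """Average distance from each point to its nearest center."""
    total = sum(min(math.dist(p, c) for c in centers) for p in points)
    return total / len(points)

# Hypothetical dancer positions and flag placements (not from the figures)
dancers = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
bad_flags = [(0.0, 0.0), (5.0, 5.0)]
good_flags = [(1.0, 1.5), (8.5, 8.0)]

print(mean_distance(dancers, bad_flags))   # larger average distance
print(mean_distance(dancers, good_flags))  # smaller average distance
```

Flags placed in the middle of each group give a smaller average than flags placed far from the dancers, which is exactly the comparison the method relies on.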

Now we introduce the word "minimization", which will be very useful when you optimize the model to find a better location for the cluster centers. In this chapter, you will make Solver move the cluster centers countless times. Solver finds the best location for the cluster centers by moving them across the surface slowly and iteratively, keeping the best results it has found and combining them (literally mating them like racehorses) to find the best position.

So if the diagram in Fig. 4 looks rather poor, Solver might rearrange the centers as in Fig. 5. The average distance between each dancer and their center then decreases slightly.

Fig. 5. Shifting the centers slightly

Obviously, sooner or later Solver will work out that the centers should be placed in the middle of each group of dancers, as shown in Fig. 6.

Fig. 6. Optimal clustering at the school dance

Fine! This is what ideal clustering looks like. Cluster centers are located in the center of each group of dancers, minimizing the average distance between the dancer and the nearest center. Now that the clustering is complete, it's time to move on to the fun part, which is trying to understand what these clusters mean.

If all you knew were the dancers' hair color, political preferences, or 100-meter dash times, these clusters would not make much sense.

But once you look at the age and gender of those present, you begin to see some general trends. The small group at the bottom are older people, most likely chaperones. The group on the left is all boys, and the group on the right is all girls. And everyone is very afraid to dance with each other.

Thus, the k-means method allowed you to divide many dance attendees into groups and correlate the characteristics of each attendee with belonging to a particular cluster to understand the reason for the split.

Now you are probably saying to yourself: "Come on, what nonsense. I already knew the answer before we started." You are right. In this example, yes. I chose such a "toy" example on purpose, knowing that you can solve it just by looking at the dots. The action takes place in a two-dimensional space, where clustering by eye is trivial.

But what if you run a store that sells thousands of products? Some shoppers have made one or two purchases in the past two years. Others have made dozens. And everyone bought something different.

How do you cluster them on such a "dance floor"? To begin with, this dance floor is not two-dimensional, and not even three-dimensional. It is a thousand-dimensional space of products, where in each dimension the shopper either bought the product or did not. You see how quickly the clustering problem goes beyond the capabilities of the "Mark I eyeball," as my military friends like to say.

Real life: k-means clustering in email marketing

Let's move on to a more substantive case. I work in email marketing, so here is an example from life at MailChimp.com. The same example will work for retail data, ad traffic conversions, social networks, and so on. It works with almost any type of data in which you reach out to customers with marketing material and they choose whether to engage with you.

Joey Bag O'Donuts' Wholesale Wine Empire

Imagine for a moment that you live in New Jersey, where you run Joey Bag O'Donuts' Wholesale Wine Empire. It is an import-export business that ships huge quantities of wine from overseas and sells it to select liquor stores across the country. The business works like this: Joey travels all over the world in search of incredible deals on large quantities of wine. He ships it back to Jersey, and it is your job to place those shipments with stores and turn a profit.

You find buyers in different ways: a Facebook page, a Twitter account, sometimes even direct mail. But it is email that drums up most of the business. Last year, you sent one email a month. Typically, each email describes two or three deals, say, one on champagne and another on malbec. Some of the deals are simply amazing, with discounts of 80% or more. In total, you offered 32 deals over the year, and they all went more or less smoothly.

But just because things are going well does not mean they cannot go better. It would be useful to understand your customers' motives a little more deeply. Of course, looking at a specific order, you can see that a certain Adams bought some sparkling wine in July at a 50% discount, but you cannot tell what prompted him to buy. Did he like the minimum order quantity of one box of six bottles, or the fact that the wine had not yet passed its peak?

It would be nice to be able to split your customer list into interest groups. Then you could tailor the emails to each group separately and, perhaps, drum up even more business. Whichever deal suits a group best could become the subject line and go in the first paragraph of the text. This type of targeted mailing could produce a huge jump in sales.

You can let the computer do this job for you. Using k-means clustering, you can find the best way to split customers into groups and then try to work out why that split is the best.

Original dataset

The Excel workbook analyzed in this chapter is available on the book's website. It contains all the raw data in case you want to work with it. Or you can simply follow the text and refer to the finished sheets in the workbook.

First, you have two interesting data sources:

  • metadata for each offer is stored in a spreadsheet: varietal, minimum order quantity, retail discount, whether the wine is past its peak, and country of origin. This data is on a tab called OfferInformation, as shown in Fig. 7;
  • because you know which customers took which offers, you can export that information from MailChimp and place it alongside the offer metadata, on the Transactions tab. This transactional data, shown in Fig. 8, is very simple: the customer and the offer they took.

Fig. 7. Details of the last 32 offers

Fig. 8. List of offers taken, by customer

Determine the subject of measurements

And here's the challenge. In the problem of school dancing, measuring the distance between those present and identifying cluster centers was not difficult, right? Finding the right tape measure is enough! But what to do now?

You know that last year there were 32 offers, and you have a list of 324 orders on a separate tab, broken down by customer. But to measure the distance from each customer to a cluster center, you have to place the customers in this 32-deal space. In other words, you also need to know which deals they did not take, and create a deal-by-customer matrix in which each customer gets their own column of 32 cells, filled with ones for deals taken and zeros otherwise.

Put differently, you need to take this row-oriented table of orders and turn it into a matrix in which customers run across the columns and offers run down the rows. The best way to create it is with a pivot table.

The procedure: on the Transactions sheet, select columns A and B, and then insert a pivot table. Using the PivotTable Wizard, simply choose the offers as the row labels and the customers as the column labels, and fill in the table. A cell will contain 1 if the customer-deal pair exists and 0 if not (here, 0 is displayed as an empty cell). The result is the table shown in Fig. 9.

Fig. 9. Pivot table "customer-deal"
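Outside Excel, the same reshaping can be sketched in a few lines of plain Python. This is only an illustration: the function name `build_matrix` and the sample rows are hypothetical, and the result is keyed by customer rather than laid out as spreadsheet columns.

```python
def build_matrix(transactions, n_offers):
    """Turn (customer, offer_number) pairs into {customer: [0/1 per offer]}."""
    matrix = {}
    for customer, offer in transactions:
        row = matrix.setdefault(customer, [0] * n_offers)
        row[offer - 1] = 1  # offers are numbered from 1
    return matrix

# Hypothetical rows in the shape of the Transactions tab (not the real data)
transactions = [
    ("Adams", 18), ("Adams", 29), ("Adams", 30),
    ("Smith", 2), ("Smith", 24),
]
matrix = build_matrix(transactions, n_offers=32)

print(sum(matrix["Adams"]))  # number of deals this customer took
```

Each customer ends up with a vector of 32 zeros and ones, which is what the pivot table produces once the empty cells are read as zeros.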

Now that you have your order information in matrix format, copy the OfferInformation sheet and name the copy Matrix. In this new sheet, paste the values from the pivot table (there is no need to copy the offer number, because it is already in the offer information), starting from column H. You should end up with an extended version of the matrix, supplemented with the offer information, as in Fig. 10.

Fig. 10. Offer descriptions and order data merged into a single matrix

Data standardization

In this chapter, every dimension of your data is of the same kind: binary order information. But in many clustering situations this is not the case. Imagine a scenario where people are clustered by height, weight, and salary. All three types of data are on different scales. Height might vary from 1.5 to 2 meters, while weight varies from 50 to 150 kg.

In that setting, measuring the distance between customers (as between dancers in an assembly hall) becomes a confusing affair. It is therefore customary to standardize each column of data by subtracting its mean and then dividing by a measure of spread called the standard deviation. All columns are thereby brought to a common scale, with values varying around 0.
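The standardization step can be sketched as follows. The height and weight values are invented for illustration, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption; the text does not say which one is meant.

```python
import statistics

def standardize(column):
    """Subtract the column mean, then divide by the standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)  # population standard deviation (assumed)
    if sd == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(x - mean) / sd for x in column]

# Hypothetical columns on very different scales
heights_m = [1.5, 1.7, 1.8, 2.0]
weights_kg = [50.0, 70.0, 90.0, 150.0]

z_heights = standardize(heights_m)
z_weights = standardize(weights_kg)
print(z_heights)
print(z_weights)
```

After this step, both columns have mean 0 and standard deviation 1, so neither dominates a distance calculation simply because of its units.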

Let's start with four clusters

Well, now all your data has been reduced to a single convenient format. To start clustering, you need to choose k - the number of clusters in the k-means algorithm. Often the k-means method is applied like this: take a set of different k and test them one at a time (I will explain how to choose them later), but we are just getting started - so we will choose only one.

You will need a number of clusters that are roughly suitable for what you want to do. You clearly do not intend to create 50 clusters and send 50 targeted promotional emails to a couple of guys in each group. This instantly makes our exercise meaningless. In our case, we need something small. Start this example with 4 - in an ideal world, you would probably divide your list of clients into 4 understandable groups of 25 people each (which in reality is unlikely).

So, if you have to divide your customers into 4 groups, what is the best way to choose those groups?

Instead of messing up the pretty Matrix sheet, copy the data into a new sheet and name it 4MC. You can now insert 4 columns after the Past Peak column, in columns H to K, which will hold the cluster centers. (To insert a column, right-click on column H and select "Insert." The new column appears on the left.) Name these columns Cluster 1 to Cluster 4. You can also apply conditional formatting to them, so that whenever their values are set you will be able to see how different they are.

Sheet 4MC will look as shown in Fig. 11.

Fig. 11. Empty cluster centers placed on the 4MC sheet

At this point, all the cluster centers are zeros. Technically, though, they can take any values, and, as at the school dance, what you want is for them to be positioned so that they minimize the distance between each customer and that customer's cluster center.

Obviously, these centers will then have values between 0 and 1 for each deal, since all the customer vectors are binary.

But what does it mean to “measure the distance between the cluster center and the customer”?

Euclidean Distance: Measuring Distances Off-Road

For each customer, you have a separate column. How do you measure the distance between that column and a cluster center? Geometrically, you measure it along the shortest path, "as the crow flies", and the resulting distance is called the Euclidean distance.

Let's go back to the assembly hall for a while and try to figure out how to solve our problem there.

Place coordinate axes on the floor, and in Fig. 12 we see a dancer at point (8, 2) and a cluster center at (4, 4). To calculate the Euclidean distance between them, you will have to recall the Pythagorean theorem, which you know from school.

Fig. 12. Dancer at point (8, 2) and cluster center at (4, 4)

These two points are 8 - 4 = 4 meters apart vertically and 4 - 2 = 2 meters horizontally. By the Pythagorean theorem, the square of the distance between the two points is 4^2 + 2^2 = 20 square meters. The distance itself is the square root of 20, which is approximately 4.47 m (as in Fig. 13).

Fig. 13. Euclidean distance equals the square root of the sum of the squared distances in each direction

In the context of newsletter subscribers, you have more than two dimensions, but the same concept applies. The distance between the customer and the cluster center is calculated by determining the differences between the two points for each trade, squaring them, adding and taking the square root. For example, on sheet 4MC, you want to know the Euclidean distance between the center of cluster 1 in column H and the orders of customer Adams in column L.
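Written as a formula, the verbal recipe above looks like this. The symbols are notation introduced here for illustration: x is a customer's vector of zeros and ones, c is a cluster center, and j runs over the 32 deals.

```latex
d(x, c) = \sqrt{\sum_{j=1}^{32} \left( x_j - c_j \right)^2}
```

With only two coordinates, this reduces to the dance-floor calculation: the square root of 4 squared plus 2 squared, that is, the square root of 20.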

In cell L34, under the orders of Adams, you can take the difference between the Adams vector and the cluster center, square it, sum it, and then take the square root, using the following array formula (note the absolute references, which let you drag this formula to the right or down without changing the reference to the cluster center):

{=SQRT(SUM((L$2:L$33-$H$2:$H$33)^2))}

An array formula (type the formula and press Ctrl + Shift + Enter, or Cmd + Return on MacOS, as described in Chapter 1) is required because the (L2:L33-H2:H33)^2 part must "know" to go through the ranges cell by cell, calculating each difference and squaring it. The final result, however, is a single number, in our case 1.732 (as in Fig. 14). The meaning is this: Adams took three deals, and since the initial cluster centers are all zero, the answer is the square root of 3, namely 1.732.

Fig. 14. Distance between the center of cluster 1 and Adams

In the spreadsheet in Fig. 14, I froze the panes (see Chapter 1) between columns G and H and labeled row 34 in cell G34 "Distance to Cluster 1", just to keep track of where things are as I scroll down the page.
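For readers who want to check the arithmetic outside the spreadsheet, here is a small Python sketch of the same calculation. The function name `euclidean` and the three offer numbers chosen for the customer are assumptions made for the example; only the count of three deals and the all-zero center come from the text.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Dance-floor example: dancer at (8, 2), cluster center at (4, 4)
dance = euclidean((8, 2), (4, 4))

# A customer who took three of the 32 deals, against an all-zero center
customer = [0] * 32
for offer in (18, 29, 30):  # hypothetical offer numbers
    customer[offer - 1] = 1
center = [0.0] * 32
adams = euclidean(customer, center)

print(round(dance, 2), round(adams, 3))
```

The first value is the square root of 20 and the second is the square root of 3, matching the two worked results in the text.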

Distances and cluster membership for everyone!

Now you know how to calculate the distance between the order vector and the cluster center.

It's time to add the calculation of Adams' distance to the rest of the cluster centers by dragging cell L34 down to L37 and then manually changing the cluster center reference from column H to columns I, J, and K in the cells below. The result should be the following 4 formulas in L34:L37:

{=SQRT(SUM((L$2:L$33-$H$2:$H$33)^2))}
{=SQRT(SUM((L$2:L$33-$I$2:$I$33)^2))}
{=SQRT(SUM((L$2:L$33-$J$2:$J$33)^2))}
{=SQRT(SUM((L$2:L$33-$K$2:$K$33)^2))}

Since you used absolute references for the cluster centers (which is what the $ in the formulas stands for, as discussed in Chapter 1), you can drag L34:L37 across to DG34:DG37 to calculate the distance from each customer to all four cluster centers. Label the rows in column G, in cells G35 through G37, "Distance to Cluster 2," and so on. The newly calculated distances are shown in Fig. 15.

Fig. 15. Calculation of distances from each customer to all cluster centers

Now you know the distance of each client to all four cluster centers. Their distribution into clusters is made according to the shortest distance in two steps as follows.

First, go back to Adams in column L and calculate the minimum distance to a cluster center in cell L38. It's simple:

=MIN(L34:L37)

To find out which cluster center that minimum belongs to, use the MATCH formula (see Chapter 1). Placed in L39, it returns the position within the range L34:L37 (counting from 1) of the cell that holds the minimum distance:

=MATCH(L38,L34:L37,0)

In this case, the distance is the same for all four clusters, so the formula picks the first (L34) and returns 1 (Fig. 16).

Fig. 16. Adding cluster assignments to the sheet

You can also drag these two formulas across to DG38:DG39. To keep things organized, label rows 38 and 39 in cells G38 and G39 "Minimum Cluster Distance" and "Assigned Cluster".
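The MIN and MATCH pair can be mirrored in Python as a rough sketch. The function name `assign` and the sample customer vector are illustrative; like MATCH with an exact match, `list.index` returns the first position when several distances tie.

```python
import math

def assign(customer, centers):
    """Return (minimum distance, 1-based cluster number), like MIN then MATCH."""
    distances = [math.dist(customer, c) for c in centers]
    smallest = min(distances)
    # index() returns the first match, so ties go to the lowest cluster number
    return smallest, distances.index(smallest) + 1

# Four all-zero centers, as on the freshly created 4MC sheet
centers = [[0.0] * 32 for _ in range(4)]
# Hypothetical customer who took three deals
customer = [1, 1, 1] + [0] * 29

distance, cluster = assign(customer, centers)
print(distance, cluster)
```

With all centers at zero, every distance is identical, so the customer lands in cluster 1, just as the spreadsheet formula reports.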

Solving for the cluster centers

Your spreadsheet now includes the distance calculations and the cluster assignments. To determine the best position of the cluster centers, you need to find the values in columns H to K that minimize the total distance between the customers and the cluster centers they are assigned to, as indicated in row 39 for each customer.

When you hear the word "minimize", the optimization stage begins, and the optimization is carried out with Solver.

To use Solver, you need an objective cell, so in A36 sum all the distances between the customers and their cluster centers:

=SUM(L38:DG38)

This sum of distances from customers to their nearest cluster centers is exactly the objective function that we met earlier when clustering the assembly hall of Macacne High School. But Euclidean distance, with its powers and square roots, is a monstrously nonlinear function, so you have to use the evolutionary solver instead of the simplex method.

You have already used this method in Chapter 1. The simplex algorithm, when it applies, is faster than the alternatives, but it cannot handle roots, squares, and other nonlinear functions. OpenSolver is no help for the same reason: it uses a simplex algorithm, albeit one that seems to be on steroids.

In our case, the evolutionary algorithm built into Solver uses a combination of random search and the "breeding" of good solutions to find effective solutions, much as evolution does in a biological context.

You have everything you need to set up the problem in Solver:

  • objective: minimize the total distance from customers to their cluster centers (A36);
  • variables: the value of each deal in each cluster center (H2:K33);
  • constraints: cluster center values must lie between 0 and 1.

Open Solver and hammer in the requirements. Set Solver to minimize A36 by changing the values of H2:K33 subject to the constraint H2:K33 <= 1, just like all the deal vectors. Make sure the variables are marked as non-negative and the evolutionary algorithm is selected (Fig. 17).

Fig. 17. Solver settings for 4-center clustering

But setting up the problem is not everything. You will have to sweat a little over the options for the evolutionary algorithm: click the "Options" button in the Solver window and go to the settings window. I advise you to set the maximum time without improvement to somewhere above 30 seconds, depending on how long you are willing to wait for Solver to finish. In Fig. 18, I set mine to 600 seconds (10 minutes). That way, I can start Solver and go to lunch. And if you want to stop it early, just press Escape and exit with the best solution it has found.

Fig. 18. Evolutionary algorithm parameters

Click Solve and watch Excel do its job until the evolutionary algorithm converges.
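For comparison, here is a minimal sketch of a different and much more common way to move the centers: Lloyd's algorithm, which alternates between assigning points to the nearest center and moving each center to the mean of its members. It is not what Solver's evolutionary algorithm does, and it minimizes the sum of squared distances rather than the plain sum in A36, so the resulting centers may differ. The function names, the seed, and the tiny dataset are assumptions made for illustration.

```python
import math
import random

def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # start from k data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [math.dist(p, c) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each center moves to the mean of its members
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # nothing moved, so the assignments are stable
        centers = new_centers
    return centers

def total_distance(points, centers):
    """Sum of distances to the nearest center (the quantity held in A36)."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)

# Hypothetical binary purchase vectors (4 deals, 6 customers)
customers = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
             [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
centers = lloyd_kmeans(customers, k=2)
print(centers, total_distance(customers, centers))
```

Because every center is an average of 0/1 vectors, its coordinates automatically stay between 0 and 1, which is the same range the Solver constraints enforce.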

The meaning of the results

Once Solver finds you the best cluster centers, the fun begins. Let's move on to studying the groups! In Fig. 19 we can see that Solver found an optimal total distance of 140.7, and all four cluster centers, thanks to conditional formatting, look completely different.

Fig. 19. Four optimal cluster centers

Keep in mind that your cluster centers may differ from those presented in the book, because the evolutionary algorithm uses random numbers and the answer is different each time. The clusters can be completely different, or more likely in a different order (for example, my cluster 1 may be very close to your cluster 4, etc.).

Because you placed the deal descriptions in columns B to G when you created the sheet, you can now read the details in Fig. 19, which is important for making sense of the cluster centers.

For cluster 1 in column H, conditional formatting highlights deals 24, 26, 17 and, to a lesser extent, 2. After reading the descriptions of these deals, you can see what they have in common: they are all Pinot Noir deals.

Looking at column I, you can see that all the green cells have a low minimum quantity. These are buyers who do not want to purchase huge quantities in a single deal.

But the other two cluster centers, frankly, are difficult to interpret. Instead of interpreting cluster centers, how about examining the buyers in the cluster and determining which deals they like? This could clarify the issue.

Cluster rating of deals

Instead of working out which dimensions of each cluster center are closer to 1, let's check who is assigned to which cluster and which deals they prefer.

To do this, start by copying the OfferInformation sheet. Name the copy 4MC - TopDealsByCluster. Label columns H through K on this new sheet from 1 to 4 (as in Fig. 20).

Fig. 20. Creating a worksheet for calculating the popularity of deals by cluster

On sheet 4MC, you had cluster assignments from 1 to 4 in row 39. All you need to do to count deals by cluster is look at the column labels in H to K on sheet 4MC - TopDealsByCluster, see which customers on sheet 4MC were assigned to that cluster in row 39, and then add up their deals in each row. This gives the total number of customers in that cluster who took each deal.

Let's start with cell H2, which holds the number of customers in cluster 1 who accepted offer #1, namely the January malbec. You need to add up the values of the cells in the range L2:DG2 on the 4MC sheet, but only for customers in cluster 1, which is a classic use of the SUMIF formula. It looks like this:

=SUMIF('4MC'!$L$39:$DG$39,'4MC - TopDealsByCluster'!H$1,'4MC'!$L2:$DG2)

The formula works like this: you give it the range of values to check in the first part, '4MC'!$L$39:$DG$39, which it compares with the 1 in the column header ('4MC - TopDealsByCluster'!H$1), and for each match it adds the corresponding value from row 2 in the third part of the formula, '4MC'!$L2:$DG2.

Notice that you used absolute references ($ in the formula) for the entire cluster assignment range, for the row number of the column headers, and for the column letters of the deal range. With these references made absolute, you can drag the formula across all of H2:K33 to calculate the deal counts for the other cluster centers and deal combinations, as in Fig. 21. To make these columns more readable, you can also apply conditional formatting to them.

Fig. 21. Total number of deals for each offer, broken down by cluster
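The same conditional count can be sketched in Python. The function name `deals_by_cluster`, the customer names, and the tiny 3-deal dataset are hypothetical; the logic mirrors SUMIF by adding a customer's 0/1 value to the column of the cluster that customer is assigned to.

```python
def deals_by_cluster(matrix, assignments, n_clusters):
    """Count, for every offer, how many customers in each cluster took it.

    matrix[customer] is a 0/1 list per offer; assignments[customer] is a
    1-based cluster number. Returns counts[offer_index][cluster_index].
    """
    n_offers = len(next(iter(matrix.values())))
    counts = [[0] * n_clusters for _ in range(n_offers)]
    for customer, row in matrix.items():
        cluster = assignments[customer]
        for offer_index, took_deal in enumerate(row):
            counts[offer_index][cluster - 1] += took_deal
    return counts

# Hypothetical data: 3 offers, 3 customers, 2 clusters
matrix = {"Adams": [1, 0, 1], "Allen": [1, 1, 0], "Baker": [0, 0, 1]}
assignments = {"Adams": 1, "Allen": 1, "Baker": 2}

counts = deals_by_cluster(matrix, assignments, n_clusters=2)
print(counts)
```

Sorting the rows of `counts` by one cluster's column, largest first, then gives that cluster's most popular deals.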

By selecting columns A through K and applying auto filtering, you can sort this data. Sorting column H from largest to smallest shows which deals are most popular in cluster 1 (Fig. 22).

Fig. 22. Sorting cluster 1. Pinot, pinot, pinot!

As I mentioned earlier, the four largest deals for this cluster are all Pinot. These guys have clearly watched the movie Sideways too many times. If you sort by cluster 2, it becomes abundantly clear that these are the small-volume buyers (Fig. 23).

But when you sort by cluster 3, things are not so easy to understand. The large deals can be counted on one hand, and the difference between them and the rest is not so obvious. However, the most popular deals do have something in common: pretty good discounts, 5 of the 6 largest deals are for sparkling wine, and France is the country of origin for 3 of the top 4. Still, these conclusions are debatable.

As for cluster 4, these guys, for some reason, clearly liked the August champagne offer. Also, 5 of the 6 largest deals are for French wine, and 9 of the 10 largest deals are high-volume. Maybe this is a large-volume cluster that leans toward French wines? The overlap between clusters 3 and 4 is also troubling.

Next, we consider segmenting students who are receiving higher education full-time, by subjective characteristics (see Section 14.1) and by benefits sought (see Section 14.4). The segmentation technique is based on cluster analysis, with multidimensional scaling used for additional, more complete analysis.

Segmentation variables (properties and benefits) must be scored quantitatively. Nine parameters were used for this particular problem. To apply a Likert scale, a corresponding statement was formulated for each parameter.

  • 1. This is the best way to gain deep knowledge.
  • 2. This is an opportunity for full-fledged communication and making friends.
  • 3. This is a valuable opportunity to communicate with the teacher.
  • 4. This is an important step in starting a career.
  • 5. Student life is a wonderful period in life.
  • 6. The material costs of full-time study are high.
  • 7. The time costs of full-time study are high.
  • 8. It develops thinking in the specialty.
  • 9. Full-time education is prestigious.

The set of usable parameters can be much wider. In their questionnaires, students also often mention the following advantages and disadvantages of full-time university study: the chance to broaden one's horizons, the possibility of a deferment, the opportunity to learn self-discipline and self-organization, the difficulty of combining study and work, an important period in life, lack of practice, access to a large amount of information, influence on further career advancement, the future opportunity to judge whether the choice of profession was right, and participation in university life.

Data collection

Data are collected by questionnaire. The questions are formulated using the Likert scale (see subsection 8.3). In this example, students were asked to rate their agreement or disagreement with each statement on a five-point scale. A seven-point scale is widely used in the literature, but respondents often find it difficult to answer when there are many gradations.

A fragment of the questionnaire is shown in Fig. 24.2.

Fig. 24.2.

The respondent only has to tick a box; the answers are converted to numbers by the person administering the questionnaire. A five-point scale with levels from 1 to 5 was used (1 - strongly disagree, ..., 5 - strongly agree). The questionnaire was answered by 19 respondents, all students of the same group, which is of course too few.

24.7. Segmentation by properties on the example of an educational product

Calculations by the method of cluster analysis

Cluster analysis (see Section 23.7) is widely used in segmentation by product properties (see Section 24.3). Segmentation by cluster analysis is sometimes called hierarchical. Based on the ratings obtained, the distance between each pair of students is calculated. The calculations were performed in the Statistica statistical software package. First, a matrix of Euclidean distances is compiled. Clusters were then formed by an agglomerative (merging) procedure using the furthest-neighbor method (complete linkage). The results are presented as a dendrogram in Fig. 24.3.

Fig. 24.3. Dendrogram (PPP Statistica)

The vertical axis shows the linkage distance. The horizontal axis lists the students, numbered from C_1 to C_19. As the dendrogram shows, the procedure starts from 19 clusters, one per student. The first and second steps merge points 3 with 5 and 9 with 11. The third step merges points 8 and 13. The merging process then continues.
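The procedure described above (Euclidean distances, then agglomerative merging by complete linkage) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of the Statistica computation: the ratings below are hypothetical (five students, four statements), and the function names `euclidean` and `complete_linkage` are introduced here for the example.

```python
import math

# Hypothetical ratings: each row is one student's answers (1..5).
# The study itself used 19 students and nine statements.
ratings = {
    "C_1": [5, 4, 4, 2],
    "C_2": [5, 5, 4, 1],
    "C_3": [2, 5, 1, 4],
    "C_4": [1, 5, 2, 4],
    "C_5": [4, 2, 5, 5],
}


def euclidean(a, b):
    """Euclidean distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points):
    """Agglomerative clustering; returns the merge schedule.

    Each entry is (step, members of cluster A, members of cluster B,
    linkage distance).
    """
    clusters = [(name,) for name in points]
    schedule = []
    step = 0
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two furthest members.
                d = max(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        step += 1
        schedule.append((step, clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return schedule


schedule = complete_linkage(ratings)
for step, a, b, d in schedule:
    print(step, a, b, round(d, 3))
```

Each printed row corresponds to one merge in the dendrogram: the two groups joined and the height (linkage distance) at which they are joined.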

To choose the final step and, accordingly, the number of clusters, we use the agglomeration schedule (Fig. 24.4). The final step is the one after which the distance between the clusters being merged (the linkage distance) increases sharply.

Fig. 24.4.

Let us choose the resulting partition following the recommendations of Sec. 23.7. According to the agglomeration schedule, the distance between merged clusters increases relatively sharply at the 13th and 17th steps (Step in Fig. 24.4). The choice must therefore be made between the 12th and 16th steps. To make the choice unambiguous, we turn to multidimensional scaling, again following the recommendations of Sec. 23.7.
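The rule used above, stopping just before a sharp increase in linkage distance, can be illustrated with a short Python sketch. The distances below are hypothetical values chosen so that jumps occur at steps 13 and 17, as in the text; the threshold `ratio` and both function names are assumptions made for the example, not part of the original method.

```python
def clusters_after_step(n_objects, step):
    """Each agglomeration step merges two clusters, so one cluster disappears."""
    return n_objects - step


def sharp_increase_steps(distances, ratio=1.3):
    """Return the 1-based steps whose linkage distance exceeds the previous
    step's distance by more than `ratio` (an assumed threshold)."""
    steps = []
    for k in range(1, len(distances)):
        if distances[k - 1] > 0 and distances[k] / distances[k - 1] > ratio:
            steps.append(k + 1)  # distances[k] belongs to step k + 1
    return steps


# Hypothetical linkage distances for the 18 steps needed to merge 19 objects.
distances = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8,
             3.0, 3.2, 4.8, 5.0, 5.2, 5.4, 7.6, 8.0]

for jump in sharp_increase_steps(distances):
    candidate = jump - 1  # stop just before the sharp increase
    print(candidate, clusters_after_step(19, candidate))
```

With 19 students, stopping at step 12 leaves 7 clusters and stopping at step 16 leaves 3 clusters, so the choice between the two steps is a choice between a finer and a coarser partition.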

Multidimensional scaling segmentation results

In addition, to select the final classification, the relative position of the points obtained by multidimensional scaling is examined (Fig. 24.5, produced with the PPP Statistica). The two axes correspond to Dimension 1 and Dimension 2.

Drawing the boundaries between groups on the multidimensional scaling plot shows that the clusters have a convex shape only at the 16th step of the cluster analysis. These results are accepted as final. Three clusters, which are in effect the segments, were formed. The first cluster contains nine points, the second three, and the third seven.

Fig. 24.5.

Segment characteristics

Segments can be characterized by the mean value of each variable, and the segmentation results can be presented visually as profiles of these mean values (Fig. 24.6).
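A segment profile of this kind is simply the vector of per-statement means within the segment. A minimal Python sketch follows, assuming hypothetical answers to three statements and hypothetical segment labels; `segment_profiles` is a name introduced for the example.

```python
from statistics import mean

# Hypothetical answers (1..5) to three statements and an assumed
# segment label for each respondent.
answers = {
    "C_1": [5, 2, 4],
    "C_2": [4, 1, 5],
    "C_3": [2, 5, 3],
    "C_4": [1, 4, 3],
    "C_5": [3, 4, 2],
}
segment_of = {"C_1": 1, "C_2": 1, "C_3": 2, "C_4": 2, "C_5": 2}


def segment_profiles(answers, segment_of):
    """Return {segment: [mean score for each statement]}."""
    grouped = {}
    for respondent, scores in answers.items():
        grouped.setdefault(segment_of[respondent], []).append(scores)
    return {
        segment: [mean(column) for column in zip(*rows)]
        for segment, rows in grouped.items()
    }


profiles = segment_profiles(answers, segment_of)
for segment, profile in sorted(profiles.items()):
    print(segment, profile)
```

Plotting each profile as a line over the statement numbers gives a chart of the same type as the profile figure.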

For a concise, meaningful characterization, each segment is given a name and a motto. The full characterization of a cluster follows from its profile. The segment name can be based on the variables with the highest and lowest scores, as revealed by examining the profiles. Comparing the profiles makes it possible to identify the features of each segment and to "position" it against the others.

Fig. 24.6.

Let us formulate a name and a motto for each resulting segment. The first segment is the positivists: "Costs are not the main thing". The second is the lovers of life: "Think about the present. We are not here for prestige and career". The third is the purposeful: "Prestige pays for the costs". The segment names were derived as follows.

Indeed, in accordance with Fig. 24.6:

  • the first cluster is characterized by high scores for statements (5) "Student life is a wonderful period in life" and (8) "Develops thinking in the specialty". At the same time, statements (6) "Material costs are high" and (7) "Time costs are high" received low scores;
  • the second cluster has high scores for statements (2) "An opportunity for full-fledged communication and making friends" and (5) "Student life is a wonderful period in life". Low scores were given to statements (4) "An important step in a career" and (9) "Full-time education is prestigious";
  • the third cluster has high scores for statements (6) "Material costs are high" and (9) "Full-time education is prestigious", and relatively low scores for (5) "Student life is a wonderful period in life".

Notes:
  • Benefits here are conveniently understood as the motives for obtaining such an education.
  • PPP stands for a package of applied programs (a software package).
  • The theory of the method is presented in Sec. 23.6.
  • For a more familiar view of the profile, it must be rotated 90° clockwise.

March 10th, 2015

When entering any market, consumer or industrial, a manufacturer must understand that it cannot serve all customers, even with sufficient production capacity. Buyers use the product in different ways and, most importantly, buy it for different reasons. The usual practice is therefore to divide buyers into groups (segmentation) according to these motives and other characteristics, and only then to offer goods designed with those characteristics in mind. From the standpoint of meeting consumer needs, the ideal approach to marketing planning is, without exaggeration, to adapt products and services to the requirements of each individual consumer.

Until 1960, business theory and practice were dominated by an orientation toward an aggregated mass market. By focusing on the common, undifferentiated market, a manufacturing firm could produce goods in large quantities and benefit from economies of scale. Since the 1960s, however, the need to distinguish the specifics of consumer demand has come to the fore, and this is reflected in the segmentation of the sales market.

With competition in sales markets intensifying, the need to make domestic industrial products more competitive in domestic and foreign markets is becoming more urgent. Under these conditions, the key issue is finding reserves for reducing costs, which form the economic basis of prices and profit. As a result, a significant number of industrial enterprises pursue a low-cost strategy, implementing it in various ways: giving up expensive related services, saving costs by creating models that are cheaper to produce, and the like. But direct costs are largely determined by the production technology and the capacity utilization of the enterprise, while the opportunities to reduce management costs by managing the functional areas of the enterprise more efficiently remain underused.

Cluster analysis is one of the modern tools for reducing management costs and improving management quality. Here, management quality can be interpreted as the accuracy of forecasting profit and profitability for each cluster (a group of industrial enterprises engaged in the same type of economic activity) relative to the initial situation, or as the accuracy of forecasting the profitability of the functional areas of these enterprises.

The value of segmentation as an effective marketing tool is explained by the following features:

  • segmentation is a highly effective means of competition, since it focuses on identifying and meeting the specific needs of consumers;
  • segmentation orients the firm's activities toward a specific market niche, which is especially relevant for firms just starting their market activities;
  • market segmentation helps to define the marketing direction of the firm more precisely;
  • segmentation makes it possible to set realistic marketing goals;
  • successful market segmentation affects the effectiveness of marketing as a whole, from market and consumer research to the formation of an appropriate sales and promotion system.

In marketing theory, the concept of STP marketing has emerged. The name is formed from the first letters of the English words segmenting, targeting (target market selection) and positioning. STP marketing is the heart of modern strategic marketing.

Market segmentation is the division of consumers into groups based on differences in needs, characteristics or behavior, and the development of a separate marketing mix for each group.

A market segment consists of consumers who respond in the same way to the same set of marketing incentives.

1. Market segmentation: the stage of identifying individual groups of consumers within the overall market.
2. Selecting target markets: from the identified market segments, the target segments are chosen, that is, those on which the company should focus its activities.
3. Positioning: determining the place of the firm's product among analogous products.

The ultimate goal of segmenting the target market is to choose the segment (or segments) of consumers whose needs the firm's activities will be focused on meeting.
Marketers believe that correctly identifying a market segment is half of commercial success, and they constantly recall a variant of the well-known Pareto law (the 80:20 law).

Market segmentation is a formal procedure based on the application of statistical methods of multivariate analysis to research results. There are four main methods to get market segments:

1. Traditional methods:
  • a priori;
  • cluster-based.
2. New methods:
  • flexible segmentation;
  • component segmentation.

The a priori method of segmenting the consumer market is used when a hypothesis about market segmentation can be put forward. This requires understanding the needs, requirements and desires of consumers. Consumer characteristics such as consumption intensity, needs, and key motivation elements and their values act as independent variables, while the segmentation variables (age, gender, region, etc.) are used as dependent variables.

With this method, the researcher first puts forward a market segmentation hypothesis and then tests it in the course of marketing research.

The a priori method of market segmentation includes seven stages:

1. Choosing the segmentation basis: analysis of needs, requirements and other factors that influence consumer choice.
2. Selecting the segmentation variables and developing a market segmentation grid (the hypothesis): the criteria and variables for segmenting the consumer market are selected and justified, probable relationships between the basis and the variables are sought, and contradictions in the segmentation grid are eliminated.
3. Sampling.
4. Conducting the survey and collecting quantitative data.
5. Forming segments by breaking down the respondents, drawn from potential buyers, into categories.
6. Establishing segment profiles: the market segments are formed and tested against the hypothesis.
7. Developing marketing strategies for each market segment.

The a priori segmentation method is the most widely used. This is due to its simplicity, low cost and the availability of techniques for implementing it. In practice, however, it is often difficult to put forward a market segmentation hypothesis.

The cluster method is similar to the a priori method, but it does not define a dependent variable; instead it looks for natural clusters. First, the surveyed potential buyers are grouped into market segments by an analytical procedure. Then the variables that could be used to define each market segment are identified.

Clustering searches for natural groups, whereas in classification the groups are formed according to externally specified criteria.


The AID (automatic interaction detection) method of grouping consumers is widely used. With this method, a system-forming criterion is chosen first. The sample is then divided into subgroups, so that subgroups with a high value of the system-forming criterion are formed.

The disadvantage of this method is the selection of a market segment. The method is laborious and does not guarantee an accurate solution.

Segmentation by cluster analysis proceeds bottom-up. At the marketing research stage, many buyer characteristics are identified. A sample of at least 200 units is required. The results are processed, and the data are brought to a common scale that expresses how strongly each parameter is pronounced. Each consumer is then examined and the most similar consumers are identified. Similar consumers are combined into a cluster, which is then treated as a single composite object. Next, the most similar objects are again sought and combined into a new cluster. The process ends when no more similar clusters can be identified.

In practice, statistical packages such as SPSS and NCSS & PASS can be used to segment a market by the clustering method.

Flexible market segmentation is a dynamic procedure that allows segments to be built flexibly, based on an analysis of consumer preferences among product alternatives. Conjoint analysis is at the heart of flexible segmentation. One advantage of this method is that it makes it possible to identify consumer groups accurately when a new product enters the market. The disadvantages of flexible segmentation include high cost, a complex implementation procedure and possible errors at the design stage.

Component segmentation of the market is based on sophisticated statistical analysis techniques and requires substantial computing resources. The method was proposed by P. Green. It tries to determine which type of buyer is best matched to particular characteristics of the product.

According to Western experts, the flexible and component segmentation methods are purely academic and inapplicable in practice.

As part of the work on the first chapter of the final qualifying work, theoretical knowledge of consumer market segmentation was obtained. The main features of consumer market segmentation were considered, and the methods of market segmentation were studied.

Segmentation methods

There are several "basic" segmentation methods. The most important of these is cluster analysis of consumers (taxonomy). Consumer clusters are formed by grouping together those who give similar answers to the questions asked. Buyers can be placed in one cluster if they are similar in age, income, habits and so on. Similarity between buyers can be measured in different ways, but a common measure is the weighted sum of squared differences between buyers' answers to the questions. The output of a clustering algorithm can be a hierarchical tree or a partition of consumers into groups. A large number of clustering algorithms exist.
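The weighted sum of squared differences mentioned above can be written as d(a, b) = w_1 (a_1 - b_1)^2 + ... + w_n (a_n - b_n)^2, where a_k and b_k are two buyers' answers to question k and w_k is the weight of that question. A minimal Python sketch, with hypothetical answers and weights and a function name introduced for the example:

```python
def weighted_squared_distance(a, b, weights):
    """Weighted sum of squared differences between two buyers' answers."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("answers and weights must have the same length")
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


# Hypothetical answers of three buyers to four questions (scored 1..5)
# and hypothetical question weights.
buyer_1 = [5, 3, 4, 1]
buyer_2 = [4, 3, 5, 1]
buyer_3 = [1, 5, 2, 4]
weights = [2.0, 1.0, 1.0, 0.5]

print(weighted_squared_distance(buyer_1, buyer_2, weights))
print(weighted_squared_distance(buyer_1, buyer_3, weights))
```

The smaller the value, the more similar the two buyers; here buyer 1 is much closer to buyer 2 than to buyer 3, so a clustering algorithm would group the first two together first.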

For example, in the United States a clustering system called PRIZM is widely used. It begins clustering by reducing a set of 1,000 possible socio-demographic indicators, and it forms socio-demographic segments for the entire territory of the United States. For instance, the families in cluster 28 include people with the most successful professional or managerial careers. This cluster is also characterized by high income, education and property ownership, and by roughly middle age. Although this cluster represents only 7% of the US population, it is critical for businesses selling expensive goods.


There are other examples of consumer segmentation based on cluster analysis. For example, among the "psychological" criteria, a very important place is occupied by the consumer's attitude to product novelty (Fig. 3).

Figure 3

As can be seen from the above data, the largest number of consumers are ordinary buyers.

Consumer segmentation based on cluster analysis is the "classic" method. There are also techniques for segmenting the market by product parameters, so-called "product segmentation". This is especially important in the production and marketing of new products. Product segmentation based on the study of long-term market trends is of particular importance. Developing and producing a new product or completing a large investment program takes a rather long time, so the correctness of the market analysis and of the market capacity estimate is especially important here. In a traditional market for standard products, market capacity can be calculated by the market summation method. Under modern conditions, however, segmenting the market in only one direction, that is, defining consumer groups by certain criteria, is no longer enough for an enterprise to increase its competitiveness and determine market capacity correctly. Within integrated marketing, the product itself must also be segmented by the parameters most important for its promotion on the market. For this purpose, the method of compiling functional maps is used: a kind of double segmentation, by product and by consumer.

Functional maps can be single-factor (segmentation is carried out by one factor and for a homogeneous group of products) or multi-factor (an analysis of which consumer groups a particular product model is intended for and which of its parameters matter most for promoting it on the market). Using functional maps, you can determine which market segment a product is designed for and which of its functional parameters correspond to particular consumer needs.

When developing new products, this methodology assumes that all factors reflecting the system of consumer preferences are taken into account, together with the technical parameters of the new product that can satisfy consumer needs. Consumer groups are defined, each with its own set of requests and preferences, and all selected factors are ranked by their degree of importance for each consumer group.

This approach makes it possible, already at the development stage, to see which product parameters need to be redesigned, or to determine whether there is a sufficiently large market for the model.

Let us give an example of such a market analysis for Apple's computer development project (Table 1).

Table 1. Segmentation of the personal computer market and the factors taken into account in developing products for it (1982)

The first six columns are market segments by consumer group; the last two columns are the computer models.

Factor | Home | School | University | Home office | Small business | Corporation | Model A | Model B
Technical specifications | * | * | *** | ** | ** | ** | *** | **
Price | *** | *** | ** | *** | *** | ** | 0 | **
Special qualities | * | * | ** | * | * | * | ** | *
Reliability | ** | * | * | ** | ** | * | 0 | **
Convenience in use | ** | ** | * | ** | * | 0 | *** | ***
Compatibility | 0 | 0 | 0 | 0 | 0 | *** | 0 | 0
Peripheral equipment | 0 | 0 | 0 | 0 | 0 | *** | 0 | 0
Software | * | * | ** | ** | ** | *** | * | **

*** - a very important factor
** - an important factor
* - an unimportant factor
0 - a negligible factor

This simple analysis shows that Model A is a computer without a market, and Model B is the most suitable product for universities and small businesses.

The company bet on computer A and lost.

In general, two fundamental approaches to market segmentation are used in world practice (see the general scheme of segment analysis, Fig. 4).



In the first method, called "a priori", the segmentation criteria, the number of segments, their characteristics and the map of interests are known in advance. That is, the segment groups are assumed to have been formed already. The "a priori" method is used when segmentation is not part of the current research but serves as an auxiliary basis for solving other marketing problems. It is sometimes also used when the market segments are very clearly defined and their variability is low. The "a priori" method is likewise acceptable when a new product is being designed for a well-known market segment.

In the second method, called "post hoc" (cluster-based), the segmentation criteria and the nature of the segments themselves are assumed to be unknown. The researcher first selects a number of variables that are interactive with respect to the respondent (the method implies a survey); then, depending on the attitude expressed toward a particular group of variables, respondents are assigned to the corresponding segment. The map of interests revealed in the subsequent analysis is treated as secondary. This method is used for segmenting consumer markets whose segment structure has not been defined for the product being sold.

Segmentation by the "a priori" method

When choosing the number of segments into which the market should be divided, one is usually guided by the objective function, that is, by identifying the most promising segment. Obviously, there is no need to include in the sample segments whose purchasing potential for the product under study is small. Studies show that the number of segments should not exceed 10; a larger number usually results from excessive detailing of the segmentation criteria and leads to unnecessary "blurring" of the criteria.

For example, when segmenting by income level, it is recommended to divide all potential buyers into equal segments, making sure that the volume of each segment is at least the estimated sales volume of services that follows from the enterprise's production capacity. The best example that illustrates this, and shows that potential consumers can be divided into stable segment groups, is the segmentation of the population by income, in which the entire population is divided into five 20% groups. The distribution of income across the five 20% population groups is published regularly in statistical compilations and summaries, in a form similar to Table 2.

Table 2. Distribution of income by population groups, %

The convenience of working with such segment groups is obvious, especially in terms of tracking their capacity.
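The split into five 20% groups can be sketched as follows. This is only an illustration with hypothetical incomes, not the statistical data referred to above; `quintile_income_shares` is a name introduced for the example, and the sketch assumes a sample size divisible by five.

```python
def quintile_income_shares(incomes):
    """Split people, sorted by income, into five equal 20% groups and
    return each group's share of total income, in percent."""
    if not incomes or len(incomes) % 5 != 0:
        raise ValueError("this sketch expects a sample size divisible by 5")
    ordered = sorted(incomes)
    size = len(ordered) // 5
    total = sum(ordered)
    return [
        100 * sum(ordered[i * size:(i + 1) * size]) / total
        for i in range(5)
    ]


# Hypothetical incomes of ten people, in arbitrary units.
incomes = [10, 30, 20, 40, 60, 50, 90, 70, 110, 120]
shares = quintile_income_shares(incomes)
for group, share in enumerate(shares, start=1):
    print(group, round(share, 1))
```

Each group contains the same number of people, so tracking the capacity of a segment reduces to tracking its share of total income.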

 
