Data analysis and forecasting in 1C:Enterprise

Start and end of business processes

The life cycle of a business process begins with its start. You can define a handler for the Before Start event for this route point. This procedure has two parameters: the first is the route point from which the handler was called (a business process can have several start points), and the second is Rejection. If the value True is written to the Rejection parameter, the business process will not be started. In the Before Start event handler you can check the conditions required to start the business process and create "accompanying" objects whose references must be stored in the business process itself. It is not recommended to implement mechanisms that organize a dialogue with the user (opening various dialog forms) in this handler.

The start of a business process can be done in different ways:

programmatic start of the business process (from code in the built-in language);

interactive start (clicking on the OK button of the business process form);

starting a business process as a nested one.

Using the mechanism of data analysis and forecasting in 1C

The mechanism of data analysis and forecasting makes it possible to implement various tools in applied solutions to identify patterns that are usually hidden behind large amounts of information.

The mechanism allows you to work both with data received from an infobase and with data received from another source, previously loaded into a table of values ​​or a spreadsheet document. Applying one of the types of analysis to the source data, you can get the result of the analysis. The result of the analysis is a certain model of data behavior. The result of the analysis can be displayed in the final document or saved for further use.

Further use of the result of the analysis is that on its basis a forecast model can be created that allows predicting the behavior of new data in accordance with the existing model. For example, you can analyze which items are purchased together (on the same invoice) and store the forecast model generated from this analysis in the database.

Using text document layouts

A text document in 1C:Enterprise allows you to present various information in the form of text. A text document can be read from a text file and saved to a text file. It can be placed in a form or in a layout, and it can be handled using the built-in language. Broadly speaking, a text document allows you to perform three logical groups of actions:
  • reading text files from disk and writing them to disk;
  • working with individual lines of a text document: getting, adding, deleting, replacing;
  • creating a text layout and using it to form the resulting text document.

In addition to forming the content of a text document directly, it is possible to fill text documents based on layouts. A text document layout describes the invariant parts of a text document, containing fixed text and fields into which data can be inserted. The process of filling a text document based on a layout consists of reading certain areas of the layout, cyclically filling them with data, and sequentially outputting the resulting parts to the resulting text document.

Text document layout format. A text document layout is a text document that uses service lines starting with the "#" character. The control character is followed by keywords that describe certain elements of the layout.

Also, in the layout of a text document, the service symbols “[” and “]” are used, which determine the location of the variable layout fields.

The entire layout of a text document consists of areas. One area combines several consecutive lines. Areas must follow one another and cannot overlap or be nested in each other. To describe an area, the keywords Area and EndRegion are used; the Area keyword is followed by the name of the area.
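To make the described format more concrete, here is a small Python sketch (not the 1C:Enterprise API) that splits a layout of this kind into named areas and fills its [Field] placeholders; the exact spelling of the service keywords (#Area / #EndArea) is an assumption based on the description above.

```python
import re

def parse_layout(template_text):
    """Split a text layout into named areas.
    Service lines start with '#': '#Area <Name>' opens an area, '#EndArea' closes it."""
    areas, name, lines = {}, None, []
    for line in template_text.splitlines():
        if line.startswith("#Area "):
            name, lines = line.split(maxsplit=1)[1], []
        elif line.startswith("#EndArea"):
            areas[name], name = "\n".join(lines), None
        elif name is not None:
            lines.append(line)
    return areas

def fill_area(area_text, values):
    """Replace [Field] placeholders with values from a dictionary."""
    return re.sub(r"\[(\w+)\]", lambda m: str(values.get(m.group(1), "")), area_text)

layout = "#Area Header\nInvoice for [Customer]\n#EndArea\n#Area Row\n[Item]\t[Qty]\n#EndArea"
areas = parse_layout(layout)
doc = [fill_area(areas["Header"], {"Customer": "Acme Ltd"})]
for row in ({"Item": "Chair", "Qty": 2}, {"Item": "Desk", "Qty": 1}):
    doc.append(fill_area(areas["Row"], row))      # the Row area is filled cyclically
print("\n".join(doc))
```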

The mechanism is represented by a set of objects of the 1C:Enterprise built-in language. The scheme of interaction of the main objects of the mechanism is shown in the figure.
  • Data analysis column settings – a set of settings for the input columns of the data analysis. For each column, the type of data it contains, the role the column plays, and additional settings that depend on the type of analysis being performed are specified.
  • Data analysis parameters – a set of parameters for the analysis to be performed. The composition of the parameters depends on the type of analysis; for example, for cluster analysis they include the number of clusters into which the original objects must be divided, the way the distance between objects is measured, and so on.
  • Source data – the data source for the analysis. A query result, a cell area of a spreadsheet document, or a value table can act as the data source.
  • Analyzer – the object that actually performs the data analysis. The data source and parameters are set for this object; the result of its operation is a data analysis result whose type depends on the type of analysis.
  • Data analysis result – a special object containing information about the result of the analysis. Each type of analysis has its own result; for example, the result of a decision tree analysis is an object of type DataAnalysisResultDecisionTree. The result can later be displayed in a spreadsheet document using the data analysis report builder, examined through programmatic access to its contents, or used to create a forecast model. Any data analysis result can be saved for later use.
  • Forecast model – a special object that allows you to make a forecast based on input data. The model type depends on the type of analysis; for example, a model created for association search analysis is of type AssociationSearchPredictionModel. A data source for the forecast is passed to the input of the forecast model; the result is a value table containing the predicted values.
  • Selection for the forecast – a value table, a query result, or a spreadsheet document area containing the information on which the forecast must be built. For example, for an association search forecast model the selection may contain the list of products in a sales document, and the result of the model can recommend which other products can be offered to the buyer.
  • Sample column settings – a set of special objects that define the correspondence between the columns of the forecast model and the columns of the forecast selection.
  • Result column settings – control which columns will be placed in the resulting table of the forecast model.
  • Model output – a value table consisting of the columns specified in the result column settings and containing the predicted data; the specific content is determined by the type of analysis.
  • Data analysis report builder – an object that allows you to output a report on a data analysis result. In addition, the report builder provides special objects for linking to data so that the user can interactively control analysis parameters, data source column settings, forecast model column settings, and so on.
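To make the interaction of these objects easier to follow, below is a purely illustrative Python mock of the same pipeline. It is not the 1C:Enterprise API: all class and method names here are hypothetical. The "analysis" computes per-column averages as a trivial "model of data behavior", and the forecast model built from that result fills in missing values of new records.

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisResult:
    """A trivial 'model of data behavior': per-column averages (hypothetical stand-in)."""
    column_means: dict

    def create_prediction_model(self):
        return PredictionModel(self.column_means)

@dataclass
class PredictionModel:
    """Analogue of a forecast model built from an analysis result."""
    column_means: dict

    def execute(self, selection):
        # Fill missing values in the forecast selection using the stored behavior model.
        return [{col: (row.get(col) if row.get(col) is not None else mean)
                 for col, mean in self.column_means.items()}
                for row in selection]

@dataclass
class DataAnalysis:
    """Analogue of the analyzer: a data source plus analysis parameters."""
    data_source: list
    parameters: dict = field(default_factory=dict)

    def execute(self):
        cols = self.parameters.get("columns", [])
        means = {c: sum(r[c] for r in self.data_source) / len(self.data_source) for c in cols}
        return AnalysisResult(means)

history = [{"qty": 10, "price": 5.0}, {"qty": 6, "price": 7.0}]   # source data
analysis = DataAnalysis(history, {"columns": ["qty", "price"]})    # analyzer + parameters
result = analysis.execute()                                        # analysis result (can be saved)
model = result.create_prediction_model()                           # forecast model
print(model.execute([{"qty": None, "price": 6.0}]))                # forecast for new data
```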
Types of analysis

The mechanism allows you to perform several types of analysis. The data analysis mechanism in 1C 8.2 and 8.3 simplifies the developer's work in identifying patterns in various data. For example, using this mechanism you can display the products that are most often bought together; another example is building a sales forecast based on past data. This is not the whole range of applications of the data analysis mechanism in 1C, so let us look at its capabilities in more detail.

The main objects of the data analysis mechanism in 1C

This mechanism is represented in the 1C:Enterprise system by three system objects:
  • Data analysis is an object that performs data analysis. For it, you must specify the data source and the necessary parameters for analysis.
  • The result of data analysis is an object that is the result of the work of data analysis.
  • A forecast model is created based on the result of data analysis. The object is the final link in the 1C analysis mechanism and generates a table of values ​​that contains the predicted values.
Types of data analysis in 1C 8.3

The 1C:Enterprise system can use different types of analysis; let's look at them in more detail.
  1. General statistics - this type of analysis is a simple statistical sampling from a data source. An example of its application is the analysis of sales by item for a period; the result of the analysis is information about how much of a particular product was sold. The system also calculates specific fields: maximum, minimum, median, mean, range, standard deviation, number of values, number of unique values, mode.
  2. Association search - this type of analysis is designed to search for combinations that often occur together. It is very well suited to finding items that are frequently purchased together. As a result of the analysis, the system generates the following information: information about the processed data, associative groups, and the association rules by which the groups are compared.
  3. Sequence search - analysis that allows you to identify patterns in the analyzed data and offer a further forecast. As a result of the analysis, the system displays information about the probability of certain events occurring, in percentage terms.

One of the main trends in the accounting and management systems market is the constant increase in demand for analytical data processing tools that ensure informed decision making. That is why one of the strategic directions for the development of the 1C:Enterprise software system has become the constant expansion of the capabilities of economic and analytical reporting. However, today's customers are no longer satisfied with traditional tools that allow them to generate a variety of reports, pivot tables and charts that are created on the basis of predefined indicators and relationships and that need to be analyzed manually. Enterprises increasingly need qualitatively different tools to automatically search for non-obvious rules and identify unknown patterns (Fig. 1). This is how you can generate qualitatively new knowledge based on the information accumulated by the company and sometimes make completely non-trivial decisions to improve business efficiency using intelligent data analysis (IAD, data mining) methods.
Fig. 1. The logic of the development of the "intellectuality" of the analytical problems being solved.

The summer 2003 release of the new version of the technological platform, 1C:Enterprise 8.0, made it possible to significantly expand the business intelligence capabilities of the system (see sidebar). However, one important remark must be made here. 1C's platform software develops not only in "steps", from version to version, but is also constantly improved and expanded within one version, in two directions: technological and applied. Since the first announcement of version 8, more than a dozen releases of the platform have been issued; the latest (as of January 2006) is numbered 8.0.13, and it is quite different from what it was two and a half years ago. One of the directions of development of 1C:Enterprise 8.0 is precisely the business intelligence mechanisms; in particular, IAD tools appeared in it only in 2005.

It is important to note that most of the analysis functions are implemented at the level of the technological platform and become available to users only after they are included in new releases of applied solutions, so there is some gap (sometimes several months) between the appearance of new features and their delivery to users. With this problem in mind, in September 2005 1C released a special applied solution, the "Data Analysis Subsystem" (DAS), which can be built into any configuration on the 1C:Enterprise 8.0 platform, to narrow this gap. In addition to a wide range of basic functions, the package includes more than 30 pre-configured models for the typical Trade Management configuration. DAS includes the qualitatively new IAD tools that were previously absent in 1C programs.

No special skills or knowledge are required for the analysis and forecasting itself; a good command of the analyzed subject area and an understanding of its main cause-and-effect relationships are assumed. Preparing data sources and predictive models requires the ability to use a query builder and knowledge of how information is placed in configuration metadata objects. The IAD algorithms included in the new configuration (version 1.0.5) form analytical models (templates) that describe patterns in the source data. These models are of independent value (they can be reused) and are also used for the automated generation of forecasts, including scenario forecasts, with previously unknown indicators (Fig. 2).

The IAD mechanism is a set of interacting objects of the built-in language, thanks to which the developer can use its constituent parts in an arbitrary combination in any applied solution. The built-in objects make it easy to organize interactive configuration of analysis parameters by the user, as well as to display the analysis result in a form convenient for output to a spreadsheet document. Applying one of the types of analysis to the source data, you can get a result that is a certain model of data behavior. The result of the analysis can be displayed in the final document or saved for further use; based on it, you can create a forecast model that allows you to predict the behavior of new data.
Fig. 2. General scheme of the functioning of the data mining mechanism.

The current version of the subsystem implements the methods that have received the greatest commercial distribution in world practice, namely:

  • clustering - implements grouping of objects, maximizing intra-group similarity and inter-group differences;
  • decision tree - provides the construction of a cause-and-effect hierarchy of conditions leading to certain decisions;
  • association search - searches for stable combinations of elements in events or objects.
Below we take a closer look at the essence and practical applications of these IAD methods.

Clustering

The purpose of clustering is to select a certain number of relatively homogeneous groups (segments or clusters) from a set of objects of the same nature. The objects are distributed into groups in such a way that intragroup differences are minimal and intergroup differences are maximal (Fig. 3). Clustering methods make it possible to move from an object-by-object to a group representation of a set of arbitrary objects, which greatly simplifies their handling. Several possible scenarios for applying clustering in practice are described below.

Customer segmentation by a certain set of parameters makes it possible to distinguish stable groups with similar purchasing preferences, sales levels and solvency, which greatly simplifies customer relationship management. In goods classification, rather arbitrary classification principles are often used; selecting segments based on a group of formal criteria makes it possible to identify truly homogeneous groups of goods. In the context of a wide and rather heterogeneous range of goods, assortment management at the segment level, compared with management at the level of individual items, significantly increases the efficiency of promotion, pricing, merchandising and supply chain management. Manager segmentation allows you to plan organizational changes more effectively, improve motivational schemes and adjust the requirements for hired personnel, which ultimately improves the manageability of the company and the stability of the business as a whole.
Fig. 3. Data analysis by clustering.

The similarity and difference between objects is determined by the "distance" between them in the space of factors. The way this distance is measured depends on the metric, which defines the principle for determining the similarity or difference between the objects of the sample. The current implementation supports the following metrics (a small Python sketch of these distance functions is given after the list):
  • "Euclidean metric" is the standard distance between two points in an N-dimensional Euclidean attribute space;
  • "Euclidean squared metric" - enhances the effect of difference (distance) on the result of clustering;
  • "city metric" - reduces the impact of outliers;
  • "dominance metric" - defines the difference between the objects of the sample as the maximum difference between the values of their attributes, and is therefore useful for emphasizing differences between objects in a single attribute.
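As an illustration (in Python rather than the 1C built-in language), the four metrics above can be written as follows; the function names are ours.

```python
from math import sqrt

def euclidean(a, b):
    """Standard distance between two points in N-dimensional attribute space."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def euclidean_squared(a, b):
    """Omits the square root, so larger differences weigh even more heavily."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def city_block(a, b):
    """Sum of absolute coordinate differences; less sensitive to outliers."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dominance(a, b):
    """Maximum difference over any single attribute (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))

p, q = (1.0, 4.0, 2.0), (3.0, 1.0, 2.0)
print(euclidean(p, q), euclidean_squared(p, q), city_block(p, q), dominance(p, q))
```

The attribute weights mentioned below shift the contribution of individual attributes to any of these distances.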
The method of forming clusters based on information about the distance between clustered objects is determined by the clustering method. The current version of 1C:Enterprise 8.0 implements the following clustering methods:
  • "close link" (nearest neighbor) - the object joins the group for which the distance to the nearest object is minimal;
  • "distant link" (furthest neighbor) - the object joins the group for which the distance to the farthest object is minimal;
  • "center of gravity" - the object joins the group for which the distance to the center of the cluster is minimal;
  • "k-means" method - arbitrary objects are selected and taken as the cluster centers; then all the analyzed objects are considered in turn and attached to the cluster closest to them. After an object is attached, a new cluster center is calculated as the average value of the attributes of all objects included in the cluster. The procedure is repeated until the cluster centers stop changing (a minimal sketch of this procedure follows the list).
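Below is a minimal Python sketch of the k-means procedure exactly as described in the last item: arbitrary initial centers, assignment of each object to the nearest center, recomputation of centers as attribute averages, repeated until the centers stop changing. It is a conceptual illustration, not the platform's implementation.

```python
from math import dist   # Euclidean distance, Python 3.8+

def k_means(objects, k):
    """objects: list of equal-length numeric tuples; k: desired number of clusters."""
    centers = [list(o) for o in objects[:k]]           # arbitrary objects taken as initial centers
    while True:
        clusters = [[] for _ in range(k)]
        for obj in objects:                            # attach each object to the nearest center
            nearest = min(range(k), key=lambda i: dist(obj, centers[i]))
            clusters[nearest].append(obj)
        new_centers = [[sum(v) / len(c) for v in zip(*c)] if c else centers[i]
                       for i, c in enumerate(clusters)]   # new center = attribute averages
        if new_centers == centers:                     # stop when centers no longer change
            return clusters, centers
        centers = new_centers

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (0.5, 1)]
clusters, centers = k_means(points, 2)
print(centers)   # two cluster centers, e.g. [[1.0, 1.33...], [8.5, 8.5]]
```

As noted below, the number of clusters must be specified explicitly; here it is the parameter k.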
Any of the clustering methods implemented in the platform requires an explicit indication of the number of desired clusters. You can enter weights for feature attributes, allowing you to prioritize them. As a result of analysis using clustering, the following data are obtained:
  • cluster centers, which are a set of average values ​​of the input columns in each cluster;
  • a table of intercluster distances (distances between cluster centers) that determine the degree of difference between them;
  • predictive column values ​​for each cluster;
  • a rating of factors and a tree of conditions that determined the distribution of objects into clusters.
Clustering algorithms make it possible not only to carry out a cluster analysis of objects on a set of given attributes, but also to predict the value of one or more of them for the current sample based on the assignment of the objects of this sample to a particular cluster.

Association search

This method is designed to identify stable combinations of elements in certain events or objects. The analysis results are presented as groups of associated elements. Here, in addition to the identified stable combinations of elements, detailed analytics on associated elements is provided (Fig. 4).
Fig. 4. Presentation of the results of analysis by the "association search" method in the form of groups of associated elements.

The method was originally developed to look for typical combinations of items in purchases, which is why it is sometimes referred to as shopping cart analysis. In this scenario, product groups or individual products usually act as the associated elements, and the grouping object that combines the elements of the sample can be any information system object that identifies a transaction: for example, a buyer's order, a service acceptance certificate, or a cash receipt. Information about patterns in customers' product preferences improves the efficiency of customer relationship management (in terms of advertising and marketing campaigns), pricing (formation of bundled offers and discount systems), inventory management and merchandising (distribution of goods on trading floors). Another example of using this method is determining the combinations of advertising channels preferred by customers in order to eliminate duplication when conducting targeted advertising campaigns, which can significantly reduce the cost of such events.

The association search algorithm implemented in the platform has quite flexible controls over the adequacy of analysis and forecast models. The "Minimum percentage of cases" parameter determines the algorithm's threshold for how common a particular combination of elements in an event or object must be, which allows rarely occurring associations to be ignored. The "Minimum confidence" parameter determines the required stability of the associations being sought, and the "Minimum significance" parameter allows the highest-priority associations to be identified. The "Rules cutting type" parameter, which can take the values "Cut off redundant" and "Cut off those covered by other rules", greatly facilitates the interpretation of the analysis and forecast results. For the practical interpretation of the results obtained with this algorithm, it is critically important to partition the initial set of associated elements into groups that are really homogeneous from the point of view of the analysis being carried out. (A sketch of the underlying support and confidence calculations is given below.)
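The thresholds mentioned above correspond to the standard "market basket" measures. The plain-Python sketch below (not the 1C API) computes the share of cases (support) and the confidence of pair rules from a handful of invented transactions and filters them by minimum values; treating "significance" as lift is our assumption.

```python
from itertools import combinations
from collections import Counter

# Each transaction is the set of associated elements of one grouping object (e.g. one receipt).
transactions = [{"tea", "sugar"}, {"tea", "sugar", "lemon"}, {"coffee", "sugar"},
                {"tea", "lemon"}, {"tea", "sugar"}]

n = len(transactions)
item_count = Counter(i for t in transactions for i in t)
pair_count = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

min_support, min_confidence = 0.3, 0.6        # "minimum percentage of cases" / "minimum confidence"
for pair, cnt in pair_count.items():
    support = cnt / n                          # share of cases containing the whole combination
    if support < min_support:
        continue                               # ignore rarely occurring associations
    for a in pair:
        b = next(iter(pair - {a}))
        confidence = cnt / item_count[a]       # how often the rule "a -> b" holds
        lift = confidence / (item_count[b] / n)   # assumed analogue of "significance"
        if confidence >= min_confidence:
            print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```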

Decision tree

As a result of applying this method to the initial data, a hierarchical (tree-like) structure of rules of the form "if ... then ..." is created, and the analysis algorithm ensures the isolation of the most significant conditions and transitions between them at each stage. This algorithm is most widely used in identifying cause-and-effect relationships in data and describing behavioral patterns. A typical area of ​​application of decision trees is the assessment of various risks, for example, the closure of an order by a client or its transfer to a competitor, late delivery of goods by a supplier, or delay in payment of a trade credit (Fig. 5). The typical input factors of the model are the amount and composition of the order, the current balance of mutual settlements, credit limit, prepayment percentage, terms of delivery, and other parameters that characterize the object of the forecast. Adequate risk assessment ensures that informed decisions are made to optimize the return/risk ratio in the company's operations, and is also useful for increasing the realism of various budgets.

Fig. 5. The "decision tree" method makes it possible, based on the input factors of the model (a), to obtain an assessment of the risks of certain management decisions (b).

As an example illustrating the ability of the algorithm to identify causal relationships, consider the problem of optimizing the work of a sales department. To solve it, we choose an indicator of the effectiveness of sales managers as the predicted value, for example, specific profitability per client, and a set of data potentially affecting the result as the factors. The algorithm will determine the factors that have the greatest impact on the result, as well as the typical combinations of conditions leading to a particular result. Moreover, the "Data Analysis" subsystem makes it possible to estimate (predict) the expected values of the target indicator based on current data, as well as to make a "what if..." forecast by changing the indicators supplied to the input of the model. The results of analysis and forecasting using decision trees can significantly reduce the impact of the uncertainty of the business environment on the state of the company, as well as solve a wide range of tasks related to identifying complex and non-obvious cause-and-effect relationships.

The "Decision tree" algorithm forms a causal hierarchy of conditions leading to certain decisions. As a result of applying this method to the training sample, a hierarchical (tree-like) structure of splitting rules of the form "if ... then ..." is created. The analysis algorithm (model learning) is reduced to an iterative process of isolating the most significant conditions and the transitions between them. Conditions can be both quantitative and qualitative and form the "branches" of this abstract tree; its "leaves" are formed by the values of the predicted attribute (the decision), which, like the transition conditions, allow both a qualitative and a quantitative interpretation. The totality of these conditions imposed on the factors, together with the structure of transitions between them to the final decision, forms the forecast model. This algorithm is most widely used in evaluating the outcomes of various event chains and in identifying cause-and-effect relationships in samples. The significance and reliability of the model are controlled using the parameters "Simplification type", "Maximum tree depth" and "Minimum number of elements in a node". The results of analyzing a sample with the "Decision tree" algorithm are the following (an illustrative code sketch follows the list):

  • a rating of factors, i.e. a list of the factors that influenced the decision, sorted in descending order of importance (how often they appear in the nodes of the tree);
  • a comparison of the decisions (the values of the predicted column) with the conditions that determined them, in other words, the "Consequence-Cause" tree;
  • the "Cause-Effect" tree, a set of transitions between conditions that determines a particular decision (essentially a visual representation of the forecast model).
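As an illustration of the "rating of factors" and the tree of conditions in a risk-assessment setting, here is a short sketch using Python and scikit-learn as a stand-in for the platform's own algorithm; the order data and column names are invented.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented order history: [order_amount, prepayment_percent, outstanding_balance]
X = [[100, 50, 0], [900, 0, 400], [300, 30, 50], [1200, 10, 700],
     [150, 100, 0], [800, 0, 600], [200, 80, 20], [1000, 5, 900]]
y = [0, 1, 0, 1, 0, 1, 0, 1]        # 1 = payment was late, 0 = paid on time

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

features = ["order_amount", "prepayment_percent", "outstanding_balance"]
# Analogue of the "rating of factors": importance of each input factor in the decisions
print(sorted(zip(features, clf.feature_importances_), key=lambda p: -p[1]))
# Analogue of the "Cause-Effect" tree: the hierarchy of conditions leading to a decision
print(export_text(clf, feature_names=features))
# A "what if" forecast for a new order (20% prepayment on a 700-unit order)
print(clf.predict([[700, 20, 100]]))
```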
Joint solutions "1C"

In addition to the functions implemented directly within the 1C:Enterprise 8.0 platform, the arsenal of 1C business intelligence tools is replenished with specialized solutions created, among other things, within the framework of the 1C-Joint project (http://v8.1c.ru/solutions), with the participation of the company's partners and independent developers (see "Joint solutions of the company 1C and its partners", BYTE/Russia No. 9, 2005). Here we note two products related to the use of intelligent analysis methods: "1C:Enterprise 8.0. 1C-VIP Anatech: ABIS.ABC. Management accounting and costing" (developer partner: the consulting company VIP Anatekh) and "1C-VIP Anatekh-VDGB: ABIS.BSC. Balanced Scorecard" (developer partners: VIP Anatekh and VDGB).

Typical business scenarios for using IAD methods

The DAS documentation contains a section devoted to typical examples of using data mining with the "1C: Trade Management 8.0" configuration. Here we present several such business scenarios.

Customer Relationship Management

Scenario: "Planning an advertising campaign". Planning an upcoming advertising campaign is considered from the point of view of optimizing the allocation of the allocated budget across advertising channels, based on regional, product, client and other indicators of the target segment, as well as on the effectiveness of the advertising channels in those sections in some previous planning period.
  • Algorithm: cluster analysis.
  • Predicted attributes: shares of responses to an advertising channel for the conditionally homogeneous segments identified by the algorithm.
  • Calculated columns: shares of advertising channels in the advertising campaign budget, taking into account the likely share of responses and the effectiveness (in terms of resulting revenue) of each advertising channel.
  • Pattern example: class A customers in region P who prefer product group P are attracted by the same advertising channel as customers in region H who prefer product group Y.

Supply chain management

Scenario: "Optimizing the selection of suppliers by product group". The choice of dominant "first tier" suppliers for key product groups is extremely important for stabilizing the logistics system in particular and the overall supply chain management system in general, as well as for reducing the average duration of supply chains. On the other hand, closer integration with major suppliers can, as a rule, significantly reduce the cost of goods. In this regard, it is of interest to analyze stable combinations of suppliers across various product groups in comparison with analytics for the suppliers associated within the groups. This makes it possible to identify "intersections" of suppliers in different product groups and optimize relationships with them.
  • Algorithm: association search.
  • Predicted attributes: stable combinations of suppliers.
  • Main factors: product groups.
  • Drill-down details: analytics by supplier (volume of purchases, revenue, terms of delivery and payment, order fulfillment times - pessimistic, optimistic, average).
  • Pattern example: a stable association of a large but unpredictable supplier A and a predictable medium-sized supplier B in a large number of product groups. When placing orders for the competitive product groups, the medium-sized supplier can be positioned as the main one if the volume of the order to the large supplier does not exceed a certain threshold (which gives a significant gain from scale).

Personnel Management

Scenario: "Profiling sales managers by key performance indicators". Determining the effectiveness of managers (retention, customer acquisition, communication efficiency, collection of conditional and unconditional receivables, specific performance indicators per client, etc.) is of interest not only for building a system of financial incentives for managers, but also for effectively setting norms for the parameters of their activity.
  • Algorithm: decision trees.
  • Predicted attributes: key performance indicators of the sales department (number of key clients, churn and acquisition rates, lost revenue per month, revenue raised per month, revenue per month per client, total receipts from clients, etc.).
  • Main factors: the number of active clients, revenue, income, specific indicators per client, communication efficiency. Depending on the predicted attributes, the composition of factors can vary significantly.
  • Pattern example: managers with the best receivables collection rates (the ratio of cash receipts to revenue) have a retention rate > 0.8; an acquisition rate > 0.25; between 10 and 15 simultaneously open transactions; between 3 and 10 events per day; and between 50 and 100 active clients in the period.

Conclusion

Modern business is so multifaceted that the factors potentially influencing a particular decision can number in the tens. Competition is getting stronger day by day, product life cycles are shortening, and customer preferences are changing faster and faster. To develop a business, it is necessary to respond as dynamically as possible to the rapidly changing business environment, taking into account subtle, and sometimes non-obvious, patterns of events. Which customer groups will respond to a promotion, and which ones will irrevocably go to competitors? Open a new business line or hold off for now? Will the buyer delay payment, and the supplier the shipment? What are the opportunities for growth, and where are the potential threats lurking? These are the questions that thousands of managers ask themselves and their colleagues every day. The data analysis subsystem implemented in the 1C:Enterprise 8.0 platform is designed to help users of a corporate information system find answers to non-trivial questions faster by providing automated transformation of the data accumulated in the information system into practical, well-interpreted regularities.

Economic and analytical reporting in "1C:Enterprise 8.0"

The 1C:Enterprise 8.0 platform includes a number of mechanisms for generating economic and analytical reporting that allow you to produce interactive documents (and not just printed forms) within applied solutions. The user can therefore work with reports in the same way as with any screen form, including changing report parameters, rebuilding the report, and using "drill-down" (obtaining additional reports based on individual elements of an already generated report). In addition, there are several universal software tools that allow you to generate arbitrary reports depending on the task; this can be done, among others, by (sufficiently experienced) users themselves who are well acquainted with the structure of the applied solution they use. Below we briefly review the main reporting tools in 1C:Enterprise 8.0.

Queries are one of the ways to access data in 1C:Enterprise 8.0: they select information from the database according to certain conditions, usually in combination with simple processing of the retrieved data (grouping, sorting, calculating). Changing data with queries is not possible, since they are designed for quickly obtaining information from large volumes of data. The database is implemented as a set of interconnected tables, which can be accessed either individually or as several interrelated tables. To implement their own algorithms, a developer can use a query language based on SQL that contains many extensions reflecting the specifics of financial and economic tasks and reducing the effort spent on creating applied solutions. The platform includes a query builder that allows you to compose correct query text using only visual means (Fig. 6).

Fig. 6. The query builder (a) allows the developer to compose the query text (b) purely by visual means.

A spreadsheet document is a powerful mechanism for visualizing and editing information, including dynamic reading of information from the database. A spreadsheet document can be used on its own or be part of any of the forms used in the applied solution. At its core it resembles a spreadsheet (it consists of rows and columns in which data is placed), but its capabilities are much wider: it supports grouping, drill-down and notes. The document can use different kinds of report design, including charts. A spreadsheet document can contain pivot tables, which are themselves an effective tool for the programmatic and interactive presentation of multidimensional data.

The output form constructor helps the developer create reports and present report data in a convenient tabular or graphical form. It includes all the features of the query builder, as well as the creation and customization of a form.

The report builder is an object of the built-in language that provides the ability to create a report dynamically, both programmatically and interactively (Fig. 7). At the heart of its work is a query, through which the user is given the opportunity to interactively configure all the main parameters contained in the query text. The results of the query are displayed in a spreadsheet document, which can also use information from arbitrary data sources. Using the report builder's commands, the developer can change the set of parameters available to the user for configuration.
Fig. 7. Scheme of work of the report builder.

Geographic schemes allow you to visualize information that has a territorial reference: to countries, regions, cities. Data can be displayed in different ways: as text, histograms, color, pictures, circles of various diameters and colors, or pie charts. This allows you to display, for example, sales volumes by region in graphical form. The user can change the scale of the displayed diagram, obtain drill-downs by clicking on diagram objects, and even create new geographic diagrams. A geographic map can also be used simply to display certain geographic data, such as directions to an office or a vehicle route.

Data mining. These mechanisms make it possible to identify non-obvious patterns that are usually hidden behind large amounts of information. They use complementary knowledge-discovery methods that have received the greatest commercial distribution in world practice: clustering (grouping relatively similar objects), association search (searching for stable combinations of events and objects) and decision trees (building a cause-and-effect hierarchy of conditions leading to certain decisions).

Query console and report console. Neither console is part of the technology platform; they are external reports that can be run in any applied solution. They help a developer or an experienced user, respectively, to compose a query text and analyze its results, or to draw up an arbitrary report.

The data analysis and prediction engine is one of the mechanisms for generating economic and analytical reporting. It provides users (economists, analysts, etc.) with the ability to search for non-obvious patterns in the data accumulated in the infobase. This mechanism allows you to:

  • search for patterns in the initial data of the infobase;
  • manage the parameters of the analysis being performed both programmatically and interactively;
  • provide programmatic access to the analysis result;
  • automatically display the analysis result in a spreadsheet document;
  • create forecast models that allow you to automatically predict subsequent events or the values ​​of certain characteristics of new objects.

The data analysis mechanism is a set of objects of the built-in language interacting with each other, which allows the developer to use its constituent parts in any combination in any application solution. Built-in objects make it easy to organize interactive configuration of analysis parameters by the user, and also allow you to display the analysis result in a form that is convenient for displaying in a spreadsheet document.

The mechanism allows you to work both with data received from the infobase and with data received from an external source, previously loaded into a table of values or a spreadsheet document.

Applying one of the types of analysis to the source data, you can get the result of the analysis. The result of the analysis is a certain model of data behavior. The result of the analysis can be displayed in the final document, or saved for further use.

Further use of the result of the analysis is that on its basis a forecast model can be created that allows predicting the behavior of new data in accordance with the existing model.

For example, you can analyze which items are purchased together (on the same invoice) and store this analysis result in the database. Later, when creating the next invoice, you can build a forecast model based on the saved analysis result, feed it the new data contained in this invoice as input, and receive a forecast at the output: a list of goods that counterparty Petrov B.S. is also likely to acquire if they are offered to him.
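A minimal Python illustration of that last step (not the 1C API): given a saved set of association rules, recommend additional items for the products already on the new invoice. The rules and their confidence values are invented.

```python
# Saved "analysis result": association rules mined earlier from past invoices (invented numbers).
rules = [({"tea"}, "sugar", 0.75), ({"tea", "sugar"}, "lemon", 0.40), ({"coffee"}, "cream", 0.55)]

def recommend(invoice_items, rules, min_confidence=0.3):
    """Return items worth offering, ranked by rule confidence."""
    hits = {}
    for condition, consequence, confidence in rules:
        if condition <= invoice_items and consequence not in invoice_items:
            hits[consequence] = max(hits.get(consequence, 0.0), confidence)
    return sorted(hits.items(), key=lambda kv: -kv[1])

# The new invoice currently contains tea and sugar:
print(recommend({"tea", "sugar"}, rules))   # -> [('lemon', 0.4)]
```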

Several types of data analysis are implemented in the data analysis and forecasting engine:

Implemented Analysis Types

General statistics

It is a mechanism for collecting information about the data in the study sample. This type of analysis is intended for a preliminary study of the analyzed data source.

The analysis shows a number of characteristics of continuous and discrete fields. Continuous fields contain types such as Number and Date; fields of other types are treated as discrete. When the report is output to a spreadsheet document, pie charts are filled in to display the composition of the fields.
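The characteristics computed for continuous fields correspond to standard descriptive statistics; here is a quick Python illustration using only the standard library (the sales figures are invented).

```python
import statistics

sales = [5, 7, 7, 9, 12, 7, 15, 9]   # units sold of one product, invented data

print("min:", min(sales), "max:", max(sales), "range:", max(sales) - min(sales))
print("mean:", statistics.mean(sales), "median:", statistics.median(sales))
print("std deviation:", statistics.pstdev(sales), "mode:", statistics.mode(sales))
print("count:", len(sales), "unique count:", len(set(sales)))
```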

Association search

This type of analysis searches for frequently occurring groups of objects or characteristic values, and also searches for association rules. Association search can be used, for example, to identify goods or services that are frequently bought together.

This type of analysis can work with hierarchical data, which allows, for example, to find rules not only for specific products, but also for their groups. An important feature of this type of analysis is the ability to work both with an object data source, in which each column contains some characteristic of the object, and with an event source, where the object's characteristics are located in one column.

To facilitate the perception of the result, a mechanism for cutting off redundant rules is provided.

Sequence Search

The sequence search analysis type allows you to identify sequential chains of events in the data source. For example, it could be a chain of goods or services that customers often purchase one after another.

This type of analysis allows you to search through the hierarchy, which makes it possible to track not only the sequences of specific events, but also the sequences of parent groups.

A set of analysis parameters allows the specialist to limit the time distances between the elements of the desired sequences, as well as to adjust the accuracy of the results.
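Conceptually, sequence search counts ordered chains of events per object. Below is a toy Python sketch that counts the most frequent consecutive pairs of purchases per customer (invented data, not the platform's algorithm).

```python
from collections import Counter

# Purchase history per customer, in chronological order (invented data).
histories = {
    "Ivanov":  ["phone", "case", "headphones"],
    "Petrov":  ["phone", "case", "charger"],
    "Sidorov": ["laptop", "mouse", "case"],
}

pairs = Counter()
for events in histories.values():
    for first, second in zip(events, events[1:]):   # consecutive pairs of events
        pairs[(first, second)] += 1

total = len(histories)
for (first, second), count in pairs.most_common(3):
    # frequency of this consecutive chain relative to the number of customers
    print(f"{first} -> {second}: {count / total:.0%}")
```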

Cluster analysis

Cluster analysis allows you to divide the initial set of objects under study into groups, so that each object is more similar to the objects of its own group than to the objects of other groups. By analyzing the resulting groups, called clusters, you can determine what characterizes a particular group and decide on methods of working with objects of different groups. For example, using cluster analysis you can divide the clients the company works with into groups in order to apply different strategies when working with them.

Using the cluster analysis parameters, the analyst can set up the algorithm by which the partition will be performed, and can also dynamically change the composition of the characteristics taken into account in the analysis, set up weighting factors for them.

The result of clustering can be displayed in a dendrogram - a special object designed to display sequential relationships between objects.

Decision tree

The decision tree analysis type allows you to build a hierarchical structure of classifying rules, presented in the form of a tree.

To build a decision tree, you need to select the target attribute that will be used to build the classifier and a number of input attributes that will be used to create rules. The target attribute may contain, for example, information about whether the client switched to another service provider, whether the transaction was successful, whether the work was done well, etc. Input attributes, for example, can be the age of an employee, his work experience, the financial condition of the client, the number of employees in the company, etc.

The result of the analysis is represented as a tree, each node of which contains a certain condition. To decide which class a certain new object should be assigned to, it is necessary, answering the questions at the nodes, to go through the chain from the root to the leaf of the tree, passing to the child nodes in the case of an affirmative answer and to the neighboring node in the case of a negative one.

A set of analysis parameters allows you to adjust the accuracy of the resulting tree.
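A toy Python sketch of the traversal described above: each node holds a condition, an affirmative answer leads to one branch and a negative answer to the other, and a leaf gives the class. The tree structure and thresholds are invented.

```python
# Each inner node: (condition, subtree if "yes", subtree if "no"); a leaf is just a class label.
tree = (lambda c: c["debt"] > 1000,
        "high risk",
        (lambda c: c["years_as_client"] >= 3, "low risk", "medium risk"))

def classify(client, node):
    """Walk from the root to a leaf, answering the condition at every node."""
    while isinstance(node, tuple):
        question, yes_branch, no_branch = node
        node = yes_branch if question(client) else no_branch
    return node

print(classify({"debt": 250, "years_as_client": 5}, tree))   # -> low risk
print(classify({"debt": 5000, "years_as_client": 1}, tree))  # -> high risk
```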

Forecast Models

The forecast models created by the engine are special objects that are created from the result of data analysis and allow you to automatically perform a forecast for new data in the future.

For example, an association search prediction model built in the analysis of customer purchases can be used when working with a purchasing customer in order to offer him products that he will purchase with a certain degree of probability along with the products he has chosen.

Using the data analysis mechanism in applied solutions

To familiarize developers of applied solutions with the data analysis mechanism, a demonstration infobase is included on the "Information and Technological Support" (ITS) disk. It includes the universal "Data Analysis Console" data processor, which allows you to perform data analysis in any applied solution without modifying the configuration.

 
