Data Mining

Open Access Article

1 - Referral Traffic Analysis: A Case Study of the Iranian Students' News Agency (ISNA)
Roya Hassanian Esfahani, Mohammad Javad Kargar

10.7508/jist.2016.01.006

Web traffic analysis is a well-known e-marketing activity. Today most news agencies have entered the web, providing a variety of online services to their customers, and the number of online news consumers is increasing dramatically all over the world. A news website usually benefits from several acquisition channels, including organic search services, paid search services, referral links, direct hits, links from online social media, and e-mails. This article presents the results of an empirical study that analyzes the referral traffic of a news website using data mining techniques. The main methods include correlation analysis, outlier detection, clustering, and model performance evaluation. The results show no significant relationship between the amount of referral traffic coming from a referrer website and that website's popularity status. Furthermore, the referrer websites in the study fit into three clusters under the K-means clustering algorithm with squared Euclidean distance. Performance evaluations confirm the significance of the model. Among the detected clusters, the most populated one was labeled "Automatic News Aggregator Websites" by the experts. The findings help build a better understanding of the different referring behaviors, which account for around 15% of the overall traffic of the Iranian Students' News Agency (ISNA) website. They are also helpful for developing more efficient online marketing plans, business alliances, and corporate strategies.

Open Access Article

2 - Privacy Preserving Big Data Mining: Association Rule Hiding
Golnar Assadat Afzali, Shahriyar Mohammadi

10.7508/jist.2016.02.001

Data repositories contain sensitive information that must be protected from unauthorized access. Existing data mining techniques can be considered a privacy threat to sensitive data. Association rule mining is one of the most important data mining techniques; it tries to discover relationships between seemingly unrelated data in a database. Association rule hiding is a research area in privacy preserving data mining (PPDM) that addresses the problem of hiding sensitive rules within the data. Much research has been done in this area, but most of it focuses on reducing the undesired side effects of deleting sensitive association rules in static databases. In the age of big data, however, we face dynamic databases in which new data can arrive at any time, so most existing techniques are not practical and must be updated to suit these huge volumes of data. In this paper, a data anonymization technique is used for association rule hiding, and parallelization and scalability features are embedded in the proposed model to speed up the big data mining process. Instead of removing some instances of an existing important association rule, generalization is used to anonymize items at an appropriate level, so important association rules can be updated, if necessary, as new data arrive. We conducted experiments on three datasets to evaluate the performance of the proposed model in comparison with Max-Min2 and HSCRIL. The experimental results show that the information loss of the proposed model is lower than that of existing approaches in this area, and that the model can be executed in parallel to reduce execution time.
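
The difference between deleting rule instances and generalizing items can be shown with a small sketch: compute the support and confidence of a sensitive rule, then replace an item with its parent category from a taxonomy. The transactions, the `beer -> alcohol` taxonomy, and the helper names are hypothetical; this illustrates generalization in general, not the paper's parallel model.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """conf(lhs -> rhs) = supp(lhs U rhs) / supp(lhs)."""
    s_lhs = support(transactions, lhs)
    return support(transactions, set(lhs) | set(rhs)) / s_lhs if s_lhs else 0.0

def generalize(transactions, item, parent):
    """Global recoding: replace `item` by its (hypothetical) taxonomy parent."""
    return [[parent if x == item else x for x in t] for t in transactions]

transactions = [
    ["diapers", "beer", "bread"],
    ["diapers", "beer"],
    ["diapers", "wine"],
    ["bread", "milk"],
]
before = confidence(transactions, {"diapers"}, {"beer"})      # sensitive rule: 2/3
anonymized = generalize(transactions, "beer", "alcohol")
after = confidence(anonymized, {"diapers"}, {"beer"})         # 0.0: specific rule hidden
coarse = confidence(anonymized, {"diapers"}, {"alcohol"})     # 2/3: generalized rule kept
```

Recoding keeps the coarser rule `diapers -> alcohol`, whereas deleting `beer` from the transactions would also erase that information.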

Open Access Article

3 - Preserving Data Clustering with Expectation Maximization Algorithm
Leila Jafar Tafreshi, Farzin Yaghmaee

10.7508/jist.2016.03.004

Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business, and medical analysis, data mining techniques can also create new threats to privacy and information security. Therefore, a new class of data mining methods called privacy preserving data mining (PPDM) has been developed. The aim of research in this field is to develop techniques that can be applied to databases without violating the privacy of individuals. In this work we introduce a new approach, based on fuzzy logic, to preserve sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving the benefits of mining. In the proposed method, fuzzy membership functions (MFs) such as Gaussian, P-shaped, sigmoid, S-shaped, and Z-shaped functions are applied to the private data. The modified datasets are then clustered with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy guarantees valid clustering results while protecting sensitive information. The accuracy of the clustering algorithm on the fuzzified data is approximately equivalent to that on the original data and is better than that of state-of-the-art methods in this field.
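
The fuzzification step can be sketched with the common textbook forms of the listed membership functions; the S-, Z-, and Pi-shaped definitions below follow the usual spline-based forms, and reading "P-shaped" as Pi-shaped is an assumption. The masked attribute (age) and all parameter values are hypothetical. The transformed values would then be passed to an EM-based clusterer such as a Gaussian mixture model.

```python
import math

def gaussian_mf(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def sigmoid_mf(x, slope, center):
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def s_mf(x, a, b):
    """Spline-based S-shaped MF: 0 below a, 1 above b, 0.5 at (a + b) / 2."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2

def z_mf(x, a, b):
    """Z-shaped MF: the mirror image of the S-shaped MF."""
    return 1.0 - s_mf(x, a, b)

def pi_mf(x, a, b, c, d):
    """Pi-shaped MF: rises on [a, b], flat at 1 on [b, c], falls on [c, d]."""
    return s_mf(x, a, b) * z_mf(x, c, d)

# Hypothetical private attribute replaced by its sigmoid membership degree.
ages = [23, 35, 47, 61]
masked = [sigmoid_mf(age, slope=0.1, center=40) for age in ages]
```

A monotone MF such as the sigmoid preserves the ordering of the original values, which helps clustering, while a Gaussian MF maps values on both sides of its center to the same degree and therefore hides more.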

Open Access Article

4 - A RFMV Model and Customer Segmentation Based on Variety of Products
Saman Qadaki Moghaddam, Neda Abdolvand, Saeedeh Rajaee Harandi

10.7508/jist.2017.19.002

Today, increased competition between organizations has led them to seek a better understanding of customer behavior through innovative ways of storing and analyzing customer information. Moreover, the emergence of new computing technologies has brought about major changes in the ability of organizations to collect, store, and analyze large volumes of data, so thousands of data records can be stored for each customer. Customer satisfaction is therefore one of the most important organizational goals. Since not all customers are equally profitable to an organization, understanding and identifying the valuable customers has become the most important organizational challenge. Understanding customers' behavioral variables and categorizing customers based on these characteristics can therefore provide insight that helps business owners and industries adopt appropriate marketing strategies such as up-selling and cross-selling. These strategies rest on a fundamental variable: variety of products. Diversity in individual consumption may increase demand for a variety of products; therefore, variety of products can be used, along with other behavioral variables, to better understand and categorize customer behavior. Given the importance of product variety as one of the main parameters for assessing customer behavior, studying this factor in business-to-business (B2B) settings represents a vital new approach. Hence, this study clusters customers based on an extended RFM model, named RFMV, which adds a variety-of-products variable (V). The CRISP-DM methodology and the K-means algorithm were used for clustering. The results indicate that V, variety of products, is effective in calculating customer value, and that the RFMV model yields better customer clustering and valuation.
As a whole, the modeling results indicate that variety of products, together with the other behavioral variables, provides more accurate clustering than the RFM model.
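
A minimal sketch of computing the four RFMV variables per customer from a transaction log. The record layout `(customer, date, amount, product)`, the reference date, and counting frequency as transaction lines rather than invoices are assumptions; in the study the resulting scores are then clustered with K-means.

```python
from collections import defaultdict
from datetime import date

def rfmv(transactions, ref_date):
    """Return {customer: (recency_days, frequency, monetary, variety)}."""
    last = {}
    freq = defaultdict(int)
    money = defaultdict(float)
    products = defaultdict(set)
    for cust, day, amount, product in transactions:
        last[cust] = max(last.get(cust, day), day)  # most recent purchase date
        freq[cust] += 1                             # F: number of purchases
        money[cust] += amount                       # M: total spend
        products[cust].add(product)                 # V: distinct products bought
    return {
        c: ((ref_date - last[c]).days, freq[c], money[c], len(products[c]))
        for c in last
    }

log = [
    ("A", date(2017, 1, 5), 100.0, "p1"),
    ("A", date(2017, 2, 1), 50.0, "p2"),
    ("A", date(2017, 2, 10), 30.0, "p1"),
    ("B", date(2017, 1, 20), 400.0, "p3"),
]
scores = rfmv(log, ref_date=date(2017, 3, 1))
```

Before clustering, the four variables would typically be normalized or binned into scores, since raw days, counts, and currency amounts are on very different scales.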

Open Access Article

5 - Identification of a Nonlinear System by Determining of Fuzzy Rules
Hojatallah Hamidi, Atefeh Daraei

10.7508/jist.2016.04.002

In this article, a hybrid optimization algorithm combining differential evolution and particle swarm optimization is introduced for designing the rule base of a fuzzy controller. For a given number of rules, the hybrid algorithm optimizes all open parameters to reach maximum training accuracy. The hybrid computational approach combines an opposition-based differential evolution algorithm with a particle swarm optimization algorithm. When it is used to train a fuzzy system for identifying a nonlinear system, the results show that the proposed hybrid approach achieves better identification accuracy than other training approaches. The proposed method is applied to the Mackey-Glass chaotic system, which serves as the example in this article.
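
The Mackey-Glass benchmark is the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t). The sketch below generates a series with a unit-step Euler discretization and the parameter values most often used in the identification literature (beta = 0.2, gamma = 0.1, n = 10, tau = 17, x(0) = 1.2). The abstract does not state which settings or sampling the study used, so these values are assumptions.

```python
def mackey_glass(length, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2):
    """Unit-step Euler scheme:
    x(t+1) = x(t) + beta * x(t-tau) / (1 + x(t-tau)**n) - gamma * x(t)."""
    # Constant history x(t) = x0 for t in [-tau, 0].
    x = [x0] * (tau + 1)
    for _ in range(length - 1):
        x_delayed = x[-(tau + 1)]  # the value tau steps before the current one
        x.append(x[-1] + beta * x_delayed / (1 + x_delayed ** n) - gamma * x[-1])
    return x[tau:]  # drop the artificial history, keep x(0) .. x(length-1)

series = mackey_glass(500)
```

Windows of past values from `series` can serve as inputs, and a later value as the target, when training a fuzzy model to identify the system.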

Open Access Article

6 - Analysis of Business Customers’ Value Network Using Data Mining Techniques
Forough Farazzmanesh (Isvand), Monireh Hosseini

10.7508/jist.2017.19.003

In today's competitive environment, customers are the most important asset of any company. Therefore, companies should understand the retention and value drivers of each customer. One approach that can help capture customers' different value dimensions is the value network. This paper introduces a new approach that uses data mining techniques to map and analyze customers' value networks, and applies it to a real case study. This research develops and implements a methodology to identify and define the entities of a value network in the context of B2B relationships. To do so, we use a combination of methods and techniques designed to analyze customer datasets (e.g., RFM and customer migration) and to analyze value networks. As a result, this paper develops a new strategic network view of customers and discusses how a company can add value for its customers. The proposed approach gives marketing managers an opportunity to gain a deep understanding of their business customers and of the characteristics and structure of their customers' value network. This paper is the first contribution of its kind to focus exclusively on large-dataset analytics to analyze a value network, and it indicates that future research on value networks can benefit further from data mining tools. In the case study, we identify the value entities of the network and its value flows in a telecommunication organization using the available data, in order to show that continuous monitoring can improve the value in the network.
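
Customer migration, one of the techniques named above, can be summarized as a transition matrix: for each segment in one period, the share of its customers found in each segment in the next period. The segment labels, customer IDs, and the helper name `migration_matrix` are hypothetical.

```python
from collections import Counter

def migration_matrix(seg_before, seg_after, segments):
    """Row-normalized transitions between segments across two periods.
    Only customers present in both periods are counted."""
    counts = Counter(
        (seg_before[c], seg_after[c]) for c in seg_before if c in seg_after
    )
    matrix = {}
    for s in segments:
        row_total = sum(counts[(s, t)] for t in segments)
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 0.0) for t in segments
        }
    return matrix

period_1 = {"c1": "high", "c2": "high", "c3": "low", "c4": "low"}
period_2 = {"c1": "high", "c2": "low", "c3": "low", "c4": "low"}
matrix = migration_matrix(period_1, period_2, ["high", "low"])
```

Here half of the high-value customers slipped into the low segment while none moved up, the kind of flow that continuous monitoring is meant to surface.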

Open Access Article

7 - DBCACF: A Multidimensional Method for Tourist Recommendation Based on Users’ Demographic, Context and Feedback
Maral Kolahkaj, Ali Harounabadi, Alireza Nikravan Shalmani, Rahim Chinipardaz

10.7508/jist.2018.04.004

With the advent of Web 2.0 applications such as social networks, which allow users to share media, tourists have gained many opportunities to discover and visit attractive and unfamiliar Areas-of-Interest (AOIs). However, finding appropriate areas based on a user's preferences is very difficult due to issues such as the huge number of tourist areas and limited visiting time. In addition, available methods have so far failed to provide accurate tourist recommendations based on geo-tagged media because of problems such as data sparsity, the cold start problem, treating two users with different habits as the same (symmetric similarity), and ignoring users' personal and contextual information. Therefore, this paper proposes a method called “Demographic-Based Context-Aware Collaborative Filtering” (DBCACF) that addresses these problems and extends the Collaborative Filtering (CF) method to provide personalized tourist recommendations without explicit user requests. DBCACF combines demographic and contextual information with users' historical visits to overcome the limitations of CF methods in dealing with multi-dimensional data. In addition, a new asymmetric similarity measure is proposed to overcome the limitations of symmetric similarity methods. Experimental results on a Flickr dataset indicate that using demographic and contextual information, and adding the proposed asymmetric scheme to the similarity measure, significantly improves the results compared to methods that use only user-item ratings and symmetric measures.
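
To see why asymmetry matters, consider one simple asymmetric similarity: the fraction of user u's visited areas that user v has also visited. This is an illustrative stand-in, not the DBCACF measure, whose formula the abstract does not give; the users and areas are hypothetical.

```python
def asymmetric_similarity(visits, u, v):
    """sim(u -> v) = |AOI(u) & AOI(v)| / |AOI(u)|; in general != sim(v -> u)."""
    mine = visits[u]
    if not mine:
        return 0.0
    return len(mine & visits[v]) / len(mine)

visits = {
    "u1": {"museum", "bazaar"},
    "u2": {"museum", "bazaar", "park", "mosque"},
}
s12 = asymmetric_similarity(visits, "u1", "u2")  # u2 has visited all of u1's areas
s21 = asymmetric_similarity(visits, "u2", "u1")  # u1 has visited half of u2's areas
```

A symmetric measure such as Jaccard similarity would give 2/4 = 0.5 in both directions, hiding the fact that u2 is a good source of recommendations for u1 but not the other way round.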

Open Access Article

8 - The Development of a Hybrid Error Feedback Model for Sales Forecasting
Mehdi Farrokhbakht Foumani, Sajad Moazami Goudarzi

10.52547/jist.9.34.131

20.1001.1.23221437.2021.9.34.7.4

Sales forecasting is a significant issue in the industrial and service sectors: handled properly, it can facilitate management decisions and reduce lost value. Sales forecasting is also one of the more complicated problems in time series analysis and data mining because of the number of intervening parameters. Various models have been proposed for this problem, and each has achieved acceptable results; however, improving these methods is still of interest to researchers. In this regard, the present study proposes a hybrid model with error feedback for sales forecasting. First, a forecast is produced using a supervised learning method. Then the residuals (model errors) are computed, and a second learning method is trained to forecast these error values. Finally, the two trained models are combined and used consecutively for sales forecasting: the first model produces the forecast, the second model estimates its error, and the sum of the forecast and the estimated error gives the final forecast. Computational results from numerical experiments show that the proposed hybrid method outperforms common models in the literature and reduces the forecasting error indicators.
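
The two-stage error-feedback idea can be sketched with simple least-squares models: a linear trend as the base forecaster and an AR(1) model on its residuals as the error forecaster. Both model choices, the function name `hybrid_error_feedback`, and the synthetic sales series are assumptions for illustration; the abstract does not name the learning methods used.

```python
import numpy as np

def hybrid_error_feedback(y):
    """Stage 1 forecasts y; stage 2 forecasts stage 1's error; final = sum."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    # Stage 1 (base model): linear trend y_t ~ w0 + w1 * t.
    A1 = np.column_stack([np.ones_like(t), t])
    w1, *_ = np.linalg.lstsq(A1, y, rcond=None)
    base_fit = A1 @ w1
    resid = y - base_fit
    # Stage 2 (error model): AR(1) on residuals, e_t ~ a + b * e_{t-1}.
    A2 = np.column_stack([np.ones(len(y) - 1), resid[:-1]])
    w2, *_ = np.linalg.lstsq(A2, resid[1:], rcond=None)
    # In-sample fits for t >= 1 (stage 2 needs one lagged residual).
    hybrid_fit = base_fit[1:] + A2 @ w2
    # One-step-ahead forecast: base prediction plus predicted error.
    next_base = w1[0] + w1[1] * len(y)
    next_error = w2[0] + w2[1] * resid[-1]
    return next_base + next_error, base_fit[1:], hybrid_fit

# Synthetic monthly sales: trend plus a smooth cycle the trend model misses.
months = np.arange(48)
sales = 100 + 2.0 * months + 15 * np.sin(months / 3)
forecast, base_fit, hybrid_fit = hybrid_error_feedback(sales)
base_sse = np.sum((sales[1:] - base_fit) ** 2)
hybrid_sse = np.sum((sales[1:] - hybrid_fit) ** 2)
```

Because a zero error model is among the candidates stage 2 can fit, least squares guarantees that the hybrid's in-sample squared error over these points is no larger than stage 1's; whether this carries over to unseen periods has to be checked on held-out data.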
