Unlock the Potential of Programming: Coding for Solutions

Tuesday, March 12, 2019

My Master Thesis

Classifying fraudulent Purchasing Behavior Utilizing Machine learning, ETL and database methods 

Summary 

This thesis investigates the unusual purchasing order behavior. It can say that this thesis is very helpful to solve the main problem which is being faced by those companies which are involved to make the Purchase Order of goods to complete their requirements.

This research will play a great role in the following three main areas to help any company.
  • Prevent the company to order the goods less than their requirements i.e. to avoid losing potential opportunities and to avoid delays in projects.
  • Enable the company to have a stable supply chain so that resources can be allocated accordingly.
  • Prevent the company to order the goods greater than their requirements i.e. to prevent white-collar crimes

It will investigate these above scenarios by using Machine Learning technique. It will select more than one Machine Learning algorithms (KNN, Kernel SVM, Random Forest and Artificial Neural Network) and then trained the algorithms from the past clean behavior of the Supply and Demand history of the goods. This is similar to fraud detection in which unusual spending patterns are suspected followed by the investigation to see if the spending is genuine.

The manufacturer company has the various branches all over the world and for the analytical purpose the headquarter needs updated information with its all branches all over the world that’s why Database systems and ETL(full loading, Incremental loading, and SCD type 2 loading ) are also in action in this paper.

METHODS AND RESULTS

Local database -“Thesis _Pak”

The raw data “PurchaseTable_FullLoad.csv,” was imported to that local database by using the SSIS tool. The sample of PurchaseTable_FullLoad.csv looks like the following.

PurchaseTable_FullLoad.csv

In order to extract this data to the local database, the full loading technique was adopted in the following manners. All the work is done in the SSIS tool.

IF EXISTS (SELECT 1 FROM information_schema.tables where table_name like 'tblPurchases_FullLoad')
    DROP TABLE tblPurchases_FullLoad

-- create a table to hold purchase ledger items
CREATE TABLE tblPurchases_FullLoad(
            [DescriptionOrder] [varchar](255) NULL,
            [Supplier] [varchar](255) NULL,
            [QuanlitiesOrder] [float] NULL,
            [CreatedData] [datetime] NULL,
                [Label] [varchar](255) NULL,
           
)
IF EXISTS (SELECT 1 FROM information_schema.tables where table_name like 'tblPurchases_FullLoad')
    DROP TABLE tblPurchases_FullLoad

-- create a table to hold purchase ledger items
CREATE TABLE tblPurchases_FullLoad(
            [DescriptionOrder] [varchar](255) NULL,
            [Supplier] [varchar](255) NULL,
            [QuanlitiesOrder] [float] NULL,
            [CreatedData] [datetime] NULL,
                [Label] [varchar](255) NULL,
           
)


The process to extract the Purchase data workbook and dumped into a local database


Now, the time to load the data to headquarter (USA) of ABC Company from its local branch by using the SSIS tool


SSIS package set up for Incremental Load: loading new record from the source table to target table

SSIS package set up for Incremental Load: updating a record in target table after the data have been altered in a source table


SSIS package set up for Incremental Load: updating record in target table after the data have been altered in source table


SSIS package set up for Incremental Load: updating and adding the new records simultaneously


The head quarter also adopted the SCD2 technique to load the data from its branch, let say that branch is located in Canada, because it wants to keep the historical data from that branch. In order to get this implementation, the two tables have been created. The staging table ([dbo].[PurchasesStaging_SCD2Load]), which is created in its branch while the “dbo.TargettblPurchases_SCD2” table have been built up in the headquarter. Hence, it can say that “dbo.TargettblPurchases_SCD2” acts like a target table while “[dbo].[PurchasesStaging_SCD2Load])” table have been used to extracting the data during the process of SCD2 implementation.
To keep the eyes on new records and changed records in dbo.TargettblPurchases_SCD2 MERGE statement comes and plays its role. The MERGE statement inserting the new records into the dbo.TargettblPurchases_SCD2 but the records firstly inserted into the [dbo].[PurchasesStaging_SCD2Load and here BINARY_CHECKSUM function is used to detect changes to a row of a table 

We don’t have any records at the staging table, which is located at the local branch (Canada).
Staging table at the local branch (Canada)

Let’s say that the local branch have the following records in csv file 

Records in csv file in the local branch

These records have to move to the staging table (source table located at Canada) by using the SSIS tool as shown in the following figure

Building up staging table

Then the staging table looks like the following 
Staging table at the local branch database


While we do not have any records on the target table as shown in the following figure

to be continued .......................


No comments:

Post a Comment