Get custom programming done at GetAFreelancer.com!






Data Profiling and Quality using Data Mining Techniques

Budget: $250-$750
Fri, 4 Jul 2008 04:03:00 GMT
Category: .NET

Hi,

I am currently looking for a freelancer for my school research project, to be completed in a month. The project is about data profiling using data mining models and techniques. The back end will need to use C# .NET to connect to a SQL Server database. The profiling is supposed to use various data mining models to filter data from the database and generate a report of the analysis. Here are more details about the project:


Project Title: "Data Profiling and Quality using Data Mining Techniques"

Abstract:
The corporate data universe consists of numerous databases linked by countless real-time and batch data feeds. In day-to-day operations, data moves from one place to another and is updated every second. Databases are frequently redesigned and upgraded to later versions, and the applications that use them are upgraded just as often. As a result, we get a better-performing and more reliable information system to operate with, but the quality of the data in the system may deteriorate. Research shows that data quality has intrinsic value to businesses and consumers, and a recent study shows that corporations are losing millions of dollars because of inaccurate data. High-quality data combined with effective technology is a great asset, but poor-quality data with effective technology becomes a great liability.

Data mining techniques are powerful tools for analyzing huge amounts of data. Recently, the research community has started to show interest in using data mining algorithms to profile data and conduct Data Quality Assessment (DQA). The results of this analysis can be stored in a metadata repository, from which data profiling reports can be generated. These results can then be used by a Data Cleansing System (DCS), or by IT programmers and business analysts in prevention systems, to control data quality in the next phase of data quality control.
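As a rough illustration of what one entry in such a metadata repository might hold, the C# sketch below computes basic per-column statistics: row count, NULL count, distinct count, most frequent value, and completeness. It is a minimal sketch, assuming the column's values have already been read from the database into memory as strings, with `null` standing in for SQL NULL. `ColumnProfile` and `ColumnProfiler` are hypothetical names, and this is plain descriptive profiling, not one of the data mining models the project calls for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary of one column, suitable for storing in a profiling repository table.
public sealed class ColumnProfile
{
    public string ColumnName { get; }
    public int RowCount { get; }
    public int NullCount { get; }
    public int DistinctCount { get; }
    public string MostFrequentValue { get; }

    public ColumnProfile(string columnName, int rowCount, int nullCount,
                         int distinctCount, string mostFrequentValue)
    {
        ColumnName = columnName;
        RowCount = rowCount;
        NullCount = nullCount;
        DistinctCount = distinctCount;
        MostFrequentValue = mostFrequentValue;
    }

    // Fraction of non-NULL values; an empty column is treated as fully complete.
    public double Completeness => RowCount == 0 ? 1.0 : 1.0 - (double)NullCount / RowCount;
}

public static class ColumnProfiler
{
    // Profiles one column's values; null entries represent SQL NULLs.
    public static ColumnProfile Profile(string columnName, IReadOnlyList<string> values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        var nonNull = values.Where(v => v != null).ToList();

        // Most frequent non-NULL value; ties go to the ordinally smallest value so results are repeatable.
        var top = nonNull.GroupBy(v => v, StringComparer.Ordinal)
                         .OrderByDescending(g => g.Count())
                         .ThenBy(g => g.Key, StringComparer.Ordinal)
                         .FirstOrDefault();

        return new ColumnProfile(
            columnName,
            values.Count,
            values.Count - nonNull.Count,
            nonNull.Distinct(StringComparer.Ordinal).Count(),
            top == null ? null : top.Key);
    }
}
```

For example, profiling a Gender column holding M, F, NULL, M would report 4 rows, 1 NULL, 2 distinct values, M as the most frequent value, and a completeness of 0.75.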

The purpose of this project is to develop a Data Profiling and Quality System (DPQS) that uses data mining algorithms such as Neural Networks and Decision Trees. The results of the analysis will be stored in a metadata repository, and reports will be generated from the repository for business and IT data analysts.

Software: SQL Server 2005, Visual Studio 2005 or 2008
Programming language: C# .NET for the back end; possibly ASP for the front end, if needed

The purpose of this project is not to correct data errors, but to generate a report of all errors found, along with prediction analysis. SQL Server 2005 has built-in data mining models; instead of working with them directly in SQL Server, I need this to run from a GUI. Everything will run on localhost on a single machine. The program must be able to use data mining models, including Decision Trees, Naive Bayes, Neural Network, Association Rules, and Clustering, for all reports, such as data filtering and prediction. Each model should provide a different kind of outcome.
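Since the five models listed correspond to algorithms that ship with SQL Server 2005 Analysis Services, one way to let a GUI switch between them is to generate a DMX `CREATE MINING MODEL` statement for whichever algorithm the user picks. The C# sketch below is hypothetical: `MiningModelDmx`, `CreateModel`, and the column names in the example are illustrative, and it assumes a LONG key column, TEXT DISCRETE inputs, and a LONG DISCRETE predictable column (such as a 0/1 buyer flag). Algorithms differ in which content types they accept, so the column types would need to be checked against each algorithm's documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that builds DMX text for SQL Server 2005 Analysis Services.
public static class MiningModelDmx
{
    // Model names used in this project -> SSAS algorithm identifiers for the USING clause.
    private static readonly Dictionary<string, string> Algorithms =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Decision Trees", "Microsoft_Decision_Trees" },
            { "Naive Bayes", "Microsoft_Naive_Bayes" },
            { "Neural Network", "Microsoft_Neural_Network" },
            { "Association Rules", "Microsoft_Association_Rules" },
            { "Clustering", "Microsoft_Clustering" },
        };

    // Wraps a name in DMX brackets; names containing ']' are rejected rather than escaped.
    private static string Quote(string name)
    {
        if (string.IsNullOrEmpty(name) || name.IndexOf(']') >= 0)
            throw new ArgumentException("Invalid DMX identifier: " + name);
        return "[" + name + "]";
    }

    // Builds: CREATE MINING MODEL [name] (key LONG KEY, inputs TEXT DISCRETE,
    // target LONG DISCRETE PREDICT) USING <algorithm>
    public static string CreateModel(string modelName, string model, string keyColumn,
                                     IEnumerable<string> inputColumns, string predictColumn)
    {
        string algorithm;
        if (model == null || !Algorithms.TryGetValue(model, out algorithm))
            throw new ArgumentException("Unsupported model: " + model, nameof(model));

        var columns = new List<string> { "    " + Quote(keyColumn) + " LONG KEY" };
        columns.AddRange(inputColumns.Select(c => "    " + Quote(c) + " TEXT DISCRETE"));
        columns.Add("    " + Quote(predictColumn) + " LONG DISCRETE PREDICT");

        return "CREATE MINING MODEL " + Quote(modelName) + "\n(\n"
             + string.Join(",\n", columns) + "\n)\nUSING " + algorithm;
    }
}
```

The generated text could then be sent to Analysis Services, for example through ADOMD.NET's `AdomdCommand`, and the model trained with a DMX `INSERT INTO` statement; that part is not shown here.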

I’ve attached a sample of the front-end layout. Users should be able to view the structure of the database and the content of each table as soon as the program connects. Before reaching this front end, users should go through a login page to gain access. For this prototype, we may use the pre-installed sample database "AdventureWorksDW" from SQL Server throughout the whole project.


Since this is a school research project, working code that is as simple as possible would be fine with me. Please let me know if you can handle this.

Thank you,

Wilson


Additional files submitted:


front-end_sample.jpg