Exploring Data with RapidMiner
Format: PDF / Kindle (mobi) / ePub
RapidMiner is a highly versatile tool that can make data work harder for you. This book will show you how to import, parse, and structure your data with remarkable speed and efficiency. It's data mining made accessible.
- See how to import, parse, and structure your data quickly and effectively
- Understand the visualization possibilities and be inspired to use these with your own data
- Structured in a modular way to adhere to standard industry processes
Data is everywhere, and the amount is increasing so fast that the gap between what people can understand and what is available is widening relentlessly. There is huge value in data, but much of it lies untapped. 80% of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.
Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters within this book are arranged within an overall framework and can additionally be consulted on an ad-hoc basis. It provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner.
Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. The book starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.
This book will give you a solid understanding of the possibilities that RapidMiner gives for exploring data and you will be inspired to use it for your own work.
What you will learn from this book
- Import real data from files in multiple formats and from databases
- Extract features from structured and unstructured data
- Restructure, reduce, and summarize data to help you understand it more easily and process it more quickly
- Visualize data in new ways to help you understand it
- Detect outliers and apply methods to handle them
- Detect missing data and implement ways to handle it
- Understand resource constraints and what to do about them
The book follows a step-by-step tutorial style built around examples, so that users at different levels will benefit from the facilities offered by RapidMiner.
Who this book is written for
If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a basic awareness of data mining techniques and some exposure to RapidMiner.
The Generate Attributes operator is used very frequently. It allows new attributes to be generated from other attributes, constant values, macros, and built-in functions. A useful way to think of this operator is as an automatic loop over all the examples in an example set: the newly generated attribute is added to every example. If the value of the new attribute is derived from the values of other attributes, the single value for the new attribute is taken from the values of the other…
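The "automatic loop" view of Generate Attributes can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation: the example set is modelled as a list of dictionaries, and the function name and the attributes `price`, `quantity`, and `total` are hypothetical.

```python
from typing import Callable, Dict, List

Example = Dict[str, float]

def generate_attribute(examples: List[Example], name: str,
                       expression: Callable[[Example], float]) -> None:
    """Add attribute `name` to every example, computing its value
    from that same example's other attributes (an implicit loop)."""
    for example in examples:
        example[name] = expression(example)

# Hypothetical example set with two numeric attributes.
example_set = [
    {"price": 2.5, "quantity": 4.0},
    {"price": 10.0, "quantity": 1.0},
]

# Similar in spirit to a function expression such as price * quantity.
generate_attribute(example_set, "total", lambda e: e["price"] * e["quantity"])
print(example_set)
```

Each example gets its own `total`, computed only from that example's `price` and `quantity`, which mirrors the per-example behaviour the text describes.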
…parameters and the next screenshot shows this. The values are taken from the original document. It is also important to uncheck the assume HTML checkbox for this to function correctly. The end result is the value 1352240369000, which is a UNIX time in milliseconds for the specific process. Refer to the generateExtract.xml process in the files that accompany this book for more information.
Renaming attributes
It is often the case that many operators generate new attributes…
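The Generate Extract example above ends with the value 1352240369000, described as a UNIX time in milliseconds. As a quick sanity check outside RapidMiner, a short Python sketch can decode such a value. It assumes the standard UNIX epoch of 1970-01-01 UTC, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def unix_millis_to_datetime(millis: int) -> datetime:
    """Convert a UNIX timestamp in milliseconds to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Adding a timedelta keeps the arithmetic exact (no float division).
    return epoch + timedelta(milliseconds=millis)

value = 1352240369000  # the value produced by the Generate Extract example
print(unix_millis_to_datetime(value).isoformat())  # 2012-11-06T22:19:29+00:00
```

The result is in UTC; the local time it represents depends on the time zone in which the original timestamp was recorded.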
…replace is about changing values of the attributes throughout the example set. There are a number of operators that can help, including Map, Replace, and Replace (Dictionary). Which operator to use depends on how complex the replacement is and how many replacements have to be made.
Using the Map operator
The Map operator is the simplest and is best used to replace whole nominal values with alternatives. For example, if nominal attributes contain color and must be replaced completely with colour,…
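Whole-value replacement, as described for the Map operator, can be sketched in Python with a dictionary lookup: a value changes only when it matches a mapping key exactly, so a value such as "colors" is left alone. This is a hypothetical helper for illustration, not RapidMiner's implementation.

```python
from typing import Dict, List, Optional

def map_whole_values(values: List[Optional[str]],
                     mapping: Dict[str, str]) -> List[Optional[str]]:
    """Replace a value only when the whole value matches a key in `mapping`;
    partial matches, unmapped values, and missing values (None) are kept."""
    return [mapping.get(v, v) for v in values]

colours = ["color", "red", "colors", None]
print(map_whole_values(colours, {"color": "colour"}))
# ['colour', 'red', 'colors', None]
```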
…whole. It is also acceptable if the attribute does not have much influence on the final result. Clearly, such an attribute would be a candidate for removal anyway. If it additionally turns out to have missing values with an MCAR profile, this would be reason enough to remove it. If the missing values are MAR or NMAR, deleting the attribute is likely to affect model accuracy, so deletion needs more careful consideration.
Imputation with single values
A simple approach to…
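The excerpt breaks off before naming the single value used for imputation, so the following Python sketch simply assumes the mean of the observed values for a numeric attribute. It also reports the fraction of missing values, which is a natural first look before deciding between deletion and imputation. Function names and data are hypothetical.

```python
from statistics import mean
from typing import List, Optional

def missing_fraction(values: List[Optional[float]]) -> float:
    """Fraction of values that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: all values are missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [30.0, None, 50.0, 40.0]
print(missing_fraction(ages))  # 0.25
print(impute_with_mean(ages))  # [30.0, 40.0, 50.0, 40.0]
```

Filling every gap with one value is quick, but it shrinks the attribute's spread, which is one reason the choice of imputation method deserves the same care as the decision to delete.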
…matters is to practice on real data. A good step is to enter some data mining contests that appear on the Internet. Your place of work will almost certainly have data; if you can add value to it to solve a business problem, you will get a lot of interest. The Internet itself has more sources of data, with more being added every day. Finding a new insight into public sources of data, even through a simple visualization, may get you noticed. In short, there is a lot of data out there just waiting to…