
Data Analytics & Visualization
Document information
Author | Ks Gopinath Narayan |
Major | Analytics |
Company | IAAS |
Document type | Presentation |
Language | English |
Size | 2.97 MB |
Summary
I. The Importance of Data Analytics and Data Discovery
This presentation emphasizes the crucial role of data analytics in extracting meaningful insights from large datasets. It highlights the shift from traditional Business Intelligence (BI) methods to modern self-service data discovery tools. More powerful and affordable computing resources, including 64-bit computing and cheaper RAM, have fueled the growth of in-memory analytics, which speeds up the processing of massive data volumes. The concept of 'Big Data' (datasets too large for conventional processing) is introduced, citing Eric Schmidt's observation on the pace of data growth. The need for advanced analytics capabilities, beyond traditional compliance auditing, is underscored.
1. Defining Data Analytics
The presentation begins by defining data analytics as an analytical process that reduces data to understandable findings and extracts insights from operational, financial, and electronic data. This definition sets the stage for exploring the broader applications and importance of data analysis in various contexts.
2. The Rise of Big Data Analytics
The presentation highlights the significant influence of Big Data analytics on the field. Big Data is defined as datasets so vast and complex that traditional data processing applications struggle to handle them. The growth in data volume is illustrated by a quote from then Google CEO Eric Schmidt, who noted in 2010 that as much information is now created every two days as was created from the dawn of civilization up to 2003. This underscores the scale of Big Data and the resulting need for advanced analytic techniques.
3. Data Size and Technological Advancements
A primer on data sizes (KB, MB, GB, TB) provides context for understanding the scale of Big Data. Examples are given, such as the size of the complete works of Shakespeare (5 MB) and the daily volume of tweets (more than 12 TB). The presentation then discusses how technological advancements, such as increased computational power, reduced RAM costs, and 64-bit computing, have made advanced analytics more affordable and accessible. This leads to a discussion of the shift in Business Intelligence (BI) from a top-down, IT-driven model to a self-service model enabled by new tools and in-memory analytics.
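To make the scale concrete, the two examples from the primer can be compared directly. The following is a minimal sketch, assuming decimal (SI) units where 1 KB = 1,000 bytes; the presentation does not say which convention it uses, and binary units (1 KB = 1,024 bytes) would shift the result slightly. The names `UNITS` and `to_bytes` are illustrative and do not come from the presentation.

```python
# Decimal (SI) unit sizes in bytes. Assumption: decimal rather than binary units.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(value, unit):
    """Convert a size such as (5, "MB") to a number of bytes."""
    return value * UNITS[unit]


shakespeare = to_bytes(5, "MB")    # complete works of Shakespeare
daily_tweets = to_bytes(12, "TB")  # lower bound on the daily tweet volume

# How many copies of the complete works fit into one day of tweets?
copies = daily_tweets // shakespeare
print(f"{copies:,} copies")  # prints: 2,400,000 copies
```

Under these assumptions, a single day of tweets holds the equivalent of roughly 2.4 million copies of the complete works.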
4. Classifying Data Analytics and the Need for Data Discovery
The presentation classifies data analytics into descriptive (data discovery), predictive, and prescriptive categories. It then focuses on the role of data discovery, particularly in the context of auditing. Traditional Computer Assisted Audit Techniques (CAATs), such as the IDEA and ACL software packages, are compared with newer approaches. The main limitation of these CAATs, their lack of macro-level analytical capabilities, is highlighted. The presentation emphasizes that while CAATs are good for evaluating known conditions, they often fail to provide the 'big picture' view necessary for identifying key areas of non-compliance. This need for a broader, more insightful approach is further underscored by a quote from Donald Rumsfeld about 'unknown unknowns', applied here to data analysis.
II. Data Discovery Through Data Visualization
Effective data visualization is presented as key to data discovery. The presentation contrasts exploratory and explanatory visualization, explaining how visual analysis leverages the brain's pattern-recognition abilities to identify trends and outliers within complex datasets. It highlights the importance of choosing the right chart type for each analytical purpose; the types covered include bar charts, line charts, pie charts, scatter plots, bubble charts, histograms, heat maps, treemaps, and box-and-whisker plots. Anscombe's Quartet illustrates how datasets with nearly identical summary statistics can look dramatically different when visualized. This section also emphasizes the need for data visualization tools with strong data discovery capabilities.
1. The Importance of Visualization in Data Discovery
The section strongly advocates for data visualization as a critical component of the data discovery process. It emphasizes that visual analysis significantly aids analytical reasoning by leveraging the visual system's capabilities and the brain's inherent pattern-recognition abilities. This allows for the identification of trends, outliers, and interesting data points within a larger dataset. The presentation explicitly states that visualization helps move information from the dataset to the designer's mind (in exploratory visualization) or to a broader audience (in explanatory visualization). The overall goal is to make complex information more accessible and understandable.
2. Illustrative Example: Anscombe's Quartet
To further illustrate the power of visualization, the presentation uses Anscombe's Quartet as an example. This famous statistical dataset comprises four sets of data points with nearly identical simple statistical properties (means, variances, correlation, and fitted regression line). However, when graphed, the four sets show drastically different visual patterns. This highlights the importance of visualizing data to uncover patterns and relationships that are not apparent from summary statistics alone. Each plot suggests a different interpretation of the underlying data, demonstrating the need for careful visual analysis.
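The numerical side of this example is easy to check. The sketch below computes the summary statistics for the four datasets using only the Python standard library; the data values are those published by Anscombe (1973), while the helper names (`summarize`, `pearson_r`, and so on) are illustrative and not taken from the presentation.

```python
from math import sqrt

# Anscombe's Quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def summarize(x, y):
    return {
        "mean_x": mean(x),
        "mean_y": mean(y),
        "var_x": sample_variance(x),
        "var_y": sample_variance(y),
        "r": pearson_r(x, y),
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows print nearly the same numbers, which is exactly why the differences only become visible once the points are plotted.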
3. Types of Visualizations and Their Applications
The presentation then details various types of charts and graphs used in data visualization, including bar charts, line charts, pie charts, map charts, scatter plots, bubble charts, histograms, heat maps, treemaps, and box-and-whisker plots. Each type is briefly described, along with its purpose and typical applications. For example, bar charts are recommended for comparing information across categories, while scatter plots are useful for investigating the relationship between variables. The presentation also notes that pie charts are frequently misused and that map charts are becoming increasingly important with the availability of location data. The selection of the appropriate chart type is crucial for effective communication and insightful data analysis.
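As one illustration of what a chart type actually encodes, a box-and-whisker plot is drawn from a five-number summary of the data. The following is a minimal sketch under stated assumptions: the amounts are made-up values, the function name `box_plot_summary` is hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention rather than something the presentation specifies.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the five-number summary plus outliers by the 1.5 * IQR rule."""
    data = sorted(values)
    # Quartiles using the "inclusive" method (treats the data as the full population).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical invoice amounts; one value is far larger than the rest.
amounts = [12, 15, 14, 10, 18, 20, 16, 13, 95]
summary = box_plot_summary(amounts)
print(summary)
```

Here the value 95 falls outside the upper fence, so a box-and-whisker plot would draw it as a separate point beyond the whisker, which is the kind of outlier that visual analysis is meant to surface.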
III. Data Discovery Tools and Their Application in Auditing
The presentation discusses the limitations of traditional Computer Assisted Audit Techniques (CAATs) like IDEA and ACL software, noting their deficiencies in macro-level analytics. It advocates for the use of modern data discovery and visualization tools to improve audit efficiency and effectiveness. Specific tools, including Microsoft Excel (with Pivot and PowerPivot), QlikView, QlikSense, and Tableau, are mentioned as examples of solutions capable of handling large datasets and providing valuable insights for auditing. The need for tools offering strong data visualization and ease of use is stressed, especially for desktop use by auditors. A performance comparison of in-memory analytics using QlikView is briefly mentioned.
1. Limitations of Traditional Audit Tools
The presentation critiques traditional Computer Assisted Audit Techniques (CAATs), such as IDEA and ACL software, and the limited use of tools like MS Excel, MS Access, and SQL in auditing. It points out that while these tools are helpful for transaction-based analytics (rule-based or micro-level), involving data extraction, sorting, filtering, and joining, they often lack the macro-level analytical capabilities needed to understand the bigger picture. This limitation makes it harder for auditors to identify the key areas to investigate for non-compliance, so they may end up testing only known conditions or running compliance audits without a broader, strategic approach to risk assessment.
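The distinction between micro-level and macro-level analytics can be sketched in a few lines. This is an illustrative example only: the payment records, vendor names, field names, and the `APPROVAL_LIMIT` threshold are all hypothetical and do not come from the presentation or from any CAAT product.

```python
from collections import defaultdict

# Hypothetical payment records and vendor master; all values are made up.
payments = [
    {"id": 1, "dept": "Roads", "vendor": "V01", "amount": 120_000},
    {"id": 2, "dept": "Roads", "vendor": "V02", "amount": 40_000},
    {"id": 3, "dept": "Health", "vendor": "V01", "amount": 15_000},
    {"id": 4, "dept": "Health", "vendor": "V03", "amount": 510_000},
    {"id": 5, "dept": "Education", "vendor": "V02", "amount": 30_000},
]
vendors = {"V01": "Acme Supplies", "V02": "Bharat Traders", "V03": "Citywide Works"}
APPROVAL_LIMIT = 100_000  # hypothetical rule: payments above this need extra approval

# Micro-level (rule-based): filter on a known condition, join to the vendor
# master, and sort the exceptions by amount.
flagged = sorted(
    (
        {**p, "vendor_name": vendors[p["vendor"]]}
        for p in payments
        if p["amount"] > APPROVAL_LIMIT
    ),
    key=lambda p: p["amount"],
    reverse=True,
)

# Macro-level: aggregate across all records to see where the money goes.
totals = defaultdict(int)
for p in payments:
    totals[p["dept"]] += p["amount"]
grand_total = sum(totals.values())
shares = {dept: round(100 * total / grand_total, 1) for dept, total in totals.items()}

print("Exceptions:", [(p["id"], p["vendor_name"]) for p in flagged])
print("Share of spend by department (%):", shares)
```

The first result answers a question the auditor already knew to ask; the second shows the overall distribution of spending, which can suggest where to look next.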
2. The Need for Advanced Data Discovery Tools
The presentation justifies the need for modern data discovery tools by their ability to provide a more comprehensive understanding of audit data. It argues that effective tools should enable strong data discovery and data visualization, be relatively easy to learn and use, operate from a desktop or laptop for auditor convenience, and scale efficiently to handle large datasets. These features are deemed essential for moving beyond the limitations of traditional CAATs and achieving a more holistic audit perspective. The section underscores that these tools are necessary for addressing 'unknown unknowns', that is, unforeseen risks or issues that are not apparent through traditional methods.
3. Examples of Data Discovery and Visualization Tools
Several specific data discovery and visualization tools are mentioned as suitable options, including QlikView, QlikSense, and Tableau. The presentation also acknowledges the continued relevance of Microsoft Excel, particularly its Pivot and PowerPivot functionalities. QlikView receives particular attention, and a direct link to its website is provided. These named tools give auditors practical starting points for improving their analytical capabilities. The discussion covers the performance advantages of in-memory analytics, illustrated by a brief performance comparison for data stratification using QlikView. The selection criteria for tools prioritize ease of use and the capability to handle substantial data volumes.
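For readers unfamiliar with the term, data stratification groups records into value bands and summarizes each band. The sketch below shows what such a computation produces; it is not QlikView's implementation and says nothing about its performance. The band boundaries, labels, and amounts are hypothetical.

```python
from bisect import bisect_right

# Hypothetical band edges: each edge is the lower bound of the next band.
BOUNDS = [1_000, 10_000, 100_000]
LABELS = ["< 1,000", "1,000 - 9,999", "10,000 - 99,999", ">= 100,000"]


def stratify(amounts):
    """Count and total the amounts that fall into each band."""
    strata = {label: {"count": 0, "total": 0} for label in LABELS}
    for amount in amounts:
        # bisect_right returns how many edges are <= amount, i.e. the band index.
        label = LABELS[bisect_right(BOUNDS, amount)]
        strata[label]["count"] += 1
        strata[label]["total"] += amount
    return strata


# Made-up transaction amounts, including values exactly on the band edges.
amounts = [250, 999, 1_000, 4_500, 12_000, 99_999, 100_000, 750_000]
strata = stratify(amounts)
for label, stats in strata.items():
    print(f"{label:>16}: count={stats['count']}, total={stats['total']:,}")
```

A table like this shows at a glance how transaction value is distributed across bands, for example whether a small number of large transactions accounts for most of the total.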