{"id":648,"date":"2024-12-16T14:23:59","date_gmt":"2024-12-16T14:23:59","guid":{"rendered":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/chapter\/__unknown__-3\/"},"modified":"2025-03-23T19:36:52","modified_gmt":"2025-03-23T19:36:52","slug":"6","status":"publish","type":"chapter","link":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/chapter\/6\/","title":{"raw":"Chapter 6 Big Data, Business Intelligence and Analytics","rendered":"Chapter 6 Big Data, Business Intelligence and Analytics"},"content":{"raw":"<h2 class=\"import-Normal\"><strong>Opening Vignette<\/strong><\/h2>\r\n<p class=\"import-Normal\">Napoleon's invasion of Russia in 1812, also known as the Russian Campaign or the Patriotic War of 1812, was driven by a combination of geopolitical, economic, strategic, and personal motives. France\u2019s reasons behind the campaign included Napoleon\u2019s desire to enforce the Continental System which, by modern standards, is a form of sanctions aimed at weakening Britain by prohibiting trade between the British Empire and continental Europe.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"1432\"]<img src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image1-1.png\" alt=\"Minard's 1869 flow map of Napoleon's invasion of Russia in 1812\u20131813. A very innovative thematic map from the 19th century. Long description located below.\" width=\"1432\" height=\"682\" \/> Figure 6-1 1812<a href=\"https:\/\/en.wikipedia.org\/wiki\/Thematic_map\"> Napolean\u2019s March to Moscow by Charles Minard<\/a> \u2013 Public Domain[\/caption]\r\n<p class=\"import-Normal\">Tsar Alexander I of Russia had initially agreed to the Continental System in the Treaty of Tilsit (1807). However, Russia\u2019s economy suffered from the trade restrictions, leading Alexander to withdraw from the system in 1810. 
Napoleon sought to force Russia back into compliance with the Continental System to maintain the economic blockade against Britain, which he considered France's greatest rival. Napoleon decided to invade Russia.<\/p>\r\n<p class=\"import-Normal\">He amassed \u201cThe Grande Arm\u00e9e\u201d of more than 400,000 men, which began its march on June 24, 1812 and moved on foot for more than 1,000 miles from its starting point near the Niemen River (in present-day Poland\/Lithuania) towards Moscow. Napoleon\u2019s army fought many battles along the way, mainly the Battle of Smolensk (August 16\u201318, 1812) and the Battle of Borodino (September 7, 1812), to finally arrive at a burned-out and deserted Moscow, with no one to fight, no one to accept a surrender, and no one with whom to negotiate peace. Facing starvation, Napoleon decided to retreat.<\/p>\r\n<p class=\"import-Normal\">During the retreat, Napoleon\u2019s army traveled the same distance back, but under far worse conditions, often taking longer and more indirect routes.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>What does the chart depict?<\/strong><\/h3>\r\n<p class=\"import-Normal\">The illustration depicts Napoleon's army departing the Polish-Russian border. A thick band illustrates the size of his army at specific geographic points during its advance and retreat. It displays <strong><em>six types of data in two dimensions<\/em><\/strong>: the number of Napoleon's troops; the distance traveled; temperature; latitude and longitude; direction of travel; and location relative to specific dates (time scale).<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Key Principle<\/strong><\/h3>\r\n<p class=\"import-Normal\">The chart in Figure 6-1 is considered by many to be the earliest depiction of what we today call Big Data (400,000 troops to account for; a supply chain that extended across thousands of physical locations; and a linear advance constrained by weather and topography). 
Truly, a picture is worth more than a thousand words.<\/p>\r\n<p class=\"import-Normal\">Most businesses operate on very thin margins, and attempts at reducing cost are, in many instances, cutting into the proverbial bone. Competitiveness must be reimagined. New and modern methods that support the reimagination process go far beyond doing \u201cthings\u201d better, faster, and cheaper. Could the crystal ball promised by Predictive Analytics have prevented such a massive and catastrophic defeat for Napoleon? Perhaps.<\/p>\r\n\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Learning Objectives<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-Normal\">Upon completion of this chapter, the student should be able to:<\/p>\r\n\r\n<ul>\r\n \t<li>Clearly explain the difference between Data and Information<\/li>\r\n \t<li>Define Big Data and explain why it is called Big<\/li>\r\n \t<li>Define the 7 characteristics of Big Data<\/li>\r\n \t<li>Explain the technologies that support uses of Big Data<\/li>\r\n \t<li>Explain how businesses use Big Data as an intelligence tool<\/li>\r\n \t<li>Explain the 2 basic types of Analytics: descriptive and predictive<\/li>\r\n \t<li>Demonstrate knowledge of the basic methods that support Descriptive Analytics<\/li>\r\n \t<li>Demonstrate knowledge of the basic methods that support Predictive Analytics<\/li>\r\n \t<li>Explain Data Mining<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n[ez-toc]\r\n<h2 class=\"import-Normal\"><strong>Data, Information and Intelligence in Big Data<\/strong><\/h2>\r\n[caption id=\"\" align=\"alignleft\" width=\"430\"]<img class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image2-1.jpeg\" alt=\"Data center with clouds.\" width=\"430\" height=\"430\" \/> Image generated by OpenAI\u2019s DALL\u00b7E[\/caption]\r\n<h3 
class=\"import-Normal\"><strong>DATA<\/strong><strong> \u2013 Bits and Bytes<\/strong><\/h3>\r\n<p class=\"import-Normal\"><em>Data is defined as a mere fact, devoid of any context or relative use<\/em>. It is the smallest unit of recognition. A part of a number, a whole number, a part of a letter, a whole letter, a single word, and sometimes even an entire sentence that does not convey (inform) and on its own has very little usefulness. Within traditional computers, the smallest unit of Data is a representation of a bit. <em> A <\/em><em>b<\/em><em>it is a single 0 (Zero) or a 1 (one). <\/em>Zero\u2019s and Ones in computers mean that electricity IS EITHER flowing inside the computer or it is NOT. Imagine a light bulb for instance, the wall switch controls the state of the bulb with Zero indicating the bulb is OFF, or a One, indicating the bulb is ON. Data is Binary. In contrast with the decimal system containing the set of digits {0 to 9}, binary refers to a numbering system that uses only two digits: 0 and 1. A combinations thereof, constitute a single alphabet (character), a symbol (such as equal sign, greater than, brackets, period, etc.) or perhaps a part of an image on your screen. Imagine if a pixel on your screen is missing (OFF), your image would then look as if you had a hole in it. In natural languages such as English, its words are composed of characters. Computer words are also composed of characters. A character in computers, however, is always the\u00a0same number of bits and depends on the type, and version of an Operating System being used. In the Windows-based system (as of 2024) version 11, each character of the English language is represented by a unique set of 64 Zero\u2019s and 1\u2019s (bits).<\/p>\r\n<p class=\"import-Normal\">As we go about our daily lives, we often hear the term Byte. 
A <em>Byte<\/em> is, by convention, a group of 8 bits, and typically represents a character, a symbol, or part of an image.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"405\"]<img class=\"\" style=\"font-family: 'Sorts Mill Goudy', 'Times New Roman', serif;font-size: 16px\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image3-2.png\" alt=\"Open book.\" width=\"405\" height=\"405\" \/> Image generated by OpenAI\u2019s DALL\u00b7E[\/caption]\r\n<h2 class=\"import-Normal\"><strong>Information<\/strong><\/h2>\r\n<p class=\"import-Normal\"><em>Information is a grouping of bytes of data that has context and is useful.<\/em> Just as a book conveys meaning (context), context develops when data is grouped together in a grammatically correct manner (structured), processed (analyzed for value), and made relevant.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Types of Data<\/strong><\/h3>\r\n<p class=\"import-Normal\">Data have been organized by conventions. For example, the American Standard Code for Information Interchange (ASCII, pronounced ASS-kee) is a standard that was developed in the early 1960\u2019s to encode the English-language character set into a language that computers can understand (i.e., zeros and ones); it lists, in a table, the digital equivalent of each character. Over time, this coding standard expanded to include the various other types we know today. The two basic types of Data are:<\/p>\r\n<p class=\"import-Normal\"><em>Structured Types:<\/em> Structured types of Data are sets that follow a standard size (each data element has the same number of zeros and ones) and include:<\/p>\r\n&nbsp;\r\n<ul>\r\n \t<li><em>Character Type<\/em>: Assigns a unique string of 0\u2019s and 1\u2019s, of the same size, to each letter, distinguishing, for example, A (capital A) vs. a (lowercase a)<\/li>\r\n \t<li><em>Numeric Type<\/em>: that includes the decimal digits (0\u20269)<\/li>\r\n \t<li><em>Symbols Type<\/em>: that includes all symbols found on a computer\u2019s keyboard<\/li>\r\n \t<li><em>Special Symbols Type<\/em>: that includes symbols for Math, Currency, Scientific and other uses<\/li>\r\n \t<li class=\"import-Normal\">Structured data fits neatly into a table and is easy to store and analyze by humans and machines alike.<\/li>\r\n<\/ul>\r\n<p class=\"import-Normal\"><em>Unstructured Data Types (AIIM Blog, 2025):<\/em> Unstructured Data Types are data elements that are not easy to store, understand, or analyze by humans or machines. This Data type does not follow a convention (a standard size, for example) and includes:<\/p>\r\n\r\n<ul>\r\n \t<li><em>Text documents:\u00a0<\/em>Any text file, such as Word documents, emails, blog posts, or survey responses, where the information isn't organized in a predefined structure<em>.\u00a0<\/em><\/li>\r\n \t<li><em>Images:\u00a0<\/em>Photographs, scanned documents, paintings - visual data without inherent organization<\/li>\r\n \t<li><em>Audio files:\u00a0<\/em>Music, voice recordings, podcasts - sound data that can't be easily categorized with structured fields<em>\u00a0<\/em><\/li>\r\n \t<li><em>Video files:\u00a0<\/em>Movies, surveillance footage, recorded presentations - moving images with no predefined data structure<em>\u00a0<\/em><\/li>\r\n \t<li><em>Social media posts:\u00a0<\/em>Tweets, Facebook updates, Instagram captions - text-based interactions with varying formats and content<em>\u00a0<\/em><\/li>\r\n \t<li><em>Handwritten notes:\u00a0<\/em>Notes written on paper, where the information is not digitally structured<\/li>\r\n<\/ul>\r\n<h3>The \u201cBig\u201d in Big Data<\/h3>\r\n<p class=\"import-Normal\">How much is too much? And where is it all coming from? These may be questions we ponder. As we go about our daily lives, we generate, process, and consume tremendous amounts of data. 
According to current estimates,\u00a0humanity in the 21<sup>st<\/sup> century is producing\u00a0more data in a single day\u00a0than was produced in the entirety of human history up until the early 2000s, meaning the vast majority of data ever created has been generated within the last couple of decades, with a significant portion created daily. To list but a few of the activities behind generating data:<\/p>\r\n\r\n<ul>\r\n \t<li>Your teacher may ask you to write a 500-word paper on the topic of Big Data. The average plain-text Word document (meaning no images, embedded HTML, videos, etc.) of 500 single-spaced English words is around 65,000 bits. The average 500-word document with 3 images is around 1 million bits. An average textbook with 300 pages and 500 images is around 1.5 billion bits, or 187,500,000 Bytes.<\/li>\r\n \t<li>On average, a person speaks approximately 30,000 words per day; a Princeton study calculated this to be around 40 million bits, or 5,000,000 Bytes.<\/li>\r\n \t<li>In the US, adults spend around 3 hours per day watching TV. TV signals in the US were converted to digital (i.e., bits) around 2010. Each hour of TV signal carries around 54 billion bits (in 4K HD), or 6,750,000,000 Bytes.<\/li>\r\n \t<li>On average, Gen-Z interacts with the internet around 6 hours per day, generating approximately 32 billion bits, or around 4 billion Bytes (Reviews.org, 2024).<\/li>\r\n \t<li>In the US alone, for calendar year 2022, general-purpose card (Visa, Mastercard, American Express, etc.) payments reached a record 153.3 billion transactions and $9.76 trillion in value. 
Note that these numbers do not account for private-label (merchant-branded) cards, Electronic Benefits cards, debit cards, gift cards, or any other payment means such as government purchasing cards.<\/li>\r\n<\/ul>\r\n<h3 class=\"import-Normal\"><strong>\u00a0Characteristics of Big Data<\/strong><\/h3>\r\n<p class=\"import-Normal\">Big Data is a term that describes data sets so enormous in size and complexity that they require methods beyond traditional data-processing hardware, software, and analysis tools to understand and manage.<\/p>\r\n<p class=\"import-Normal\">There are 7 key characteristics that describe Big Data:<\/p>\r\n\r\n<h4 class=\"import-Normal\">1. <em>Volume:<\/em> the amount of data being generated, stored, and analyzed. An estimated 402 million terabytes are generated globally on a daily basis, and roughly 90% of all existing data was created in just the last 2 years (2023 and 2024).<\/h4>\r\n<h4>2. <em>Velocity: The Speed at Which Data is Generated and Processed.<\/em><\/h4>\r\nVelocity in the context of Big Data refers to the speed at which data is generated, collected, and processed. This characteristic is particularly important because the value of data often diminishes rapidly if it is not processed and acted upon in real time or near-real time (think of a stock price shown at the moment of the opening\/closing bell, or its constant movement throughout a trading day). Key aspects of Velocity include:\r\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Real-time and Streaming Data:<\/em><\/strong> Many Big Data applications rely on real-time data processing. Examples include social media platforms, financial trading, and embedded IoT sensor devices. 
In these cases, data is generated continuously and often needs to be processed instantly to extract value (e.g., fraud detection or live buy\/sell recommendations).<\/p>\r\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>High-frequency Data:<\/em><\/strong> Certain systems, like stock markets, sensor networks, or GPS-enabled devices, generate high-frequency data that needs to be analyzed almost as soon as it\u2019s created. These applications often require specialized technologies like stream processing to handle large volumes of rapidly flowing data.<\/p>\r\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Timeliness and Latency:<\/em><\/strong> Low latency (the delay between data generation and processing) is critical in situations such as autonomous vehicles, online gaming, or emergency response systems, where decisions need to be made instantly. The need for processing speed can impact how the infrastructure and algorithms are designed.<\/p>\r\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\">Data at Scale:<\/strong> Velocity deals not only with fast data generation, but also with the sheer scale at which data arrives. Consider the enormous amount of data produced by devices like smartphones, wearable health trackers, social media platforms, home surveillance cameras, and Alexa- and Siri-like devices, which all need to be processed continuously. 
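The stream-processing idea described above can be sketched in a few lines of Python: each value is handled the moment it arrives, and only a small window of recent data is retained. The price feed here is invented for illustration.

```python
# Minimal sketch of stream processing: each reading is handled as it
# arrives, keeping only a small window of recent data (low latency).
# The price feed below is invented for illustration.
from collections import deque

window = deque(maxlen=5)   # retain only the 5 most recent readings

def on_new_reading(price):
    """Called once per arriving value; returns a rolling average."""
    window.append(price)
    return sum(window) / len(window)

feed = [101.2, 101.5, 100.9, 102.3, 101.8, 103.0]
for price in feed:
    rolling_avg = on_new_reading(price)

print(round(rolling_avg, 2))   # 101.9 -- average of the last 5 readings
```

Because the window has a fixed size, memory use stays constant no matter how long the stream runs, which is the key design choice behind stream processing.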
Examples of Velocity in action include:<\/p>\r\n<p class=\"import-NormalWeb\">Transactional Data on eCommerce websites: Recommendation systems generate real-time suggestions based on a user\u2019s browsing and purchasing behavior, requiring fast data processing.<\/p>\r\n<p class=\"import-NormalWeb\">Stock Markets: Financial institutions analyze stock tickers and market data in real time to make split-second decisions.<\/p>\r\n<p class=\"import-NormalWeb\">Social Media: Data is continuously generated by billions of users posting updates, comments, likes &amp; shares, and multimedia content. This data is processed quickly to serve targeted advertisements or to identify trending <em>topics.<\/em><\/p>\r\n\r\n<h4><em>3. Variety<\/em><\/h4>\r\nVariety refers to the different types and formats of data that need to be processed and integrated in Big Data applications. In traditional databases, data is typically structured and stored in a tabular format (e.g., rows and columns), but Big Data often comes in diverse forms, such as text, images, videos, and sensor data. Key aspects of Variety include:\r\n<p class=\"import-Normal\">Structured Data: This is highly organized data that is easily searchable in databases (e.g., customer names, addresses, transaction records). While structured data remains important, it makes up only a small fraction of the data being generated today.<\/p>\r\n<p class=\"import-Normal\">Semi-structured Data: Data that doesn\u2019t fit neatly into a table but still contains some structure, often through tags or markers. Examples include XML and JSON files used for communication between websites and services. Semi-structured data is increasingly common as companies integrate data from multiple sources.<\/p>\r\n<p class=\"import-Normal\">Unstructured Data: This is data without a predefined structure, making it more difficult to store and analyze. 
Unstructured data includes text files, emails, social media posts, video and audio files, images, and documents. A significant portion of Big Data is unstructured, and new tools are being developed to extract value from this type of data.<\/p>\r\n<p class=\"import-Normal\">Multimedia Data: This includes images, video, and audio, which require advanced technologies and processing like image recognition, facial recognition, speech-to-text, or video processing for analysis. For example, security systems might process video footage in real time to identify threats.<\/p>\r\n<p class=\"import-Normal\">Sensor Data (IoT Data): The Internet of Things (IoT) is generating vast amounts of data from connected devices (smartphones, smart home devices, wearables, etc.), which can be structured (e.g., sensor readings) or unstructured (e.g., sound or image data).<\/p>\r\n<p class=\"import-Normal\">Machine-Generated Data: This data is produced by machines and systems automatically, without human intervention. Examples include logs, transaction data, or sensor data from industrial equipment. This data often needs to be integrated with other types of data for meaningful analysis.<\/p>\r\n<p class=\"import-Normal\">Metadata: Data about the data, such as when and where it was created, who created it, and how it is related to other data. Metadata helps organize and interpret the content of various data formats. 
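The contrast between structured and semi-structured data can be made concrete in a short Python sketch; the records shown are invented for illustration. Note how the JSON record carries its own markers (keys) and can nest, while the structured row relies on a fixed column layout:

```python
# Minimal sketch: structured vs. semi-structured data.
# Both records are invented for illustration.
import json

# Structured: fixed columns, every row has the same shape.
structured_row = ("C1001", "Jane Doe", "Tampa", 249.99)

# Semi-structured: JSON carries its own markers (keys); fields can
# vary from record to record and may nest.
raw_json = '{"user": "jdoe", "tags": ["fintech", "bigdata"], "likes": 42}'
record = json.loads(raw_json)

print(record["likes"])   # 42
print(record["tags"])    # ['fintech', 'bigdata']
```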
Examples of Variety in action include:<\/p>\r\n<p class=\"import-Normal\">Social Media Platforms: A platform like Meta (Facebook) or X (Twitter) has to process various forms of data, including text posts, images, videos, comments, and reactions.<\/p>\r\n<p class=\"import-Normal\">Healthcare Industry: Healthcare data includes structured patient records (e.g., diagnosis, treatment) and unstructured data (e.g., doctor's notes, radiology images, medical scans).<\/p>\r\n<p class=\"import-Normal\">Self-Driving Cars: These vehicles generate a combination of structured data (e.g., GPS data, speed, and temperature) and unstructured data (e.g., video from cameras).<\/p>\r\n\r\n<h4><em>4. <\/em><em>Veracity<\/em><\/h4>\r\n<em>Veracity is defined as The Trustworthiness and Quality of Data.<\/em> It refers to the uncertainty or reliability of the data. In the world of Big Data, not all data is clean, accurate, or reliable. Inaccurate, incomplete, or inconsistent data can lead to poor insights and flawed decision-making. An example of a Veracity concern would be social media posts, tweets, and streamed news media, which may be inaccurate, biased, and in some instances misleading or \u201cmanufactured\u201d.\u00a0 Ensuring data veracity involves fact-checking and determining the credibility of the source. Other examples of the necessity for Veracity appear in:\r\n<ul>\r\n \t<li class=\"import-Normal\">Medical or Scientific experiments<\/li>\r\n \t<li class=\"import-Normal\">Voting Systems<\/li>\r\n \t<li class=\"import-Normal\">Electronic Health Records<\/li>\r\n \t<li class=\"import-Normal\">Financial Transactions<\/li>\r\n \t<li class=\"import-Normal\">Student Records<\/li>\r\n<\/ul>\r\n<h4><em>5. <\/em><strong class=\"import-Strong\"><em>Value<\/em><\/strong><\/h4>\r\n<em>Value is defined as <\/em><em>The Utility and Insight of Data<\/em><em>. 
<\/em><strong class=\"import-Strong\">Value<\/strong> is about <strong class=\"import-Strong\">extracting useful insights<\/strong> from Big Data and is truly transformative across various industries. By leveraging data analytics, organizations in retail, healthcare, finance, manufacturing, agriculture, transportation, telecommunications, energy, and education can optimize operations, enhance customer experiences, reduce costs, and make more informed decisions. Value answers questions like:\r\n<ul>\r\n \t<li>Who is my best customer?<\/li>\r\n \t<li>What is the price my customers would be willing to pay for item X?<\/li>\r\n \t<li>What is today\u2019s best delivery route for my truck drivers?<\/li>\r\n \t<li>Which treatment should I follow to optimally treat type X cancer patients?<\/li>\r\n \t<li>Which curriculum should I develop to optimize student success?<\/li>\r\n \t<li>And a million other questions like these\u2026<\/li>\r\n<\/ul>\r\n<p class=\"import-Normal\">As important as the other characteristics of Big Data are, Value is by far front and center in the topic of Big Data Analytics.<\/p>\r\n\r\n<h4 class=\"__UNKNOWN__\"><em>6. <\/em><strong><em>Variability: The Changing Nature of Data.<\/em><\/strong><\/h4>\r\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\">Variability<\/strong> refers to the <strong class=\"import-Strong\">inconsistent<\/strong> or <strong class=\"import-Strong\">dynamic nature<\/strong> of data. Unlike structured data, which is relatively stable and predictable, Big Data can be highly variable. The data patterns, formats, and sources may change over time, requiring systems to adjust accordingly. Data Variability affects every known sector of the economy. 
A few examples of Data Variability include:<\/p>\r\n<p class=\"import-NormalWeb\">Healthcare \u2013 A patient\u2019s record at a physician\u2019s practice may have to connect to a hospital\u2019s electronic records system made by a different software company.<\/p>\r\n<p class=\"import-NormalWeb\">Utilities and Power Generation \u2013 Solar energy production is highly unpredictable because of conditions such as weather, wind, and changes in the sun\u2019s position throughout the day. The same fluctuations also impact energy consumption.<\/p>\r\n<p class=\"import-NormalWeb\">Agriculture \u2013 Seasonal change, rain forecasts, weather conditions, soil conditions, and a host of other variables impact decisions of what to grow, when to grow, when to harvest, and where and to whom to sell farm products.<\/p>\r\n<p class=\"import-NormalWeb\">Manufacturing \u2013 Sourcing of raw materials and changing supply chains, inventory of raw and finished goods, forecasting demand, labor allocation, and a host of other related variables.<\/p>\r\n<p class=\"import-NormalWeb\">Financial Services \u2013 In finance, data variability is often encountered due to fluctuating market conditions, changing customer behaviors, and inconsistencies in data sources. Examples of variability are prevalent in <strong class=\"import-Strong\">stock market data, customer transaction history, and financial regulations and reporting.<\/strong><\/p>\r\n\r\n<h4>7. <strong class=\"import-Strong\"><em>Visualization<\/em><\/strong><\/h4>\r\n<em>Visualization is defined as the representation of an object, situation, or set of information as a chart or other image<\/em>.\r\n\r\n<strong class=\"import-Strong\">Visualization<\/strong> focuses on <strong class=\"import-Strong\">presenting Big Data<\/strong> in a way that makes it easier to understand and act upon (see Napoleon\u2019s Russia Campaign). 
Given the vast amount of data and complexity involved, effective visualization helps stakeholders make sense of patterns, trends, and insights that enable accurate prediction of outcomes.\r\n<h3><strong>Analytical Tools and Technologies<\/strong><\/h3>\r\n<p class=\"import-Normal\">As we have previously defined, Big Data consists of data sets so enormous and complex that traditional data management methods (software, hardware, and analysis processes) are powerless to deal with them. Facing that reality, technology developers have answered this challenge, and what emerged are truly remarkable, simple, and easy-to-use tools. In this section we will explore some of these transformative technologies.<\/p>\r\n\r\n<h3 class=\"import-Normal\">Data Warehouse, Data Marts and Data Lakes<\/h3>\r\n[caption id=\"\" align=\"alignleft\" width=\"344\"]<img class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image6-1.png\" alt=\"Data warehouse.\" width=\"344\" height=\"344\" \/> Figure 6-2 Image of a Data Warehouse \u2013\u00a0 Generated with Google AI[\/caption]\r\n\r\nData in its unprocessed form is known as <em>raw data<\/em>. Making sound business decisions on raw data alone is highly unreliable. Data has to be \u201cpre-processed\u201d: cleaned, filtered, and organized in a manner that facilitates easy access, retrieval, and follow-on processing. Traditionally, this is what was termed a Transaction Processing System (TPS). A TPS captures data from its source through the daily execution of normal activities. In a merchant\u2019s system, the process starts at the cash register, which collects transaction data (a receipt for every item sold, returned, or exchanged by a customer). 
The merchant\u2019s back-end system will pre-process this data, ensuring its accuracy, time-stamp every receipt, and then forward the receipts to the respective payment processor, who in turn validates accuracy and availability of funds and performs a settlement (withdrawing money from the buyer\u2019s bank and depositing the required amounts into the merchant\u2019s bank account).\r\n\r\nAs mentioned earlier, individual buy\/sell transactions are small data that can be processed very quickly. However, for very large merchants, transactions are not limited to just buy\/sell. The merchant may have a huge supply chain, tens of thousands of physical locations, thousands of suppliers, and millions of customers interacting on a daily basis, coupled with an urgent need to remain competitive and profitable.\r\n<p class=\"import-Normal\">A data warehouse is a centralized database (a software-based storage mechanism for data) that holds structured data in the form of records of transactions from many sources (Inmon, 1988): customers, suppliers, banks, government, payment processors, gateway providers, automated clearing houses, products, prices, marketing materials, and thousands of other pieces of information necessary to \u201cmanage\u201d the business. A key purpose of a data warehouse is to provide a facility for querying (asking) the data to reveal specific information (e.g., who is my best customer?). 
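A query like "who is my best customer?" can be sketched against a tiny in-memory table using Python's built-in sqlite3 module; the table layout and figures are invented for illustration:

```python
# Minimal sketch of querying transaction records to answer
# "who is my best customer?". Table and amounts are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Acme", 120.0), ("Beta", 75.5), ("Acme", 300.0), ("Gamma", 99.0)],
)

# Rank customers by total spend and keep the top one.
best = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM sales "
    "GROUP BY customer ORDER BY total DESC LIMIT 1"
).fetchone()

print(best)   # ('Acme', 420.0)
```

A real data warehouse runs the same kind of aggregate query, just over billions of rows drawn from many source systems.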
Key characteristics of a Data Warehouse include:<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is <\/em><\/strong><strong class=\"import-Strong\"><em>Subject-Oriented<\/em><\/strong><em>:<\/em> Data in a warehouse is organized around key business subjects such as sales data, finance data, customer data, supplier data, product data, etc.<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is Integrated<\/em><\/strong><em>:<\/em> The Data Warehouse consolidates data from various disparate sources (e.g., operational databases, flat files, cloud sources data, marketing &amp; sales data, etc.), ensuring consistency in formats, units of measurement, and coding systems.<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data for <\/em><\/strong><strong class=\"import-Strong\"><em>Time-Varian<\/em><\/strong><strong class=\"import-Strong\"><em>ce<\/em><\/strong><em>:<\/em> The Data Warehouse stores historical data, allowing organizations to analyze trends and performance over different periods, often spanning months or years (e.g., how do 4<sup>th<\/sup> quarter sales of this year compare to last year\u2019s?)<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is <\/em><\/strong><strong class=\"import-Strong\"><em>Non-Volatile<\/em><\/strong><em>:<\/em> Once data is entered into the warehouse, it is rarely changed or deleted, ensuring a stable and consistent historical record.<\/p>\r\n\r\n<h2><strong class=\"import-Strong\">Organization of a Data Warehouse<\/strong><\/h2>\r\n<p class=\"import-NormalWeb\">The organization of a data warehouse involves several layers that facilitate the flow of data from operational systems to analytical systems. 
These layers are crucial for structuring the data warehouse in a way that optimizes both storage and querying capabilities.<\/p>\r\n<strong class=\"import-Strong\"><em>Data Sources<\/em><\/strong><strong class=\"import-Strong\"><em>- <\/em><\/strong>Data warehouses typically draw data from a variety of <strong class=\"import-Strong\">internal and external sources<\/strong><strong>,<\/strong> including operational databases (e.g., sales or inventory systems), customer data, external market data, and sensor data. These sources are often heterogeneous, meaning they are in different formats and may be stored in different locations.\r\n\r\n<strong class=\"import-Strong\"><em>ETL Process (Extract, <\/em><\/strong><strong class=\"import-Strong\"><em>Transform, <\/em><\/strong><strong class=\"import-Strong\"><em>and <\/em><\/strong><strong class=\"import-Strong\"><em>Load)<\/em><\/strong><strong class=\"import-Strong\"><em>- <\/em><\/strong>The <strong class=\"import-Strong\">ETL process<\/strong> is the core of data integration in a data warehouse. 
It involves:\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Extracting<\/strong> data from various sources.<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Transforming<\/strong> the data into a consistent format (e.g., converting data types, cleaning data, standardizing measurements).<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Loading<\/strong> the transformed data into the data warehouse, typically into <strong class=\"import-Strong\"><em>fact tables<\/em><\/strong> (which contain quantitative data) and <strong class=\"import-Strong\"><em>dimension tables<\/em><\/strong> (which contain descriptive attributes).<\/p>\r\n<strong class=\"import-Strong\"><em>Data Warehouse Schema: <\/em><\/strong>Once the data is loaded, it is typically organized into schemas (designs) that define how data is stored and accessed:\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Star Schema<\/em><\/strong><strong><em>:<\/em><\/strong> The most common schema in data warehouses. It consists of a central <strong class=\"import-Strong\"><em>fact table<\/em><\/strong> (storing numerical data, such as sales or revenue) and surrounding <strong class=\"import-Strong\"><em>dimension tables<\/em><\/strong> (storing descriptive data, such as customer or product information)(Oxford English Dictionary, 2025).<\/p>\r\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Snowflake Schema<\/em><\/strong><strong><em>:<\/em><\/strong> A more normalized version of the star schema, where dimension tables are further divided into additional tables to reduce data redundancy.<\/p>\r\n<strong class=\"import-Strong\"><em>OLAP (Online Analytical Processing)<\/em><\/strong><strong class=\"import-Strong\"> - <\/strong>Data Warehouses are designed for <strong class=\"import-Strong\">OLAP<\/strong>, which involves multidimensional analysis. 
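The ETL flow and the fact/dimension split described above can be illustrated with a stripped-down Python sketch; the source rows and field names are invented for illustration:

```python
# Minimal ETL sketch: Extract raw rows, Transform them into a consistent
# format, and Load them into fact and dimension structures.
# Source rows and field names are invented for illustration.
raw_rows = [                                   # Extract: data "as captured"
    {"cust": "jane DOE", "product": "Widget", "amt": "19.99"},
    {"cust": "JOHN smith", "product": "Gadget", "amt": "5.00"},
]

def transform(row):
    """Standardize names and convert amounts from text to numbers."""
    return {"customer": row["cust"].title(),
            "product": row["product"],
            "amount": float(row["amt"])}

dim_customer = {}   # dimension table: descriptive attributes -> key
fact_sales = []     # fact table: quantitative data keyed to dimensions

for row in map(transform, raw_rows):           # Load
    key = dim_customer.setdefault(row["customer"], len(dim_customer))
    fact_sales.append((key, row["product"], row["amount"]))

print(dim_customer)   # {'Jane Doe': 0, 'John Smith': 1}
print(fact_sales)     # [(0, 'Widget', 19.99), (1, 'Gadget', 5.0)]
```

The fact table holds the numbers; the dimension table holds the descriptions, mirroring the star-schema layout described above.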
<strong class=\"import-Strong\"><em>OLAP cubes<\/em><\/strong> allow users to view and analyze data from different perspectives (e.g., time, geography, product categories, etc). This enables fast querying and analysis of large datasets.\r\n<h2 style=\"text-align: left\"><strong class=\"import-Strong\">Data Warehouse<\/strong><strong class=\"import-Strong\"> and Decision Support<\/strong><\/h2>\r\n[caption id=\"attachment_703\" align=\"alignleft\" width=\"300\"]<img class=\"wp-image-703 size-medium\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-300x260.png\" alt=\"OLAP cube.\" width=\"300\" height=\"260\" \/> <a href=\"https:\/\/en.wikipedia.org\/wiki\/OLAP_cube#\/media\/File:OLAP_Cube.svg\">An example of an OLAP cube, Konrad Roeder derivative work: Rehua (talk)<\/a> <a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/3.0\">CC BY-SA 3.0<\/a>[\/caption]\r\n<p class=\"import-NormalWeb\">The primary purpose of a data warehouse is to <strong class=\"import-Strong\">support decision-making processes<\/strong> by providing a consolidated, reliable source of historical data. Data warehouses help businesses by:<\/p>\r\n\r\n<h3><strong class=\"import-Strong\"><em>a) Data Consolidation and Integration<\/em><\/strong><\/h3>\r\n<p class=\"import-NormalWeb\">A data warehouse enables businesses to consolidate data from multiple systems, ensuring that all relevant data is available in one place for analysis. This integration helps organizations break down silos and gain a comprehensive view of their operations.<\/p>\r\n\r\n<h3><strong class=\"import-Strong\"><em>b) Historical Analysis<\/em><\/strong><\/h3>\r\n<p class=\"import-NormalWeb\">By storing large volumes of historical data, a data warehouse provides businesses with the ability to perform trend analysis, forecast future performance, and monitor changes over time. 
This capability is critical for long-term strategic planning and decision-making.<\/p>\r\n\r\n<h3><strong class=\"import-Strong\"><em>c) Performance and Efficiency<\/em><\/strong><\/h3>\r\n<p class=\"import-NormalWeb\">Since data warehouses are optimized for analytical queries, they allow for faster and more efficient reporting and data retrieval compared to operational databases. This reduces the burden on transactional systems, improving their performance.<\/p>\r\n\r\n<h3><strong class=\"import-Strong\"><em>d) Quality and Consistency<\/em><\/strong><\/h3>\r\n<p class=\"import-NormalWeb\">The ETL process ensures that the data stored in a data warehouse is clean, standardized, and consistent. This improves the quality of data analysis and ensures that decision-makers are working with reliable information.<\/p>\r\n\r\n<h2><strong class=\"import-Strong\">The Role of Data Warehouses in Business Intelligence (BI)<\/strong><\/h2>\r\n<p class=\"import-NormalWeb\">Data Warehouses play a central role in <strong class=\"import-Strong\"><em>Business Intelligence (BI)<\/em><\/strong><em>,<\/em> which refers to the process of using data analysis tools and techniques to make informed business decisions. BI involves analyzing past performance to predict future outcomes, identify trends, and gain actionable insights. Uses of data warehouses in BI include:<\/p>\r\n\r\n<ol>\r\n \t<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Reporting and Dashboards<\/em><\/strong><strong class=\"import-Strong\"><em> - <\/em><\/strong>Data warehouses support the creation of reports and dashboards that summarize business performance. 
Business analysts and decision-makers can generate reports from the warehouse, providing insights into key performance indicators (KPIs), sales trends, financial performance, and customer behavior.<\/li>\r\n \t<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Ad-Hoc Analysis<\/em><\/strong><strong class=\"import-Strong\"><em> -<\/em><\/strong> BI tools allow users to perform <strong class=\"import-Strong\">ad-hoc analysis<\/strong> on data from the warehouse. This means that business users can create their own queries or reports based on specific needs without relying on IT teams. The flexible querying capabilities of data warehouses make this possible by enabling users to quickly generate insights from large datasets.<\/li>\r\n \t<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>\u00a0Decision Support<\/em><\/strong><strong class=\"import-Strong\"><em> - <\/em><\/strong>Data Warehouses provide businesses with the ability to make data-driven decisions. Whether it\u2019s analyzing sales data to optimize inventory, understanding customer behavior to improve marketing strategies, or evaluating financial performance to guide investment decisions, the insights derived from a data warehouse are used to drive strategic decisions at every level of the organization.<\/li>\r\n \t<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Business Analytics<\/em><\/strong><strong class=\"import-Strong\"><em> - <\/em><\/strong>With the integration of advanced analytics tools, organizations can use data warehouses for more sophisticated analyses such as forecasting, trend analysis, and scenario modeling. By combining historical data stored in the warehouse with current data and predictive models, businesses can develop insights that guide future strategies.<\/li>\r\n \t<li class=\"import-Normal\"><em><strong>Data Mining<\/strong> -<\/em> <em>is an integral process of BI. 
It is the process of analyzing large datasets to uncover hidden patterns, correlations, trends, and relationships within the data<\/em>. It combines aspects of statistics, machine learning, and artificial intelligence (AI) to extract meaningful information and predict future outcomes (Eye on Tech, 2021).<\/li>\r\n<\/ol>\r\nhttps:\/\/youtu.be\/mOfPG5ZIY-k\r\n<h3 class=\"import-Normal\"><strong>Key Characteristics of Data Mining<\/strong><\/h3>\r\n<p class=\"import-Normal\">As a process, Data Mining is characterized by:<\/p>\r\n\r\n<ul>\r\n \t<li class=\"import-Normal\"><strong><em>Pattern Discovery<\/em><\/strong><em>:<\/em> Data mining helps identify patterns, which could be used to predict future behavior or trends.<\/li>\r\n \t<li class=\"import-Normal\"><strong><em>Classification and Clustering<\/em><\/strong><em>:<\/em> It groups data into categories (classification) or finds natural groupings (clustering).<\/li>\r\n \t<li class=\"import-Normal\"><strong><em>Prediction<\/em><\/strong><em>:<\/em> Using historical data to predict future events or behaviors.<\/li>\r\n \t<li class=\"import-Normal\"><strong><em>Association<\/em><\/strong><em>:<\/em> It identifies associations and relationships between variables (e.g., market basket analysis).<\/li>\r\n \t<li class=\"import-Normal\"><strong><em>Anomaly Detection<\/em><\/strong>: It detects outliers or unusual patterns that might indicate fraud or other exceptional circumstances.<\/li>\r\n<\/ul>\r\n<h3 class=\"import-Normal\"><strong>Categories of Business Intelligence and Analytics<\/strong><\/h3>\r\n<p class=\"import-Normal\">There are several BI and analytics techniques that Financial Technology companies use to deepen their penetration of the Fintech market. 
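<\/p>\r\n<p class="import-Normal">As a tiny illustration of the <em>Anomaly Detection<\/em> characteristic listed above, the sketch below flags transaction amounts that sit far from the mean (the amounts are made up):<\/p>

```python
# Flag values more than 2 standard deviations from the mean.
# The transaction amounts below are invented for illustration.
from statistics import mean, stdev

amounts = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 5000.0, 101.1]
mu, sigma = mean(amounts), stdev(amounts)
anomalies = [a for a in amounts if abs(a - mu) > 2 * sigma]
print(anomalies)  # only the 5000.0 outlier is flagged
```

<p class="import-Normal">Real fraud-detection models are far more sophisticated, but the core idea is the same: learn what \u201cnormal\u201d looks like and surface what deviates from it.<\/p>
<p class="import-Normal">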
They include Descriptive Analytics, Predictive Analytics, Optimization, Simulation and Text, Image and Video Analysis.<\/p>\r\n\r\n<div style=\"margin: auto\">\r\n<table style=\"width: 527pt\"><caption><strong>General Categories of BI\/Analytics Techniques<\/strong><\/caption>\r\n<thead>\r\n<tr style=\"height: 15pt\">\r\n<th style=\"background-color: #9bbb59;border-width: 0pt 0.5pt 0.5pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 111.8px\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Descriptive<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableNormal-C\" style=\"background-color: #f2dcdb;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Predictive <\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #ffff00;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>O<\/strong><strong>p<\/strong><strong>t<\/strong><strong>i<\/strong><strong>mization<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e26b0a;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Simulation<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #66ffff;vertical-align: bottom;border-width: 0pt 1pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 152.7px\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Text &amp; Video 
Analysis<\/strong><\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr class=\"TableNormal-R\" style=\"height: 30pt\">\r\n<th style=\"background-color: #e5fed8;border-width: 0pt 0.5pt 0.5pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 111.8px\">\r\n<p class=\"import-Normal\"><strong>Visual Analysis<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\r\n<p class=\"import-Normal\"><strong>Time Series Analysis<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\r\n<p class=\"import-Normal\"><strong>Genetic A<\/strong><strong>l<\/strong><strong>gorith<\/strong><strong>m<\/strong><strong>s<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\r\n<p class=\"import-Normal\"><strong>Scenario Analysis <\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 1pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 152.7px\">\r\n<p class=\"import-Normal\"><strong>Text Analysis<\/strong><\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableNormal-R\" style=\"height: 30.75pt\">\r\n<th style=\"background-color: #e5fed8;border-width: 0pt 0.5pt 1pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 111.8px\">\r\n<p class=\"import-Normal\"><strong>Regression 
Analysis<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\r\n<p class=\"import-Normal\"><strong>Data Mining <\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\r\n<p class=\"import-Normal\"><strong>Linear Programming<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\r\n<p class=\"import-Normal\"><strong>Monte Carlo Simulations<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 1pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 152.7px\">\r\n<p class=\"import-Normal\"><strong>Video Analysis<\/strong><\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 116.2px\"><\/td>\r\n<td style=\"width: 115.1px\"><\/td>\r\n<td style=\"width: 123.5px\"><\/td>\r\n<td style=\"width: 135.962px\"><\/td>\r\n<td style=\"width: 157.1px\"><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<h3 class=\"import-Normal\"><strong>Descriptive Analytics <\/strong><\/h3>\r\n<p class=\"import-Normal\"><em>Descriptive Analytics \u2013<\/em> is an early, pre-processing stage of the data. Its intent is to identify <em>TRENDS and PATTERNS<\/em> in the data, answering questions such as who, what, where, when, and why certain conditions or events occurred, as revealed by analysis of the data. 
For example, a high volume of positive tweets about a certain company may lead to an increase in its stock price.<\/p>\r\n<p class=\"import-Normal\">Descriptive Analytics uses simple tools such as Excel spreadsheets to organize, categorize, clean, and prepare the data, then applies Excel functions such as Pivot Tables and What-If Analysis. Data is typically presented in the form of graphs, charts, and other graphics using visualization tools such as Microsoft\u2019s Power BI or Tableau, among many others.<\/p>\r\n<p class=\"import-Normal\"><em>Visual Analysis \u2013<\/em> Presents results of the analysis in pictorial form (see Napoleon\u2019s March). Another common way of visualizing data content is through what is known as a word count. Figure 6-3 below shows a typical representation called a Word Cloud.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"336\"]<img src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image9.jpeg\" alt=\"Data visualization. \" width=\"336\" height=\"336\" \/> Figure 6-3 showing visualization of the chapter on Big Data Analytics \u2013 Image generated by OpenAI\u2019s DALL\u00b7E[\/caption]\r\n<p class=\"import-Normal\">To illustrate the concept, the text of this very chapter on Big Data and Analytics was run through Word Cloud analytics. Word Cloud analysis scales each important word by its frequency of occurrence, with size indicating prominence. As we see in the above image, the word DATA is the primary theme, as its size shows. Other words of lesser importance appear in a much smaller font size.<\/p>\r\n<p class=\"import-Normal\"><em>Conversion Funnel \u2013<\/em> is an analysis tool that shows, stage by stage, how a pool of prospects narrows into actual outcomes.<\/p>\r\nThe visual representation below of a conversion funnel for a typical eCommerce website shows the stages, and the effectiveness, of gaining new sales. The funnel, divided into five clearly labeled sections from top to bottom\u2014'Visitors', 'Product Views', 'Add to Cart', 'Checkout Initiated', and 'Sales Completed'\u2014demonstrates how an eCommerce site attracts potential customers, maintains their interest, and ultimately converts them into actual buyers. Each section becomes progressively narrower, symbolizing the drop-off rates at each stage, with corresponding percentages displayed to highlight this attrition. The clean, minimal background highlights the color-coded stages, each with icons like a shopping bag for 'Add to Cart' and a dollar sign for 'Sales Completed' to enhance clarity.\r\n\r\n[caption id=\"attachment_1091\" align=\"alignleft\" width=\"300\"]<img class=\"wp-image-1091 size-medium\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-300x300.jpg\" alt=\"Conversion funnel for a typical eCommerce website, showing the stages and effectiveness in gaining new sales. Long description above.\" width=\"300\" height=\"300\" \/> Figure 6-4 A visual representation of a conversion funnel for a typical eCommerce website. Image generated by OpenAI\u2019s DALL\u00b7E[\/caption]\r\n<p class=\"import-Normal\"><em>Regression Analysis \u2013 Regression analysis is a statistical method that models the relationship between a dependent variable and one or more independent variables. 
<\/em> Suppose the US government publishes a statistic saying \u201cnew home starts in 2025 will exceed 2 million,\u201d and suppose you are a manufacturer of door handles. What do 2 million new home starts mean for your business? The quantity of door handles you should make is <em>dependent<\/em> on how many new homes will be built, along with other variables such as competition and your distribution channels: will you sell through Home Depot, Lowe\u2019s, or commercial building-supply wholesalers? As you can imagine, these variables can get quite complex.<\/p>\r\n<p class=\"import-Normal\">Using Excel, it is an easy task to plot the <em>dependent<\/em> variable (door-handle demand) against the <em>independent<\/em> variable (number of new housing starts) and generate a regression chart that describes the relationship.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Predictive Analysis <\/strong><\/h3>\r\n<p class=\"import-Normal\"><em>Predictive Analysis \u2013 is a set of statistical tools that analyze data to predict a certain outcome<\/em> and is primarily composed of two branches, namely Data Mining and Time Series Analysis. We covered data mining in detail in the previous discussion on Big Data. 
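<\/p>\r\n<p class="import-Normal">As a quick sketch of the regression example above, a least-squares line of door-handle demand against housing starts can be fitted in a few lines of Python (all figures are invented for illustration):<\/p>

```python
# Least-squares fit: door-handle demand (dependent) vs. housing starts
# (independent). The numbers below are illustrative, not market data.
starts  = [1.2, 1.5, 1.8, 2.0, 2.3]      # millions of new home starts
handles = [25.0, 31.0, 37.0, 41.0, 47.0]  # millions of handles sold

n = len(starts)
mean_x = sum(starts) / n
mean_y = sum(handles) / n
# Classic least-squares formulas for slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(starts, handles))
         / sum((x - mean_x) ** 2 for x in starts))
intercept = mean_y - slope * mean_x

# Predict demand if 2 million new homes are started.
print(round(slope * 2.0 + intercept, 1))  # 41.0
```

<p class="import-Normal">Excel\u2019s trendline feature computes exactly this fit; the code simply makes the arithmetic visible.<\/p>
<p class="import-Normal">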
Here we will discuss the topic of Time Series Analysis.<\/p>\r\n<p class=\"import-Normal\"><em>Time Series Analysis \u2013 is a statistical tool used to examine data points collected over a period of time, allowing analysts to identify patterns, trends, and seasonal variations by observing how values change over consistent time intervals; it is often used to forecast future values based on past data patterns<\/em>.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"608\"]<img src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image10.png\" alt=\"Frequency of extreme weather for different degrees of global warming - bar chart. Long description next to image.\" width=\"608\" height=\"342\" \/> Figure 6-5 Frequency of extreme weather for different degrees of global warming - bar chart <a href=\"https:\/\/en.m.wikipedia.org\/wiki\/File:20211109_Frequency_of_extreme_weather_for_different_degrees_of_global_warming_-_bar_chart_IPCC_AR6_WG1_SPM.svg\">Wikimedia Commons<\/a>.[\/caption]\r\n<p class=\"import-Normal\">An example of time series data is daily, time-stamped measures of temperature, dew point, humidity, wind speed, rainfall, and so on over a period spanning the last 10 years. What patterns emerge when we crunch these numbers? What is the correlation between temperature and rainfall?<\/p>\r\n<p class=\"import-Normal\">Time series analysis can be used to predict patient arrival times at a hospital emergency room, ensuring adequate nursing staff are available, or to answer how many Fintech suppliers are available to run \u201cbuy transactions\u201d at 3:00 AM and predict acceptance vs. rejection of a transaction. 
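<\/p>\r\n<p class="import-Normal">A minimal sketch of time-series forecasting with a simple moving average (the temperature readings are invented):<\/p>

```python
# Simple moving-average forecast: predict the next value as the mean of
# the last k observations. The readings below are made up.
daily_temps = [70, 72, 71, 75, 74, 76, 78, 77, 79, 80]

def moving_average_forecast(series, k=3):
    """Forecast the next value as the average of the last k values."""
    window = series[-k:]
    return sum(window) / len(window)

print(moving_average_forecast(daily_temps))  # average of the last three readings
```

<p class="import-Normal">Real time-series methods add trend and seasonality terms, but the principle is the same: recent intervals inform the forecast of the next one.<\/p>
<p class="import-Normal">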
Time series in this case performs forecasting (i.e., predicting the future outcome).<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Optimization Analysis<\/strong><\/h3>\r\n<p class=\"import-Normal\">Minimizing costs and maximizing profitability are cornerstones of every successful business enterprise. Management by data, rather than by intuition, is the subject of Optimization Analysis. Which area of my business needs attention? Why are my labor costs much higher than industry benchmarks? How do I improve my supply chain? These and many more questions can be answered by applying Optimization Analysis methods. One popular analysis method is called the Genetic Algorithm.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"535\"]<img src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image11-1.jpeg\" alt=\"Introgressive hybridization in plants\" width=\"535\" height=\"586\" \/> Figure 6-6 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Introgressive_hybridization_in_plants#\/media\/File:Backcrossing_leads_to_introgression.jpg\">\u00a0Image of Genetic Process<\/a> <a class=\"new\" title=\"User:Mcruzan (page does not exist)\" href=\"https:\/\/commons.wikimedia.org\/w\/index.php?title=User:Mcruzan&amp;action=edit&amp;redlink=1\">Mcruzan<\/a>\u00a0<a class=\"mw-mmv-license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\" target=\"_blank\" rel=\"noopener\">CC BY-SA 4.0<\/a>[\/caption]\r\n<p class=\"import-Normal\"><em>Genetic Algorithms \u2013 A reference to the English Naturalist Charles Darwin\u2019s Theory of Evolution, which holds that<\/em><em> \u201cAll organisms rise and develop through natural evolution processes of small, inherited variations\u201d (Shannon, n.d.).<\/em><\/p>\r\n<p class=\"import-Normal\">A genetic algorithm is a software-driven, step-by-step process that replicates these inheritance properties in the data in order to find approximate solutions to optimization problems. 
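<\/p>\r\n<p class="import-Normal">A minimal sketch of a genetic algorithm in this spirit, searching for the peak of a toy fitness function (the population size, mutation rate, and the function itself are arbitrary choices for illustration):<\/p>

```python
import random

# Toy genetic algorithm: maximize f(x) = -(x - 7)**2 over 5-bit
# "chromosomes" encoding the integers 0..31. Everything is illustrative.
random.seed(42)  # fixed seed so the run is repeatable

def fitness(bits):
    x = int("".join(map(str, bits)), 2)
    return -(x - 7) ** 2  # best possible fitness is 0, at x = 7

def evolve(pop_size=20, generations=40, mutation_rate=0.1):
    pop = [[random.randint(0, 1) for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # selection: keep the fittest half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randint(1, 4)
            child = a[:cut] + b[cut:]           # crossover of two parents
            if random.random() < mutation_rate:
                i = random.randrange(5)
                child[i] ^= 1                   # small random mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(int("".join(map(str, best)), 2))  # the population converges toward x = 7
```

<p class="import-Normal">Each generation keeps only the fitter solutions and breeds new candidates from them, so the population drifts toward the optimum, which is exactly the iterative improvement described here.<\/p>
<p class="import-Normal">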
Think of a married couple, one with brown eyes, the other with green eyes. What eye color will their baby have? Each child inherits a mix of the parents\u2019 traits, and some combinations are more likely than others to appear in the next generation.<\/p>\r\n<p class=\"import-Normal\">A genetic algorithm works by taking a starting population of candidate solutions, called chromosomes, and through multiple simulations gradually evolving that population, by an iterative process, toward better and better solutions. Think of playing chess: with every new game you eliminate (stop playing) the prior moves that made you say \u201cI should not have played that,\u201d and keep only the prior moves that resulted in a better outcome. With each generation (successive games), your chances of winning improve.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Linear Programming <\/strong><\/h3>\r\n<p class=\"import-Normal\">Linear programming is an algebraic method that uses linear equations and inequalities to find the optimal value (maximum or minimum) of an objective as an answer to a mathematical problem (Shannon, 1948). Solving a simple linear equation such as 2X + 3 = 8 is the basic building block.<\/p>\r\n<p class=\"import-Normal\">Again, Excel simplifies solving linear functions with one or more variables.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Simulations <\/strong><\/h3>\r\n<p class=\"import-Normal\">In Florida, weather patterns are monitored by the US National Oceanic and Atmospheric Administration (NOAA) and its <span class=\"import-jpfdse\">National Hurricane Center<\/span> (NHC), which analyze satellite imagery, weather data, and historical observations to generate computer models that drive forecast decisions and hazard information for emergency managers, the media, and the public during hurricanes, tropical storms, and tropical depressions.<\/p>\r\n<p class=\"import-Normal\">A computer simulation mimics the real world virtually; in Fintech, it mimics the dynamic responses of a system. 
For example, a transaction process starting at a point of sale, together with all the intermediary steps of transaction aggregation, clearing, processing, and settlement, can be simulated to understand the behavior of the system. Simulation answers questions such as: what would happen if we stress the system with billions of transactions flowing simultaneously? What would break? How and where would it break? And what are the net results of the break?<\/p>\r\n<p class=\"import-Normal\">Financial and Fintech system simulations use sophisticated statistical methods such as Monte Carlo analysis, which models different financial market conditions and potential outcomes by assigning random values to uncertain variables and running many iterations to calculate the probabilities of various results;\u00a0this is widely used in financial risk management, asset valuation, and portfolio allocation. An example use would be a financial planner helping a retiree answer \u201chow long will my retirement portfolio last?\u201d<\/p>\r\n<p class=\"import-Normal\">Again, as complex as it may sound, simple simulations can be developed using Excel. Running many Monte Carlo iterations, one can determine answers like \u201cin 100 runs of the simulation, only 20 runs indicated that this portfolio will last for more than 10 years\u201d.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Text and Video Analytics <\/strong><\/h3>\r\n<p class=\"import-Normal\">Text and Video Analytics are techniques used to evaluate textual data, as well as video imagery, to find hidden patterns.<\/p>\r\n<p class=\"import-Normal\"><em>Textual Analytics \u2013 is a process that extracts value from very large amounts of textual data such as consumer reports, comments, complaints, and product reviews. It also monitors social media postings to identify consumer sentiment and to recognize changes in consumer behavior. 
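<\/em><\/p>\r\n<p class="import-Normal">Returning to the retiree\u2019s question above, a minimal Monte Carlo sketch looks like this (the starting balance, spending, and return distribution are invented assumptions, not financial advice):<\/p>

```python
import random

# Monte Carlo sketch of "how long will my retirement portfolio last?".
# The balance, spending, and return distribution are illustrative.
random.seed(1)  # fixed seed so the run is repeatable

def runs_lasting(years=10, trials=100, balance=500_000, spend=40_000):
    """Count how many simulated runs still have money after `years`."""
    lasted = 0
    for _ in range(trials):
        b = balance
        for _ in range(years):
            # Draw a random annual return: 5% mean, 12% standard deviation.
            b = b * (1 + random.gauss(0.05, 0.12)) - spend
            if b <= 0:
                break
        if b > 0:
            lasted += 1
    return lasted

survived = runs_lasting()
print(f"{survived} of 100 runs lasted more than 10 years")
```

<p class="import-Normal">Each run draws random annual returns; counting the runs that survive 10 years approximates the probability that the portfolio lasts that long.<\/p>
<p class="import-Normal"><em>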
<\/em><\/p>\r\n<p class=\"import-Normal\">Many companies in the US mine free-form call center records to inform their support operations, marketing and pricing strategies, and new product development.<\/p>\r\n<p class=\"import-Normal\"><em>Video Analytics \u2013 is a process that extracts deeper insights from video footage<\/em>. Imagine the value, in addition to the cost savings, of running a new-product marketing campaign in which customers sample your latest perfume while you observe their facial expressions. Many transportation companies, such as train, bus, and airline operators, currently use video analytics to understand commuter behavior and to implement methods to ease congestion at terminals.<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Data Marts <\/strong><\/h3>\r\n<p class=\"import-Normal\"><em>Data marts<\/em> <em>are specialized, smaller-scale repositories<\/em> that store data custom-tailored to meet the needs of specific business units or functional areas within an organization, such as sales, marketing, or finance. 
Unlike comprehensive data warehouses, which aggregate data across an organization, data marts focus on particular domains, offering streamlined data analysis for targeted decision-making.<\/p>\r\n&nbsp;\r\n\r\n[caption id=\"\" align=\"alignleft\" width=\"602\"]<img class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image12.png\" alt=\"Data Collection and Brokering\" width=\"602\" height=\"259\" \/> <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_imaginaries#\/media\/File:Data-driven_vs_data-brokering.png\">Data Collection and Brokering<\/a>,<span class=\"mw-mmv-author\"><a class=\"new\" title=\"User:Jasraj Raghuwanshi (page does not exist)\" href=\"https:\/\/commons.wikimedia.org\/w\/index.php?title=User:Jasraj_Raghuwanshi&amp;action=edit&amp;redlink=1\">Jasraj Raghuwanshi<\/a><\/span>\u00a0 <a class=\"mw-mmv-license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\" target=\"_blank\" rel=\"noopener\">CC BY-SA 4.0<\/a>[\/caption]\r\n<p class=\"import-Normal\">A data mart extracts and organizes data from the central data warehouse (or external sources), optimizing it for easy access and faster queries. Once data is categorized and organized this way, some companies sell it rather than use it internally. 
A quick web search, for example, reveals hundreds of companies known as \u201cData Brokers,\u201d with names like Acxiom, Experian, CoreLogic, Google, Salesforce, and Signal AI considered to be among the companies that sell user data (your data\u00a0and mine), often by collecting personal information from various sources and then selling it to other businesses in bulk for targeted marketing and other purposes (The Verge, 2025).<\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Data Lakes<\/strong><\/h3>\r\n[caption id=\"\" align=\"alignleft\" width=\"518\"]<img class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image13.png\" alt=\"Data Lake\" width=\"518\" height=\"518\" \/> A representation of a Data Lake \u2013 Courtesy ChatGPT[\/caption]\r\n<p class=\"import-Normal\">Structured data is quite easy for computers to process. The challenge with Big Data is unstructured data: a streamed movie and an hour of streamed music are just two examples of unstructured data sources. Unlike traditional data warehouses that store structured data in predefined schemas, <em>data lakes allow for the storage of raw data regardless of format, size, or source. <\/em>Hence the term \u201clake\u201d: many streams pour into it, and it holds many different species of aquatic life. 
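<\/p>\r\n<p class="import-Normal">The schema-on-read idea behind data lakes can be sketched as follows: records of different shapes are stored raw, and structure is imposed only when the data is read (the record fields are illustrative):<\/p>

```python
import json

# Schema-on-read sketch: raw records of mixed shapes sit in the "lake"
# as-is, and a schema is applied only at read time. Fields are invented.
raw_lake = [
    '{"type": "trade", "symbol": "ABC", "qty": 100}',
    '{"type": "quote", "symbol": "ABC", "bid": 9.95, "ask": 10.05}',
    '{"type": "trade", "symbol": "XYZ", "qty": 50}',
]

# Apply a "trades" schema while reading, skipping records that don't fit.
trades = [
    {"symbol": r["symbol"], "qty": r["qty"]}
    for r in map(json.loads, raw_lake)
    if r.get("type") == "trade"
]
print(trades)  # two trade records; the quote stays raw in the lake
```

<p class="import-Normal">A warehouse, by contrast, would reject or transform the mismatched record at load time (schema-on-write), as the comparison table below summarizes.<\/p>
<p class="import-Normal">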
This capability enables businesses to capture a wide range of data types and unlock new opportunities for advanced analytics and artificial intelligence.<\/p>\r\n<p class=\"import-Normal\">Key characteristics of data lakes include:<\/p>\r\n\r\n<ul>\r\n \t<li><em>Scalability:<\/em> They can store enormous amounts of data, often multiple petabytes (1 petabyte = 1 followed by 15 zeros).<\/li>\r\n \t<li><em>Flexibility:<\/em> Stores raw data in its original form.<\/li>\r\n \t<li><em>Diversity:<\/em> Accommodates structured, semi-structured, and unstructured data.<\/li>\r\n \t<li><em>Accessibility:<\/em> Provides seamless access for analytics, machine learning, and reporting.<\/li>\r\n<\/ul>\r\n<div style=\"margin: auto\">\r\n<table>\r\n<thead>\r\n<tr style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Feature<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Data Lake<\/strong><\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"vertical-align: middle;border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Data Warehouse<\/strong><\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr class=\"TableGrid-R\" style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\"><strong>Data Type<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Structured, semi-structured, and unstructured<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Strictly structured<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableGrid-R\" style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\"><strong>Schema (model)<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Schema-on-read: built and applied during analysis, when the data is read<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Schema-on-write: pre-designed, storing the data in a specific format and size<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableGrid-R\" style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\"><strong>Cost<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Low-cost storage<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Comparatively high-cost storage<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableGrid-R\" style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\"><strong>Used for<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Analytics, artificial intelligence, and machine learning<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Reporting and business intelligence<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableGrid-R\" style=\"height: 0\">\r\n<th style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\"><strong>Scalability<\/strong><\/p>\r\n<\/th>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Highly scalable<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-Normal\">Limited by cost<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<p class=\"import-Normal\" style=\"text-align: center\"><em>Table 6-2 Comparative Analysis of Features of Data Lakes vs. Data Warehouses<\/em><\/p>\r\n\r\n<h3 class=\"import-Normal\"><strong>Data Lakes in Fintech<\/strong><\/h3>\r\n<p class=\"import-Normal\">Data Lakes have become integral to financial technology (FinTech) companies due to their ability to manage and analyze vast, diverse, and rapidly growing datasets. FinTech businesses rely on data lakes to perform:<\/p>\r\n\r\n<ul>\r\n \t<li><em>Fraud Detection and Risk Management<\/em><em> - <\/em>Fraud detection requires sophisticated models that can process massive datasets to identify anomalies and patterns indicative of fraudulent activity. 
The ability of data lakes to handle varying data types is crucial, as fraud signals often come from diverse sources such as emails, call logs, or social media.<\/li>\r\n \t<li><em>Anomaly Detection<\/em> - Using machine learning algorithms trained on historical data to flag suspicious activities.<\/li>\r\n \t<li><em>Behavioral Analysis<\/em> - Tracking spending patterns, device fingerprints, and geolocation data to identify unusual transactions.<\/li>\r\n \t<li><em>Risk Assessment<\/em> - Combining external and internal data to calculate risk scores for transactions or user accounts.<\/li>\r\n \t<li><em>Regulatory Compliance and Auditing<\/em> - Financial institutions face stringent regulations such as GDPR, CCPA, KYC, and anti-money laundering (AML) laws. Data lake capabilities are often called upon for data-retention analysis, audit trails, and automated reporting.<\/li>\r\n \t<li><em>Personalization and Customer Insights<\/em> - FinTech companies use data lakes to develop personalized services, such as customized financial products and real-time (dynamic) pricing models that adjust rates for loans, insurance, or investment products based on real-time data.<\/li>\r\n \t<li><em>Algorithmic Trading<\/em> - Algorithmic or automated trading (sometimes called robo-trading) relies on analyzing vast amounts of market and historical data to make split-second decisions.<\/li>\r\n \t<li><em>Data Aggregation<\/em> - Collecting market feeds, news articles, and historical price data in real time.<\/li>\r\n \t<li><em>Backtesting Models<\/em> - Using historical data stored in the lake to evaluate the effectiveness of trading algorithms.<\/li>\r\n \t<li><em>Market Sentiment Analysis<\/em> - Incorporating alternative datasets, such as social media sentiment, to inform trading
strategies.<\/li>\r\n<\/ul>\r\n<div class=\"textbox\">\r\n<h3><strong>Licenses and Attribution<\/strong><\/h3>\r\n<h4>CC Licensed Content, Original<\/h4>\r\n<p class=\"import-Normal\">This educational material includes AI-generated content from ChatGPT by OpenAI. The original content created by Mohammed Kotaiche from Hillsborough Community College is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (<a href=\"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/deed.en\" rel=\"noreferrer noopener\" aria-label=\"Link CC BY-NC 4.0\">CC BY-NC 4.0<\/a>).<\/p>\r\n<p class=\"import-Normal\">All images in this textbook generated with DALL-E are licensed under the terms provided by OpenAI, allowing for their free use, modification, and distribution with appropriate attribution.<\/p>\r\n\r\n<hr \/>\r\n\r\n<h4><strong>CC Licensed Content Included<\/strong><\/h4>\r\n<ul>\r\n \t<li><strong>ASCII<\/strong>\r\nSource: <a href=\"https:\/\/en.wikipedia.org\/wiki\/ASCII\" target=\"_new\" rel=\"noopener\">Wikipedia<\/a>\r\nLicense: CC BY-SA 3.0<\/li>\r\n \t<li><strong>Federal Reserve Payments Study<\/strong>\r\nSource: <a href=\"https:\/\/www.federalreserve.gov\/paymentsystems\/fr-payments-study.htm\" target=\"_new\" rel=\"noopener\">Federal Reserve<\/a>\r\nLicense: Public Domain<\/li>\r\n \t<li><strong>OLAP 3D Cube<\/strong>\r\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Olap-3d-cube.png\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a>\r\nLicense: CC BY-SA 4.0<\/li>\r\n \t<li><strong>Frequency of Extreme Weather for Different Degrees of Global Warming<\/strong>\r\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:20211109_Frequency_of_extreme_weather_for_different_degrees_of_global_warming_-_bar_chart_IPCC_AR6_WG1_SPM.svg\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a>\r\nLicense: CC BY-SA 4.0<\/li>\r\n \t<li><strong>Backcrossing Leads to Introgression<\/strong>\r\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Backcrossing_leads_to_introgression.jpg\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a>\r\nLicense: CC BY-SA 4.0<\/li>\r\n \t<li><strong>National Oceanic and Atmospheric Administration (NOAA)<\/strong>\r\nSource: <a href=\"https:\/\/www.noaa.gov\/\" target=\"_new\" rel=\"noopener\">NOAA<\/a>\r\nLicense: Public Domain<\/li>\r\n \t<li><strong>Data-Driven vs. Data-Brokering<\/strong>\r\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Data-driven_vs_data-brokering.png\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a>\r\nLicense: CC BY-SA 4.0<\/li>\r\n<\/ul>\r\n\r\n<hr \/>\r\n\r\n<h4>Other Licensed Content Included<\/h4>\r\n<ul>\r\n \t<li><strong>What is Data Mining and Why is it Important?<\/strong> Source: YouTube. License: Standard YouTube License. Retrieved from\r\n<a href=\"https:\/\/www.youtube.com\/results?search_query=what+is+data+mining+and+why+is+it+important\" target=\"_blank\" rel=\"noopener\">YouTube\u2019s Video on Data Mining<\/a>.<\/li>\r\n \t<li><strong>The Difference Between Unstructured Data and Structured Data<\/strong>. Source: AIIM Blog. License: Copyright \u00a9 2025 Association for Intelligent Information Management. All rights reserved. Retrieved from <a href=\"https:\/\/www.aiim.org\/blog\/difference-between-unstructured-and-structured-data\" target=\"_blank\" rel=\"noopener\">AIIM Blog: Difference Between Unstructured and Structured Data<\/a>.<\/li>\r\n \t<li><strong>Google Search Labs<\/strong>. Source: The Verge. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.theverge.com\/google-search-labs\" target=\"_blank\" rel=\"noopener\">The Verge\u2019s Coverage of Google Search Labs<\/a>.<\/li>\r\n \t<li><strong>Global Mobile Data Usage Forecast<\/strong>. Source: Statista. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.statista.com\/statistics\/270749\/global-mobile-data-traffic-forecast\/\" target=\"_blank\" rel=\"noopener\">Statista\u2019s Global Mobile Data Usage Forecast<\/a>.<\/li>\r\n \t<li><strong>Definition of 'Schema'<\/strong>. Source: Oxford English Dictionary. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.oed.com\/view\/Entry\/172074\" target=\"_blank\" rel=\"noopener\">Oxford English Dictionary\u2019s Definition of Schema<\/a>.<\/li>\r\n \t<li><strong>Internet Screen Time Statistics 2024<\/strong>. Source: Reviews.org. Retrieved from <a href=\"https:\/\/www.reviews.org\/internet-service\/internet-screen-time-statistics\/\" target=\"_blank\" rel=\"noopener\">Reviews.org\u2019s Internet Screen Time Statistics 2024<\/a>.<\/li>\r\n \t<li><strong>Building the Data Warehouse<\/strong>. Author: Bill Inmon. Publisher: Wiley. License: Standard Copyright. Retrieved from\r\n<a href=\"https:\/\/www.wiley.com\/en-us\/Building+the+Data+Warehouse%2C+4th+Edition-p-9780764599446\" target=\"_blank\" rel=\"noopener\">Wiley\u2019s Page for Building the Data Warehouse<\/a>.<\/li>\r\n \t<li><strong>A Mathematical Theory of Communication<\/strong>. Author: Claude E. Shannon. Source: University of Illinois. Retrieved from\r\n<a href=\"https:\/\/archive.org\/details\/amathematicaltheoryofcommunication\" target=\"_blank\" rel=\"noopener\">University of Illinois\u2019 Archive of A Mathematical Theory of Communication<\/a>.<\/li>\r\n \t<li><strong>How Many Words Does the Average Person Say a Day?<\/strong>. Source: WordsRated. License: Standard Copyright. Retrieved from <a href=\"https:\/\/wordsrated.com\/how-many-words-average-person-says-per-day\/\" target=\"_blank\" rel=\"noopener\">WordsRated\u2019s Study on Average Words Spoken Per Day<\/a>.<\/li>\r\n<\/ul>\r\n<\/div>","rendered":"<h2 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Opening_Vignette\"><\/span><strong>Opening Vignette<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"import-Normal\">Napoleon&#8217;s invasion of Russia in 1812, also known as the Russian Campaign or the Patriotic War of 1812, was driven by a combination of geopolitical, economic, strategic, and personal motives.
France\u2019s reasons behind the campaign included Napoleon\u2019s desire to enforce the Continental System, which, by modern standards, was a form of sanctions aimed at weakening Britain by prohibiting trade between the British Empire and continental Europe.<\/p>\n<figure style=\"width: 1432px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image1-1.png\" alt=\"Minard's 1869 flow map of Napoleon's invasion of Russia in 1812\u20131813. A very innovative thematic map from the 19th century. Long description located below.\" width=\"1432\" height=\"682\" \/><figcaption class=\"wp-caption-text\">Figure 6-1 1812<a href=\"https:\/\/en.wikipedia.org\/wiki\/Thematic_map\"> Napoleon\u2019s March to Moscow by Charles Minard<\/a> \u2013 Public Domain<\/figcaption><\/figure>\n<p class=\"import-Normal\">Tsar Alexander I of Russia had initially agreed to the Continental System in the Treaty of Tilsit (1807). However, Russia\u2019s economy suffered from the trade restrictions, leading Alexander to withdraw from the system in 1810. Napoleon sought to force Russia back into compliance with the Continental System to maintain the economic blockade against Britain, which he considered France&#8217;s greatest rival. Napoleon decided to invade Russia.<\/p>\n<p class=\"import-Normal\">He amassed \u201cThe Grand Arm\u00e9e\u201d of more than 400,000 men, who started the march on June 24, 1812 and moved on foot for more than 1,000 miles from their starting point near the Niemen River (in present-day Poland\/Lithuania) towards Moscow. Napoleon\u2019s Grand Army fought many battles along the way, most notably the Battle of Smolensk (August 16\u201318, 1812) and the Battle of Borodino (September 7, 1812), before finally arriving at a burned-out and deserted Moscow, with no one to fight, no one to accept a surrender, and no one with whom to negotiate peace. Facing starvation, he decided to retreat.<\/p>\n<p class=\"import-Normal\">During the retreat, Napoleon\u2019s army traveled the same distance back, but under far worse conditions, often taking longer and more indirect routes.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"What_does_the_chart_depict\"><\/span><strong>What does the chart depict?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">The illustration depicts Napoleon&#8217;s army departing the Polish-Russian border. A thick band illustrates the size of his army at specific geographic points during their advance and retreat. It displays <strong><em>six types of data in two dimensions<\/em><\/strong>: the number of Napoleon&#8217;s troops; the distance traveled; temperature; latitude and longitude; direction of travel; and location relative to specific dates (time scale).<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Key_Principle\"><\/span><strong>Key Principle<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">The chart in Figure 6-1 is considered by many to be the earliest depiction of what we today call Big Data (400,000 troops to account for; a supply-chain flow that extended across thousands of physical locations; and linear progress constrained by weather and topography). Truly, a picture is worth more than a thousand words.<\/p>\n<p class=\"import-Normal\">Most businesses operate on very thin margins, and attempts at reducing cost are, in many instances, cutting into the proverbial bone. Competitiveness must be reimagined. New and modern methods that support this reimagination go far beyond doing \u201cthings\u201d better, faster, and cheaper. Could the crystal ball promised by Predictive Analytics have prevented such a massive and catastrophic defeat of Napoleon?
Perhaps.<\/p>\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Learning Objectives<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-Normal\">Upon completion of this chapter, the student should be able to:<\/p>\n<ul>\n<li>Clearly explain the difference between Data and Information<\/li>\n<li>Define Big Data and explain why it is called \u201cBig\u201d<\/li>\n<li>Define the 7 characteristics of Big Data<\/li>\n<li>Explain the technologies that support uses of Big Data<\/li>\n<li>Explain how businesses use Big Data as an intelligence tool<\/li>\n<li>Explain the 2 basic types of analytics: descriptive and predictive<\/li>\n<li>Demonstrate knowledge of the basic methods that support descriptive analytics<\/li>\n<li>Demonstrate knowledge of the basic methods that support predictive analytics<\/li>\n<li>Explain Data Mining<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Data_Information_and_Intelligence_in_Big_Data\"><\/span><strong>Data, Information and Intelligence in Big Data<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure style=\"width: 430px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image2-1.jpeg\" alt=\"Data center with clouds.\" width=\"430\" height=\"430\" \/><figcaption class=\"wp-caption-text\">Image generated by OpenAI\u2019s DALL\u00b7E<\/figcaption><\/figure>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"DATA_%E2%80%93_Bits_and_Bytes\"><\/span><strong>DATA \u2013 Bits and Bytes<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\"><em>Data is defined as a mere fact, devoid of any context or relative use<\/em>. It is the smallest unit of recognition: a part of a number, a whole number, a part of a letter, a whole letter, a single word, or sometimes even an entire sentence that does not convey (inform) and, on its own, has very little usefulness. Within traditional computers, the smallest unit of data is the bit. <em>A bit is a single 0 (zero) or a 1 (one).<\/em> Zeros and ones indicate whether electricity is flowing inside the computer (1) or not (0). Imagine a light bulb: the wall switch controls the state of the bulb, with zero indicating the bulb is OFF and one indicating the bulb is ON. Data is binary.
In contrast with the decimal system, which contains the set of digits {0 to 9}, binary refers to a numbering system that uses only two digits: 0 and 1. Combinations thereof constitute a single letter (character), a symbol (such as an equal sign, a greater-than sign, brackets, a period, etc.), or perhaps part of an image on your screen. Imagine if a pixel on your screen were missing (OFF): your image would then look as if it had a hole in it. In natural languages such as English, words are composed of characters. Computer words are also composed of characters. The number of bits used to represent a character, however, is fixed by the character encoding that the operating system uses. Windows 11 (as of 2024), for example, encodes text in UTF-16, so each character of the English language is represented by a unique set of 16 zeros and ones (bits).<\/p>\n<p class=\"import-Normal\">As we go about our daily lives, we often hear the term Byte. A <em>Byte<\/em> is, by convention, a group of 8 bits; it may represent a character, a symbol, or part of an image.<\/p>\n<figure style=\"width: 405px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" style=\"font-family: 'Sorts Mill Goudy', 'Times New Roman', serif;font-size: 16px\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image3-2.png\" alt=\"Open book.\" width=\"405\" height=\"405\" \/><figcaption class=\"wp-caption-text\">Image generated by OpenAI\u2019s DALL\u00b7E<\/figcaption><\/figure>\n<h2 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Information\"><\/span><strong>Information<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"import-Normal\"><em>Information is a grouping of bytes of data that has context and is useful,<\/em> just as a book conveys meaning through context.
Context develops when data is grouped together in a grammatically correct (structured) manner and is processed (analyzed for value), providing relevance.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Types_of_Data\"><\/span><strong>Types of Data<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Data has been organized by conventions. For example, the American Standard Code for Information Interchange (ASCII, pronounced ASK-ee) is a standard developed in the early 1960s to encode the English-language character set into a form that computers can understand (i.e., zeros and ones); it lists, in a table, the digital equivalent of each character. Over time, this coding standard expanded to include the various other types we know today. The two basic types of data are:<\/p>\n<p class=\"import-Normal\"><em>Structured Types:<\/em> Structured types of data are sets that follow a standard size (each data element has the same number of zeros and ones) and include:<\/p>\n<ul>\n<li><em>Character Type<\/em>: Assigns a unique string of 0\u2019s and 1\u2019s, of the same size, to each letter, such as A (capital A) vs. a (lowercase a)<\/li>\n<li><em>Numeric Type<\/em>: Includes the decimal digits (0\u20269)<\/li>\n<li><em>Symbols Type<\/em>: Includes all symbols found on a computer\u2019s keyboard<\/li>\n<li><em>Special Symbols Type<\/em>: Includes symbols for math, currency, scientific and other uses<\/li>\n<li class=\"import-Normal\">Structured data fits neatly into a table and is easy to store and analyze by humans and machines alike.<\/li>\n<\/ul>\n<p class=\"import-Normal\"><em>Unstructured Data Types (AIIM Blog, 2025):<\/em> Unstructured data types are data elements that are not easy to store, understand or analyze by humans or machines. 
This data type does not follow a convention (a standard size, for example) and includes:<\/p>\n<ul>\n<li><em>Text documents:\u00a0<\/em>Any text file like Word documents, emails, blog posts, survey responses, where the information isn&#8217;t organized in a predefined structure<em>.\u00a0<\/em><\/li>\n<li><em>Images:\u00a0<\/em>Photographs, scanned documents, paintings &#8211; visual data without inherent organization<\/li>\n<li><em>Audio files:\u00a0<\/em>Music, voice recordings, podcasts &#8211; sound data that can&#8217;t be easily categorized with structured fields<em>\u00a0<\/em><\/li>\n<li><em>Video files:\u00a0<\/em>Movies, surveillance footage, recorded presentations &#8211; moving images with no predefined data structure<em>\u00a0<\/em><\/li>\n<li><em>Social media posts:\u00a0<\/em>Tweets, Facebook updates, Instagram captions &#8211; text-based interactions with varying formats and content<em>\u00a0<\/em><\/li>\n<li><em>Handwritten notes:\u00a0<\/em>Notes written on paper, where the information is not digitally structured<\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"The_%E2%80%9CBig%E2%80%9D_in_Big_Data\"><\/span>The \u201cBig\u201d in Big Data<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">How much is too much? And where is it all coming from? These are questions we may ponder. As we go about our daily lives, we generate, process, and consume tremendous amounts of data. According to current estimates,\u00a0humanity in the 21<sup>st<\/sup> century produces\u00a0more data in a single day\u00a0than was created in the entirety of human history up to the early 2000s, meaning that the vast majority of data ever created has been generated within the last couple of decades, with a significant portion created daily. To list but a few of the activities behind generating data:<\/p>\n<ul>\n<li>Your teacher may ask you to write a 500-word paper on the topic of BIG Data. 
The average plain-text Word document (meaning no images, embedded HTML, videos, etc.) of 500 single-spaced English words is around 65,000 bits. The average 500-word document with 3 images is around 1 million bits. An average textbook with 300 pages and 500 images is around 1.5 billion bits, or 187,500,000 bytes.<\/li>\n<li>On average, a person speaks approximately 30,000 words per day; a Princeton study calculated this at around 40 million bits, or 5,000,000 bytes.<\/li>\n<li>In the US, adults spend around 3 hours per day watching TV. TV signals in the US were converted to digital (i.e., bits) around 2010. Each hour of TV signal carries around 54 billion bits (in 4K HD), or 6,750,000,000 bytes.<\/li>\n<li>On average, Gen-Z interacts with the internet around 6 hours per day, generating approximately 32 billion bits, or around 4 billion bytes (Reviews.org, 2024).<\/li>\n<li>Just in the US, for calendar year 2022, general-purpose card (Visa, Mastercard, American Express, etc.) payments reached a record 153.3 billion transactions and $9.76 trillion in value. 
Note: these numbers do not account for private-label (merchant-branded) cards, electronic benefits cards, debit cards, gift cards, or any other payment means such as government purchasing cards.<\/li>\n<\/ul>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"_Characteristics_of_Big_Data\"><\/span><strong>\u00a0Characteristics of Big Data<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Big Data is a term that describes sets of data so enormous in size and complexity that they require methods beyond traditional data-processing hardware, software and analysis tools to understand and manage.<\/p>\n<p class=\"import-Normal\">There are 7 key characteristics that describe Big Data:<\/p>\n<h4 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"1_Volume_are_the_amounts_of_data_being_generated_stored_and_analyzed_402_Million_Terabytes_are_generated_globally_on_daily_basis_90_of_it_is_in_just_the_last_2_years_2023_and_2024\"><\/span>1. <em>Volume:<\/em> The amount of data being generated, stored, and analyzed. An estimated 402 million terabytes are generated globally on a daily basis, and roughly 90% of all existing data was created in just the last 2 years (2023 and 2024)<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<h4><span class=\"ez-toc-section\" id=\"2_Velocity_Is_the_Speed_at_Which_Data_is_Generated_and_Processed\"><\/span>2. <em>Velocity: The Speed at Which Data is Generated and Processed.<\/em><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Velocity in the context of Big Data refers to the speed at which data is generated, collected, and processed. This characteristic is particularly important because the value of data often diminishes rapidly if it is not processed and acted upon in real time or near-real time (think of a stock price shown at the moment of the opening or closing bell, or its constant movement throughout a trading day). 
Key aspects of Velocity include:<\/p>\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Real-time and Streaming Data<\/em><\/strong><strong><em>:<\/em><\/strong> Many Big Data applications rely on real-time data processing. Examples include social media platforms, financial trading, and embedded IoT sensor devices. In these cases, data is generated continuously and often needs to be processed instantly to extract value (e.g., fraud detection or live buy\/sell recommendations).<\/p>\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>High-frequency Data<\/em><\/strong><strong><em>:<\/em><\/strong> Certain systems, like stock markets, sensor networks, or GPS-enabled devices, generate high-frequency data that needs to be analyzed almost as soon as it\u2019s created. These applications often require specialized technologies like stream processing to handle large volumes of rapidly flowing data.<\/p>\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Timeliness and Latency<\/em><\/strong><strong><em>:<\/em><\/strong> Low latency (the delay between data generation and processing) is critical in situations such as autonomous vehicles, online gaming, or emergency response systems, where decisions need to be made instantly. The need for processing speed can impact how the infrastructure and algorithms are designed.<\/p>\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\">Data at Scale<\/strong><strong>:<\/strong> Velocity concerns not only how fast data is generated but also the sheer scale at which it arrives. Consider the enormous amount of data produced by devices like smartphones, wearable health trackers, social media platforms, home surveillance cameras, and Alexa- and Siri-like devices, all of which must be processed continuously. 
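<\/p>\n<p class=\"import-NormalWeb\">The streaming idea above can be sketched in a few lines of Python (toy values, not a real streaming framework): each event is examined the moment it arrives, rather than being stored for later batch analysis:<\/p>

```python
# Toy stream processing: act on each event as it arrives.
charges = [12.50, 9.99, 5000.00, 42.00]   # card charges arriving in real time
flagged = []
for amount in charges:
    if amount > 1000:           # an immediate rule, e.g., a fraud-style check
        flagged.append(amount)  # act now, not in tonight's batch job
print(flagged)                  # prints: [5000.0]
```

<p class=\"import-NormalWeb\">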
Examples of Velocity in action include:<\/p>\n<p class=\"import-NormalWeb\">Transactional Data on eCommerce websites: Recommendation systems generate real-time suggestions based on a user\u2019s browsing and purchasing behavior, requiring fast data processing.<\/p>\n<p class=\"import-NormalWeb\">Stock Markets: Financial institutions analyze stock tickers and market data in real time to make split-second decisions.<\/p>\n<p class=\"import-NormalWeb\">Social Media: Data is continuously generated by billions of users posting updates, comments, likes &amp; shares, and multimedia content. This data is processed quickly to serve advertisements or to identify trending <em>topics.<\/em><\/p>\n<h4><span class=\"ez-toc-section\" id=\"3_Variety\"><\/span><em>3. Variety.<\/em><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Variety refers to the different types and formats of data that need to be processed and integrated in Big Data applications. In traditional databases, data is typically structured and stored in a tabular format (e.g., rows and columns), but Big Data often comes in diverse forms, such as text, images, videos, and sensor data. Key aspects of Variety include:<\/p>\n<p class=\"import-Normal\">Structured Data: This is highly organized data that is easily searchable in databases (e.g., customer names, addresses, transaction records). While structured data remains important, it makes up only a small fraction of the data being generated today.<\/p>\n<p class=\"import-Normal\">Semi-structured Data: Data that doesn\u2019t fit neatly into a table but still contains some structure, often through tags or markers. Examples include XML and JSON files used for communication between systems. Semi-structured data is increasingly common as companies integrate data from multiple sources.<\/p>\n<p class=\"import-Normal\">Unstructured Data: This is data without a predefined structure, making it more difficult to store and analyze. 
Unstructured data includes text files, emails, social media posts, video and audio files, images, and documents. A significant portion of Big Data is unstructured, and new tools are being developed to extract value from this type of data.<\/p>\n<p class=\"import-Normal\">Multimedia Data: This includes images, video, and audio, which require advanced technologies and processing like image recognition, facial recognition, speech-to-text, or video processing for analysis. For example, security systems might process video footage in real time to identify threats.<\/p>\n<p class=\"import-Normal\">Sensor Data (IoT Data): The Internet of Things (IoT) is generating vast amounts of data from connected devices (smartphones, smart home devices, wearables, etc.), which can be structured (e.g., sensor readings) or unstructured (e.g., sound or image data).<\/p>\n<p class=\"import-Normal\">Machine-Generated Data: This data is produced by machines and systems automatically, without human intervention. Examples include logs, transaction data, or sensor data from industrial equipment. This data often needs to be integrated with other types of data for meaningful analysis.<\/p>\n<p class=\"import-Normal\">Metadata: Data about the data, such as when and where it was created, who created it, and how it is related to other data. Metadata helps organize and interpret the content of various data formats. 
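<\/p>\n<p class=\"import-Normal\">A short Python sketch (values invented for illustration) of the semi-structured case mentioned above: JSON has no fixed table layout, but its key markers still give a program something to hold on to:<\/p>

```python
import json

# Semi-structured data: no fixed schema, but keys act as markers.
post = {'user': 'ada', 'text': 'Big Data!', 'tags': ['fintech', 'analytics']}
encoded = json.dumps(post)     # serialize for interchange between systems
decoded = json.loads(encoded)  # the receiving side parses it back
print(decoded['user'], len(decoded['tags']))   # prints: ada 2
```

<p class=\"import-Normal\">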
Examples of Variety in action include:<\/p>\n<p class=\"import-Normal\">Social Media Platforms: A platform like Meta (Facebook) or X (Twitter) has to process various forms of data, including text posts, images, videos, comments, and reactions.<\/p>\n<p class=\"import-Normal\">Healthcare Industry: Healthcare data includes structured patient records (e.g., diagnosis, treatment) and unstructured data (e.g., doctor&#8217;s notes, radiology images, medical scans).<\/p>\n<p class=\"import-Normal\">Self-Driving Cars: These vehicles generate a combination of structured data (e.g., GPS data, speed, and temperature) and unstructured data (e.g., video from cameras).<\/p>\n<h4><span class=\"ez-toc-section\" id=\"4_Veracity\"><\/span><em>4. Veracity<\/em><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p><em>Veracity is defined as the trustworthiness and quality of data.<\/em> It refers to the uncertainty or reliability of the data. In the world of Big Data, not all data is clean, accurate, or reliable. Inaccurate, incomplete, or inconsistent data can lead to poor insights and flawed decision-making. An example of the Veracity challenge would be social media posts, tweets, and streamed news media, which may be inaccurate, biased, and in some instances misleading and \u201cmanufactured\u201d.\u00a0 Ensuring data veracity involves fact-checking and determining the credibility of the source. Other examples of the necessity for Veracity could appear in:<\/p>\n<ul>\n<li class=\"import-Normal\">Medical or Scientific experiments<\/li>\n<li class=\"import-Normal\">Voting Systems<\/li>\n<li class=\"import-Normal\">Electronic Health Records<\/li>\n<li class=\"import-Normal\">Financial Transactions<\/li>\n<li class=\"import-Normal\">Student Records<\/li>\n<\/ul>\n<h4><span class=\"ez-toc-section\" id=\"5_Value\"><\/span><em>5. 
<\/em><strong class=\"import-Strong\"><em>Value<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p><em>Value is defined as the utility and insight of data.<\/em> <strong class=\"import-Strong\">Value<\/strong> is about <strong class=\"import-Strong\">extracting useful insights<\/strong> from Big Data and is truly transformative across various industries. By leveraging data analytics, organizations in retail, healthcare, finance, manufacturing, agriculture, transportation, telecommunications, energy, and education can optimize operations, enhance customer experiences, reduce costs, and make more informed decisions. Value answers questions like:<\/p>\n<ul>\n<li>Who is my best customer?<\/li>\n<li>What is the price my customers would be willing to pay for item X?<\/li>\n<li>What is today\u2019s best delivery route for my truck drivers?<\/li>\n<li>Which optimal treatment should I follow to treat type X cancer patients?<\/li>\n<li>Which curriculum should I develop to optimize student success?<\/li>\n<li>And countless similar questions.<\/li>\n<\/ul>\n<p class=\"import-Normal\">As important as the other characteristics of Big Data are, Value is front and center in the topic of Big Data Analytics.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"6_Variability_The_Changing_Nature_of_Data\"><\/span><em>6. <\/em><strong><em>Variability: The Changing Nature of Data.<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p class=\"import-NormalWeb\"><strong class=\"import-Strong\">Variability<\/strong> refers to the <strong class=\"import-Strong\">inconsistent<\/strong> or <strong class=\"import-Strong\">dynamic nature<\/strong> of data. Unlike structured data, which is relatively stable and predictable, Big Data can be highly variable. The data patterns, formats, and sources may change over time, requiring systems to adjust accordingly. 
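<\/p>\n<p class=\"import-NormalWeb\">A small Python sketch (field names invented for illustration) of coping with variability: when a feed renames or drops fields over time, defensive lookups let a pipeline adjust instead of failing:<\/p>

```python
# The same feed delivers the same fact under changing field names.
records = [
    {'price': 101.5, 'ticker': 'ABC'},
    {'px': 102.0, 'symbol': 'ABC'},   # a later format of the same feed
]
# .get() with a fallback absorbs the variability.
prices = [r.get('price', r.get('px')) for r in records]
print(prices)   # prints: [101.5, 102.0]
```

<p class=\"import-NormalWeb\">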
Data Variability affects every known sector of the economy. A few examples of Data Variability include:<\/p>\n<p class=\"import-NormalWeb\">Healthcare \u2013 A patient\u2019s record at a physician\u2019s practice may have to connect to a hospital\u2019s electronic records system made by a different software company.<\/p>\n<p class=\"import-NormalWeb\">Utilities and Power Generation \u2013 Solar energy production is highly unpredictable because of conditions such as weather, wind and changes in the sun\u2019s position throughout the day. The same fluctuations also impact energy consumption.<\/p>\n<p class=\"import-NormalWeb\">Agriculture \u2013 Seasonal change, rain forecasts, weather conditions, soil conditions and a host of other variables impact decisions about what to grow, when to grow, when to harvest, and where and to whom to sell farm products.<\/p>\n<p class=\"import-NormalWeb\">Manufacturing \u2013 Sourcing of raw materials, changing supply chains, inventory of raw and finished goods, demand forecasting, labor allocation, and a host of other related variables all shift over time.<\/p>\n<p class=\"import-NormalWeb\">Financial Services \u2013 In finance, data variability is often encountered due to fluctuating market conditions, changing customer behaviors, and inconsistencies in data sources. Examples of variability are prevalent in <strong class=\"import-Strong\">stock market data, customer transaction history, and financial regulations and reporting.<\/strong><\/p>\n<h4><span class=\"ez-toc-section\" id=\"7_Visualization\"><\/span>7. 
<strong class=\"import-Strong\"><em>Visualization<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p><em>Visualization is defined as the representation of an object, situation, or set of information as a chart or other image<\/em>.<\/p>\n<p><strong class=\"import-Strong\">Visualization<\/strong> focuses on <strong class=\"import-Strong\">presenting Big Data<\/strong> in a way that makes it easier to understand and act upon (see Napoleon\u2019s Russia Campaign). Given the vast amount of data and complexity involved, effective visualization helps stakeholders make sense of patterns, trends, and insights that enable accurate prediction of outcomes.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Analytical_Tools_and_Technologies\"><\/span><strong>Analytical Tools and Technologies<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">As previously defined, Big Data refers to data sets so enormous and complex that traditional data-management software, hardware and analysis processes cannot handle them. Facing that reality, technology developers have answered the challenge, and what has emerged is a set of remarkably capable yet simple and easy-to-use tools. 
In this section we will explore some of these transformative technologies.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Data_Warehouse_Data_Marts_and_Data_Lakes\"><\/span>Data Warehouse, Data Marts and Data Lakes<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<figure style=\"width: 344px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image6-1.png\" alt=\"Data warehouse.\" width=\"344\" height=\"344\" \/><figcaption class=\"wp-caption-text\">Figure 6-2 Image of a Data Warehouse \u2013\u00a0 Generated with Google AI<\/figcaption><\/figure>\n<p>Data in its unprocessed form is known as <em>raw data<\/em>. Making sound business decisions directly on raw data is unreliable. Data has to be \u201cpre-processed\u201d, cleaned and filtered, and organized in a manner that facilitates easy access, retrieval and follow-on processing. Traditionally, this was the role of a Transaction Processing System (TPS). TPS captures the data from its source through daily execution of normal activities. In a merchant\u2019s system, capture starts at the cash register, which collects transaction data (a receipt for every item sold, returned, or exchanged by a customer). The merchant\u2019s back-end system will pre-process this data, ensuring its accuracy, time-stamp every receipt, then forward these receipts to the respective payment processor, who in turn validates accuracy and availability of funds and performs a settlement (withdrawing money from the buyer\u2019s bank and depositing the required amounts into the merchant\u2019s bank account).<\/p>\n<p>As also mentioned earlier, buy\/sell transactions are small data that can be very quickly processed. However, for super large merchants, transactions are not limited to just buy\/sell. 
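<\/p>\n<p>The capture, validate, time-stamp, and forward flow described above can be sketched in Python (function and field names are invented for illustration, not a real processor\u2019s API):<\/p>

```python
import time

def preprocess(receipt):
    # Basic accuracy check before the receipt moves downstream.
    if receipt['amount'] <= 0:
        raise ValueError('invalid amount')
    receipt['timestamp'] = time.time()   # time-stamp every receipt
    return receipt                       # ready to forward for settlement

receipt = preprocess({'item': 'coffee', 'amount': 3.50})
print('timestamp' in receipt)   # prints: True
```

<p>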
The merchant may have a huge supply chain, tens of thousands of physical locations, thousands of suppliers, and millions of customers interacting on a daily basis, coupled with an urgent need to remain competitive and profitable.<\/p>\n<p class=\"import-Normal\">A data warehouse is a centralized database (a software-based storage mechanism for data) that holds structured data in the form of records of transactions from many sources (Inmon, 1988). Those sources cover customers, suppliers, banks, government, payment processors, gateway providers, automated clearing houses, products, prices, marketing materials, and thousands of other pieces of information necessary to \u201cmanage\u201d the business. A key purpose of a data warehouse is to provide a facility for querying (asking) the data to reveal specific information (e.g., who is my best customer?). Key characteristics of a Data Warehouse include:<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is Subject-Oriented<\/em><\/strong><em>:<\/em> Data in a warehouse is organized around key business subjects such as sales data, finance data, customer data, supplier data, product data, etc.<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is Integrated<\/em><\/strong><em>:<\/em> The Data Warehouse consolidates data from various disparate sources (e.g., operational databases, flat files, cloud-sourced data, marketing &amp; sales data, etc.), ensuring consistency in formats, units of measurement, and coding systems.<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data for Time-Variance<\/em><\/strong><em>:<\/em> The Data Warehouse stores historical data, allowing organizations to analyze trends and performance over different periods, often spanning months or years (e.g., how do 4<sup>th<\/sup> quarter 
sales of this year compare to last year\u2019s?)<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Data is <\/em><\/strong><strong class=\"import-Strong\"><em>Non-Volatile<\/em><\/strong><em>:<\/em> Once data is entered into the warehouse, it is rarely changed or deleted, ensuring a stable and consistent historical record.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Organization_of_a_Data_Warehouse\"><\/span><strong class=\"import-Strong\">Organization of a Data Warehouse<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"import-NormalWeb\">The organization of a data warehouse involves several layers that facilitate the flow of data from operational systems to analytical systems. These layers are crucial for structuring the data warehouse in a way that optimizes both storage and querying capabilities.<\/p>\n<p><strong class=\"import-Strong\"><em>Data Sources<\/em><\/strong><strong class=\"import-Strong\"><em>&#8211; <\/em><\/strong>Data warehouses typically draw data from a variety of <strong class=\"import-Strong\">internal and external sources<\/strong><strong>,<\/strong> including operational databases (e.g., sales or inventory systems), customer data, external market data, and sensor data. These sources are often heterogeneous, meaning they are in different formats and may be stored in different locations.<\/p>\n<p><strong class=\"import-Strong\"><em>ETL Process (Extract, <\/em><\/strong><strong class=\"import-Strong\"><em>Transform, <\/em><\/strong><strong class=\"import-Strong\"><em>and <\/em><\/strong><strong class=\"import-Strong\"><em>Load)<\/em><\/strong><strong class=\"import-Strong\"><em>&#8211; <\/em><\/strong>The <strong class=\"import-Strong\">ETL process<\/strong> is the core of data integration in a data warehouse. 
It involves:<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Extracting<\/strong> data from various sources.<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Transforming<\/strong> the data into a consistent format (e.g., converting data types, cleaning data, standardizing measurements).<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\">Loading<\/strong> the transformed data into the data warehouse, typically into <strong class=\"import-Strong\"><em>fact tables<\/em><\/strong> (which contain quantitative data) and <strong class=\"import-Strong\"><em>dimension tables<\/em><\/strong> (which contain descriptive attributes).<\/p>\n<p><strong class=\"import-Strong\"><em>Data Warehouse Schema: <\/em><\/strong>Once the data is loaded, it is typically organized into schemas (designs) that define how data is stored and accessed:<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Star Schema<\/em><\/strong><strong><em>:<\/em><\/strong> The most common schema in data warehouses. It consists of a central <strong class=\"import-Strong\"><em>fact table<\/em><\/strong> (storing numerical data, such as sales or revenue) and surrounding <strong class=\"import-Strong\"><em>dimension tables<\/em><\/strong> (storing descriptive data, such as customer or product information)(Oxford English Dictionary, 2025).<\/p>\n<p class=\"import-Normal\"><strong class=\"import-Strong\"><em>Snowflake Schema<\/em><\/strong><strong><em>:<\/em><\/strong> A more normalized version of the star schema, where dimension tables are further divided into additional tables to reduce data redundancy.<\/p>\n<p><strong class=\"import-Strong\"><em>OLAP (Online Analytical Processing)<\/em><\/strong><strong class=\"import-Strong\"> &#8211; <\/strong>Data Warehouses are designed for <strong class=\"import-Strong\">OLAP<\/strong>, which involves multidimensional analysis. 
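<\/p>\n<p>A minimal Python sketch (toy fact table, invented values) of the star-schema and cube idea above: quantitative facts are rolled up along dimension keys, the way an OLAP cube slices data:<\/p>

```python
from collections import defaultdict

# Fact table: quantitative rows keyed by dimension values.
facts = [
    {'region': 'East', 'product': 'A', 'revenue': 100},
    {'region': 'East', 'product': 'A', 'revenue': 150},
    {'region': 'West', 'product': 'B', 'revenue': 200},
]
# Aggregate the measure along two dimensions (region, product).
cube = defaultdict(float)
for row in facts:
    cube[(row['region'], row['product'])] += row['revenue']
print(cube[('East', 'A')])   # prints: 250.0
```

<p>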
<strong class=\"import-Strong\"><em>OLAP cubes<\/em><\/strong> allow users to view and analyze data from different perspectives (e.g., time, geography, product categories, etc). This enables fast querying and analysis of large datasets.<\/p>\n<h2 style=\"text-align: left\"><span class=\"ez-toc-section\" id=\"Data_Warehouse_and_Decision_Support\"><\/span><strong class=\"import-Strong\">Data Warehouse<\/strong><strong class=\"import-Strong\"> and Decision Support<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure id=\"attachment_703\" aria-describedby=\"caption-attachment-703\" style=\"width: 300px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-703 size-medium\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-300x260.png\" alt=\"OLAP cube.\" width=\"300\" height=\"260\" srcset=\"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-300x260.png 300w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-768x665.png 768w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-65x56.png 65w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-225x195.png 225w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_-350x303.png 350w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/OLAP_Cube.svg_.png 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-703\" class=\"wp-caption-text\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/OLAP_cube#\/media\/File:OLAP_Cube.svg\">An example of an OLAP cube, Konrad Roeder derivative work: Rehua (talk)<\/a> <a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/3.0\">CC BY-SA 
3.0<\/a><\/figcaption><\/figure>\n<p class=\"import-NormalWeb\">The primary purpose of a data warehouse is to <strong class=\"import-Strong\">support decision-making processes<\/strong> by providing a consolidated, reliable source of historical data. Data warehouses help businesses by:<\/p>\n<h3><span class=\"ez-toc-section\" id=\"a_Data_Consolidation_and_Integration\"><\/span><strong class=\"import-Strong\"><em>a) Data Consolidation and Integration<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-NormalWeb\">A data warehouse enables businesses to consolidate data from multiple systems, ensuring that all relevant data is available in one place for analysis. This integration helps organizations break down silos and gain a comprehensive view of their operations.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"b_Historical_Analysis\"><\/span><strong class=\"import-Strong\"><em>b) Historical Analysis<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-NormalWeb\">By storing large volumes of historical data, a data warehouse provides businesses with the ability to perform trend analysis, forecast future performance, and monitor changes over time. This capability is critical for long-term strategic planning and decision-making.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"c_Performance_and_Efficiency\"><\/span><strong class=\"import-Strong\"><em>c) Performance and Efficiency<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-NormalWeb\">Since data warehouses are optimized for analytical queries, they allow for faster and more efficient reporting and data retrieval compared to operational databases. 
This reduces the burden on transactional systems, improving their performance.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"d_Quality_and_Consistency\"><\/span><strong class=\"import-Strong\"><em>d) Quality and Consistency<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-NormalWeb\">The ETL process ensures that the data stored in a data warehouse is clean, standardized, and consistent. This improves the quality of data analysis and ensures that decision-makers are working with reliable information.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_Role_of_Data_Warehouse_in_Business_Intelligence_BI\"><\/span><strong class=\"import-Strong\">The Role of Data Warehouse in Business Intelligence (BI)<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"import-NormalWeb\">Data Warehouses play a central role in <strong class=\"import-Strong\"><em>Business Intelligence (BI)<\/em><\/strong><em>,<\/em> which refers to the process of using data analysis tools and techniques to make informed business decisions. BI involves analyzing past performance to predict future outcomes, identify trends, and gain actionable insights. Uses of data warehouses in BI include:<\/p>\n<ol>\n<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Reporting and Dashboards<\/em><\/strong><strong class=\"import-Strong\"><em> &#8211; <\/em><\/strong>Data warehouses support the creation of reports and dashboards that summarize business performance. 
Business analysts and decision-makers can generate reports from the warehouse, providing insights into key performance indicators (KPIs), sales trends, financial performance, and customer behavior.<\/li>\n<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Ad-Hoc Analysis<\/em><\/strong><strong class=\"import-Strong\"><em> &#8211;<\/em><\/strong> BI tools allow users to perform <strong class=\"import-Strong\">ad-hoc analysis<\/strong> on data from the warehouse. This means that business users can create their own queries or reports based on specific needs without relying on IT teams. The flexible querying capabilities of data warehouses make this possible by enabling users to quickly generate insights from large datasets.<\/li>\n<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>\u00a0Decision Support<\/em><\/strong><strong class=\"import-Strong\"><em> &#8211; <\/em><\/strong>Data Warehouses provide businesses with the ability to make data-driven decisions. Whether it\u2019s analyzing sales data to optimize inventory, understanding customer behavior to improve marketing strategies, or evaluating financial performance to guide investment decisions, the insights derived from a data warehouse are used to drive strategic decisions at every level of the organization.<\/li>\n<li class=\"import-NormalWeb\"><strong class=\"import-Strong\"><em>Business Analytics<\/em><\/strong><strong class=\"import-Strong\"><em> &#8211; <\/em><\/strong>With the integration of advanced analytics tools, organizations can use data warehouses for more sophisticated analyses such as forecasting, trend analysis, and scenario modeling. By combining historical data stored in the warehouse with current data and predictive models, businesses can develop insights that guide future strategies.<\/li>\n<li class=\"import-Normal\"><em><strong>Data Mining<\/strong> &#8211;<\/em> <em>is an integral process of BI. 
It is the process of analyzing large datasets to uncover hidden patterns, correlations, trends, and relationships within the data<\/em>. It combines aspects of statistics, machine learning, and artificial intelligence (AI) to extract meaningful information and predict future outcomes (Eye on Tech, 2021).<\/li>\n<\/ol>\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"What is Data Mining and Why is it Important?\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/mOfPG5ZIY-k?feature=oembed&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Key_Characteristics_of_Data_Mining\"><\/span><strong>Key Characteristics of Data Mining<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">As a process, Data Mining is characterized by:<\/p>\n<ul>\n<li class=\"import-Normal\"><strong><em>Pattern Discovery<\/em><\/strong><em>:<\/em> Data mining helps identify patterns, which could be used to predict future behavior or trends.<\/li>\n<li class=\"import-Normal\"><strong><em>Classification and Clustering<\/em><\/strong><em>:<\/em> It groups data into categories (classification) or finds natural groupings (clustering).<\/li>\n<li class=\"import-Normal\"><strong><em>Prediction<\/em><\/strong><em>:<\/em> Using historical data to predict future events or behaviors.<\/li>\n<li class=\"import-Normal\"><strong><em>Association<\/em><\/strong><em>:<\/em> It identifies associations and relationships between variables (e.g., market basket analysis).<\/li>\n<li class=\"import-Normal\"><strong><em>Anomaly Detection<\/em><\/strong>: It detects outliers or unusual patterns that might indicate fraud or other exceptional circumstances.<\/li>\n<\/ul>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Categories_of_Business_Intelligence_and_Analytics\"><\/span><strong>Categories of Business Intelligence and Analytics<\/strong><span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">There are several techniques used by Financial Technology companies to advance their penetration into the Fintech market. They include Descriptive Analytics, Predictive Analytics, Optimization, Simulation and Text, Image and Video Analysis.<\/p>\n<div style=\"margin: auto\">\n<table style=\"width: 527pt\">\n<caption><strong>General Categories of BI\/Analytics Techniques<\/strong><\/caption>\n<thead>\n<tr style=\"height: 15pt\">\n<th style=\"background-color: #9bbb59;border-width: 0pt 0.5pt 0.5pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 111.8px\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Descriptive<\/strong><\/p>\n<\/th>\n<td class=\"TableNormal-C\" style=\"background-color: #f2dcdb;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Predictive <\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #ffff00;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>O<\/strong><strong>p<\/strong><strong>t<\/strong><strong>i<\/strong><strong>mization<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e26b0a;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Simulation<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #66ffff;vertical-align: bottom;border-width: 0pt 1pt 0.5pt 0pt;border-style: none solid solid none;border-color: 
windowtext;padding: 0pt 5.4pt;width: 152.7px\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Text &amp; Video Analysis<\/strong><\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"TableNormal-R\" style=\"height: 30pt\">\n<th style=\"background-color: #e5fed8;border-width: 0pt 0.5pt 0.5pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 111.8px\">\n<p class=\"import-Normal\"><strong>Visual Analysis<\/strong><\/p>\n<\/th>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\n<p class=\"import-Normal\"><strong>Time Series Analysis<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\n<p class=\"import-Normal\"><strong>Genetic A<\/strong><strong>l<\/strong><strong>gorith<\/strong><strong>m<\/strong><strong>s<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\n<p class=\"import-Normal\"><strong>Scenario Analysis <\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 1pt 0.5pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 152.7px\">\n<p class=\"import-Normal\"><strong>Text Analysis<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableNormal-R\" style=\"height: 30.75pt\">\n<th style=\"background-color: #e5fed8;border-width: 0pt 0.5pt 1pt 1pt;border-style: none solid solid;border-color: windowtext;padding: 0pt 5.4pt;vertical-align: bottom;width: 
">
111.8px\">\n<p class=\"import-Normal\"><strong>Regression Analysis<\/strong><\/p>\n<\/th>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 110.7px\">\n<p class=\"import-Normal\"><strong>Data Mining <\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 119.1px\">\n<p class=\"import-Normal\"><strong>Linear Programming<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 0.5pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 131.562px\">\n<p class=\"import-Normal\"><strong>Monte Carlo Simulations<\/strong><\/p>\n<\/td>\n<td class=\"TableNormal-C\" style=\"background-color: #e5fed8;vertical-align: bottom;border-width: 0pt 1pt 1pt 0pt;border-style: none solid solid none;border-color: windowtext;padding: 0pt 5.4pt;width: 152.7px\">\n<p class=\"import-Normal\"><strong>Video Analysis<\/strong><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Descriptive_Analytics\"><\/span><strong>Descriptive Analytics <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\"><em>Descriptive Analytics \u2013<\/em> is an early, exploratory stage of processing the data. 
Its intent is to identify <em>TRENDS and PATTERNS<\/em> in the data, answering questions such as who, what, where, when, and why certain conditions or events occurred. For example, a high volume of positive tweets about a certain company may lead to an increase in its stock price.<\/p>\n<p class=\"import-Normal\">Descriptive Analytics uses simple tools such as Excel spreadsheets to organize, categorize, clean, and prepare the data, then applies Excel functions such as Pivot Tables and What-If Analysis. Data is typically presented in the form of graphs, charts, and other graphics using visualization tools such as Microsoft\u2019s Power BI or Tableau, among many others.<\/p>\n<p class=\"import-Normal\"><em>Visual Analysis \u2013<\/em> Presents the results of the analysis in pictorial form (see Napoleon\u2019s March). Another common way of visualizing data content is through what is known as a word count. Figure 6-3 below shows a typical representation called a Word Cloud.<\/p>\n<figure style=\"width: 336px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image9.jpeg\" alt=\"Data visualization.\" width=\"336\" height=\"336\" \/><figcaption class=\"wp-caption-text\">Figure 6-3 showing visualization of the chapter on Big Data Analytics \u2013 Image generated by OpenAI\u2019s DALL\u00b7E<\/figcaption><\/figure>\n<p class=\"import-Normal\">To illustrate the concept, this very chapter on Big Data and Analytics was run through Word Cloud analytics. Word Cloud analysis represents the frequency of occurrence of each important word by its size. As we see in the above image, the word DATA is the primary theme, as its size indicates. Other words of lesser importance appear in a much smaller font size. 
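<\/p>
<p class=\"import-Normal\">The idea behind a word cloud can be sketched in a few lines of code: count how often each word occurs and let the counts drive the display size. A minimal Python sketch (the sample text below is hypothetical, standing in for a chapter\u2019s contents):<\/p>

```python
from collections import Counter
import re

# Hypothetical sample text standing in for a chapter's contents.
text = """Big Data refers to data sets so large that traditional
data tools cannot process the data efficiently."""

# Split into lowercase words, ignoring punctuation.
words = re.findall(r"[a-z]+", text.lower())

# Count occurrences; a word cloud would draw the most frequent
# words in the largest font.
counts = Counter(words)
print(counts.most_common(3))
```

<p class=\"import-Normal\">A real word-cloud tool adds layout and styling on top of exactly this kind of frequency table.<\/p>
<p class=\"import-Normal\">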
<span style=\"font-size: 14px\">Fig- 7-11 Conversion Funnel for a typical eCommerce web site showing effectivity of the website in gaining new sales. Image generated by OpenAI\u2019s DALL\u00b7E<\/span><\/p>\n<p class=\"import-Normal\"><em>Conversion Funnel \u2013<\/em> is an analysis tool used to show comparative statistics.<\/p>\n<p>A visual representation below of a conversion funnel for a typical eCommerce website shows the stages and effectiveness in gaining new sales. In our example an eCommerce web site\u2019s effectivity is in attracting potential customers, keeping their interest high and ultimately converting them into actual customers. The funnel, divided into five clearly labeled sections from top to bottom\u2014&#8217;Visitors&#8217;, &#8216;Product Views&#8217;, &#8216;Add to Cart&#8217;, &#8216;Checkout Initiated&#8217;, and &#8216;Sales Completed&#8217;\u2014demonstrates how an eCommerce site attracts potential customers, maintains their interest, and ultimately converts them into actual buyers. Each section becomes progressively narrower, symbolizing the drop-off rates at each stage, with corresponding percentages displayed to highlight this attrition. The clean, minimal background highlights the color-coded stages, each with icons like a shopping bag for &#8216;Add to Cart&#8217; and a dollar sign for &#8216;Sales Completed&#8217; to enhance clarity.<\/p>\n<figure id=\"attachment_1091\" aria-describedby=\"caption-attachment-1091\" style=\"width: 300px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1091 size-medium\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-300x300.jpg\" alt=\"Conversion funnel for a typical eCommerce website, showing the stages and effectiveness in gaining new sales. 
Long description above.\" width=\"300\" height=\"300\" srcset=\"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-300x300.jpg 300w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-150x150.jpg 150w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-768x768.jpg 768w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-65x65.jpg 65w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-225x225.jpg 225w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_-350x350.jpg 350w, https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/An_infographic-style_image_of_a_conversion_funnel_.jpg 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-1091\" class=\"wp-caption-text\">Figure 6-4 A visual representation of a conversion funnel for a typical eCommerce website. Image generated by OpenAI\u2019s DALL\u00b7E<\/figcaption><\/figure>\n<p class=\"import-Normal\"><em>Regression <\/em><em> Analysis<\/em><em> &#8211; <\/em><em> Regression analysis is a simple statistical computation of dependent Vs. Independent variables. <\/em> Let\u2019s suppose that the US government publishes a statistic on new home building starts that says \u201c\u2026Building new home starts in 2025 will exceed 2 Million new homes\u201d and lets also presume you are a manufacturer of Door Handles. What does 2 million new home starts indicate to your business? 
This is where the decision of how many door handles to make comes in. The quantity of door handles you should make is <em>dependent<\/em> on how many new homes will be built, along with other variables such as competition. What are your distribution channels? Will you be using Home Depot, Lowes, commercial building supply wholesalers, etc.? As you can imagine, these variables can get pretty complex.<\/p>\n<p class=\"import-Normal\">Using Excel, it is a fairly easy task to plug the <em>dependent<\/em> variable (door handle quantity) against the <em>independent<\/em> variables (number of new housing starts, competition, and so on) to generate a regression chart that describes your analysis.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Predictive_Analysis\"><\/span><strong>Predictive Analysis <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\"><em>Predictive Analysis \u2013 is a set of statistical tools that analyze data to predict a certain outcome<\/em> and is primarily composed of two main branches: Data Mining and Time Series Analysis. We covered data mining in detail in the previous discussion on Big Data. Here we will discuss Time Series Analysis.<\/p>\n<p class=\"import-Normal\"><em>Time Series Analysis \u2013 is a statistical technique used to examine data points collected over a period of time, allowing analysts to identify patterns, trends, and seasonal variations by observing how values change over consistent time intervals; it is often used to forecast future values based on past data patterns<\/em>.<\/p>\n<figure style=\"width: 608px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image10.png\" alt=\"Frequency of extreme weather for different degrees of global warming - bar chart. 
Long description next to image.\" width=\"608\" height=\"342\" \/><figcaption class=\"wp-caption-text\">Figure 6-5 Frequency of extreme weather for different degrees of global warming &#8211; bar chart <a href=\"https:\/\/en.m.wikipedia.org\/wiki\/File:20211109_Frequency_of_extreme_weather_for_different_degrees_of_global_warming_-_bar_chart_IPCC_AR6_WG1_SPM.svg\">Wikimedia Commons<\/a>.<\/figcaption><\/figure>\n<p class=\"import-Normal\">An example of time series data is daily, time-stamped measures of temperature, dew point, humidity, wind speed, rainfall, and so on over a period spanning the last 10 years. What patterns emerge when we crunch these numbers? What is the correlation between temperature and rainfall?<\/p>\n<p class=\"import-Normal\">Time series analysis can be used to predict patient arrival times at a hospital emergency room, ensuring adequate nursing staff are available, or to answer the question of how many Fintech suppliers are available to run \u201cbuy transactions\u201d at 3:00 AM and predict acceptance vs. rejection of a transaction. Time series analysis in these cases performs forecasting (i.e., predicting the future outcome).<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Optimization_Analysis\"><\/span><strong>Optimization Analysis<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Minimizing costs and maximizing profitability are cornerstones of every successful business enterprise. Management by data, rather than by intuition, is the subject of Optimization Analysis. What area of my business needs attention? Why are my labor costs much higher than industry benchmarks? How do I improve my supply chain? These and many more questions can be answered by applying Optimization Analysis methods. 
One popular analysis method is called the Genetic Algorithm.<\/p>\n<figure style=\"width: 535px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image11-1.jpeg\" alt=\"Introgressive hybridization in plants\" width=\"535\" height=\"586\" \/><figcaption class=\"wp-caption-text\">Figure 6-6 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Introgressive_hybridization_in_plants#\/media\/File:Backcrossing_leads_to_introgression.jpg\">\u00a0Image of Genetic Process<\/a> <a class=\"new\" title=\"User:Mcruzan (page does not exist)\" href=\"https:\/\/commons.wikimedia.org\/w\/index.php?title=User:Mcruzan&amp;action=edit&amp;redlink=1\">Mcruzan<\/a>\u00a0<a class=\"mw-mmv-license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\" target=\"_blank\" rel=\"noopener\">CC BY-SA 4.0<\/a><\/figcaption><\/figure>\n<p class=\"import-Normal\"><em>Genetic Algorithms \u2013 A reference to the English naturalist Charles Darwin\u2019s Theory of Evolution, which holds that \u201call organisms rise and develop through natural evolution processes of small, inherited variations\u201d (Shannon, n.d.).<\/em><\/p>\n<p class=\"import-Normal\">A genetic algorithm is a software-driven, step-by-step process that replicates the inheritance properties in the data in order to find approximate solutions to optimization problems. Think of a married couple, one with brown eyes, the other with green eyes. What is the chance of having a baby with hazel-colored eyes? As a simple mathematical average, it should be about 50% of the time.<\/p>\n<p class=\"import-Normal\">A genetic algorithm works by taking a starting population of candidate solutions, called chromosomes, and through multiple simulations this population is gradually evolved by an iterative process towards better and better solutions. 
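<\/p>
<p class=\"import-Normal\">The evolve-and-select loop described above can be illustrated with a toy genetic algorithm in Python (the fitness function and all parameters here are invented for illustration, not taken from any particular optimization product):<\/p>

```python
import random

random.seed(42)  # deterministic for the example

def fitness(x):
    # Toy objective with a single peak at x = 3.
    return -(x - 3) ** 2

# Starting population of candidate solutions ("chromosomes").
population = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(50):
    # Selection: keep the fitter half of the population.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    # Crossover + mutation: each child averages two parents
    # and adds a small random variation.
    children = [
        (a + b) / 2 + random.gauss(0, 0.1)
        for a, b in (random.sample(survivors, 2) for _ in range(10))
    ]
    population = survivors + children

best = max(population, key=fitness)
print(f"best solution found: {best:.3f}")  # drifts toward the peak at 3
```

<p class=\"import-Normal\">Each generation discards the weaker half and breeds variations of the stronger half, which is exactly the \u201ckeep only the moves that worked\u201d idea in the chess analogy that follows.<\/p>
<p class=\"import-Normal\">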
Think of playing chess: with every new game you eliminate (no longer play) the prior moves that made you say to yourself \u201cI should not have played that move,\u201d and you select only those prior moves that resulted in a better outcome. With each generation (successive game), your chances of winning improve.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Linear_Programming\"><\/span><strong>Linear Programming <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Linear programming is a simple algebraic method that uses linear equations to determine how to arrive at the optimal situation (maximum or minimum) as the answer to a mathematical problem (Shannon, 1948); solving a linear equation such as 2X+3=8 is its simplest building block.<\/p>\n<p class=\"import-Normal\">Again, Excel simplifies solving linear functions with one or more variables.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Simulations\"><\/span><strong>Simulations <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">In Florida, weather patterns are monitored by the US National Oceanic and Atmospheric Administration (NOAA) <span class=\"import-jpfdse\">National Hurricane Center<\/span> (NHC), which analyzes satellite imagery, weather data, and historical observations to generate computer models that make forecast decisions and create hazard information for emergency managers, the media, and the public for hurricanes, tropical storms, and tropical depressions.<\/p>\n<p class=\"import-Normal\">A computer simulation mimics the real world in a virtual setting and, in Fintech, the dynamic responses of a system. For example, a transaction process starting at a point-of-sale, together with all the intermediary steps of transaction aggregation, clearing, processing, and settlement, can be simulated to understand the behavior of the system. 
Simulation would answer questions like: what would happen if we stressed the system by having billions of transactions flow simultaneously? What would break? How and where would it break? And what are the net results of the break?<\/p>\n<p class=\"import-Normal\">Financial and Fintech system simulations use sophisticated statistical methods such as Monte Carlo analysis, which \u201cmodels\u201d different financial market conditions and potential outcomes by assigning random values to uncertain variables and running multiple iterations to calculate the probabilities of various results;\u00a0this is widely used in financial risk management, asset valuation, and portfolio allocation. An example would be a financial planner helping a retiree answer \u201chow long will my retirement portfolio last?\u201d<\/p>\n<p class=\"import-Normal\">Again, as complex as it may sound, simple simulations can be developed using Excel. Running multiple Monte Carlo simulation functions, one can determine answers like \u201cin 100 runs of the simulation, only 20 runs indicated that this portfolio will last for more than 10 years\u201d.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Text_and_Video_Analytics\"><\/span><strong>Text and Video Analytics <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Text and Video Analytics are techniques used to evaluate textual data as well as video imagery to find hidden patterns in the data.<\/p>\n<p class=\"import-Normal\"><em>Textual Analytics &#8211; Is a process that extracts value from very large amounts of textual data such as consumer reports, comments, complaints, and product reviews. It also monitors social media postings to identify consumer sentiment and to recognize changes in consumer behavior. 
<\/em><\/p>\n<p class=\"import-Normal\">Many companies in the US use free-form call center records to inform their support operations, marketing and pricing strategies, and new product development.<\/p>\n<p class=\"import-Normal\"><em>Video Analytics \u2013 is also a process that extracts deeper insights from video footage<\/em>. Imagine the value, in addition to the cost savings, while running a new product development marketing campaign, of having customers sample your latest perfume while you observe their facial expressions after sampling the products you are market testing. Many transportation companies, such as train, bus, and airline operators, currently use video analytics to understand commuter behavior and to implement methods to ease congestion at terminals.<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Data_Marts\"><\/span><strong>Data Marts <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\"><em>Data marts<\/em> <em>are specialized, smaller-scale repositories<\/em> that store data custom-tailored to meet the needs of specific business units or functional areas within an organization, such as sales, marketing, or finance. 
Unlike comprehensive data warehouses, which aggregate data across an organization, data marts focus on particular domains, offering streamlined data analysis for targeted decision-making.<\/p>\n<figure style=\"width: 602px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image12.png\" alt=\"Data Collection and Brokering\" width=\"602\" height=\"259\" \/><figcaption class=\"wp-caption-text\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_imaginaries#\/media\/File:Data-driven_vs_data-brokering.png\">Data Collection and Brokering<\/a>,<span class=\"mw-mmv-author\"><a class=\"new\" title=\"User:Jasraj Raghuwanshi (page does not exist)\" href=\"https:\/\/commons.wikimedia.org\/w\/index.php?title=User:Jasraj_Raghuwanshi&amp;action=edit&amp;redlink=1\">Jasraj Raghuwanshi<\/a><\/span>\u00a0 <a class=\"mw-mmv-license\" href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\" target=\"_blank\" rel=\"noopener\">CC BY-SA 4.0<\/a><\/figcaption><\/figure>\n<p class=\"import-Normal\">A data mart extracts and organizes data from the central data warehouse (or external sources), optimizing it for easy access and faster queries. Companies often sell this categorized data rather than use it internally. 
A search on Google\u2019s website, for example, reveals hundreds of companies known as \u201cdata brokers\u201d, with names like Acxiom, Experian, CoreLogic, Google, Salesforce, and Signal AI considered to be among the companies that sell user data (your data\u00a0and mine), often by collecting personal information from various sources and then selling it to other businesses in bulk for targeted marketing and other purposes (The Verge, 2025).<\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Data_Lakes\"><\/span><strong>Data Lakes<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<figure style=\"width: 518px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-content\/uploads\/sites\/96\/2024\/12\/image13.png\" alt=\"Data Lake\" width=\"518\" height=\"518\" \/><figcaption class=\"wp-caption-text\">A representation of a Data Lake \u2013 Courtesy ChatGPT<\/figcaption><\/figure>\n<p class=\"import-Normal\">Structured data is quite easy for computers to process. The challenge with Big Data is unstructured data: a streamed movie and an hour of streamed music are just a few examples of unstructured data sources. Unlike traditional data warehouses that store structured data in predefined schemas, <em>data lakes allow for the storage of raw data regardless of format, size, or source.<\/em> Hence the term \u201clake\u201d: a lake has many streams pouring into it and hosts many different species of aquatic life. 
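<\/p>
<p class=\"import-Normal\">The \u201cstore raw, structure later\u201d idea is often called schema-on-read, and it can be sketched in a few lines of Python (the record formats and field names below are hypothetical):<\/p>

```python
import json

# Raw, heterogeneous records as they might land in a data lake:
# no schema is enforced at write time.
raw_records = [
    '{"user": "a1", "amount": 12.5}',   # a JSON transaction
    'user=b2;amount=7.25',              # a key-value log line
]

def read_with_schema(record):
    """Impose a (user, amount) schema only at read time."""
    if record.lstrip().startswith('{'):
        data = json.loads(record)
    else:
        data = dict(pair.split('=') for pair in record.split(';'))
    return data['user'], float(data['amount'])

rows = [read_with_schema(r) for r in raw_records]
print(rows)  # [('a1', 12.5), ('b2', 7.25)]
```

<p class=\"import-Normal\">A data warehouse, by contrast, would force both records into one schema before storing them (schema-on-write).<\/p>
<p class=\"import-Normal\">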
This capability enables businesses to capture a wide range of data types and unlock new opportunities for advanced analytics and artificial intelligence.<\/p>\n<p class=\"import-Normal\">Data lakes have certain key characteristics that include:<\/p>\n<ul>\n<li><em>Scalability:<\/em> They can store enormous amounts of data, often multiple petabytes (1 petabyte = 10^15 bytes, a 1 followed by 15 zeros).<\/li>\n<li><em>Flexibility:<\/em> Stores raw data in its original form.<\/li>\n<li><em>Diversity:<\/em> Accommodates structured, semi-structured, and unstructured data.<\/li>\n<li><em>Accessibility:<\/em> Provides seamless access for analytics, machine learning, and reporting.<\/li>\n<\/ul>\n<div style=\"margin: auto\">\n<table>\n<thead>\n<tr style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Feature<\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Data Lake<\/strong><\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"vertical-align: middle;border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\" style=\"text-align: center\"><strong>Data Warehouse<\/strong><\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"TableGrid-R\" style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\"><strong>Data Type<\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Structured, 
Semi-structured and unstructured<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Strictly structured<\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableGrid-R\" style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\"><strong>Schema (model)<\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Schema-on-read: built and applied during the analysis phase, when the data is being read<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Schema-on-write: pre-designed, storing the data in a specific format and size<\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableGrid-R\" style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\"><strong>Cost<\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Cost of storing the data is minimal<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Cost of storing the data is comparatively high<\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableGrid-R\" style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\"><strong>Used for <\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Analytics, Artificial Intelligence and Machine Learning<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Reporting and Business Intelligence<\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableGrid-R\" style=\"height: 0\">\n<th style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\"><strong>Scalability<\/strong><\/p>\n<\/th>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Highly Scalable<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: 
solid windowtext 0.5pt\">\n<p class=\"import-Normal\">Limited by cost<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p class=\"import-Normal\" style=\"text-align: center\"><em>Table 7-2 Comparative Analysis of Features of Data Lakes vs. Data Warehouses<\/em><\/p>\n<h3 class=\"import-Normal\"><span class=\"ez-toc-section\" id=\"Data_Lakes_in_Fintech\"><\/span><strong>Data Lakes in Fintech<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"import-Normal\">Data Lakes have become integral to financial technology (FinTech) companies due to their ability to manage and analyze vast, diverse, and rapidly growing datasets. FinTech businesses rely on data lakes to perform:<\/p>\n<ul>\n<li><em>Fraud Detection and Risk Management<\/em><em> &#8211; <\/em>Fraud detection requires sophisticated models that can process massive datasets to identify anomalies and patterns indicative of fraudulent activity. The ability of Data Lakes to handle varying data types is crucial, as fraud signals often come from diverse sources, such as emails, call logs, or social media.<\/li>\n<li><em>Anomaly Detection<\/em><em> &#8211; <\/em>Using machine learning algorithms trained on historical data to flag suspicious activities.<\/li>\n<li><em>Behavioral Analysis<\/em><em> &#8211; <\/em>Tracking spending patterns, device fingerprints, and geolocation data to identify unusual transactions.<\/li>\n<li><em>Risk Assessment<\/em><em> &#8211; <\/em>Combining external and internal data to calculate risk scores for transactions or user accounts.<\/li>\n<li><em>Regulatory Compliance and Auditing<\/em><em> &#8211; <\/em>Financial institutions face stringent regulations such as GDPR, CCPA, KYC, and anti-money laundering (AML) laws. 
Data lake capabilities are often called upon for data retention analysis, audit trails, and automated reporting.<\/li>\n<li><em>Personalization and Customer Insights<\/em> &#8211; FinTech companies use data lakes to develop personalized services, such as customized financial products and dynamic (real-time) pricing models that adjust rates for loans, insurance, or investment products based on current data.<\/li>\n<li><em>Algorithmic Trading<\/em> &#8211; Algorithmic or automated trading (sometimes called robo-trading) relies on analyzing vast amounts of market and historical data to make split-second decisions.<\/li>\n<li><em>Data Aggregation<\/em> &#8211; Collecting market feeds, news articles, and historical price data in real time.<\/li>\n<li><em>Backtesting Models<\/em> &#8211; Using historical data stored in the lake to evaluate the effectiveness of trading algorithms.<\/li>\n<li><em>Market Sentiment Analysis<\/em> &#8211; Incorporating alternative datasets, such as social media sentiment, to inform trading strategies.<\/li>\n<\/ul>\n<div class=\"textbox\">\n<h3><span class=\"ez-toc-section\" id=\"Licenses_and_Attribution\"><\/span><strong>Licenses and Attribution<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><span class=\"ez-toc-section\" id=\"CC_Licensed_Content_Original\"><\/span>CC Licensed Content, Original<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>This educational material includes AI-generated content from ChatGPT by OpenAI. The original content created by Mohammed Kotaiche from Hillsborough Community College is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (<a href=\"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/deed.en\" rel=\"noreferrer noopener\" aria-label=\"Link CC BY-NC 4.0\">CC BY-NC 4.0<\/a>).<\/p>\n<p>All images in this textbook generated with DALL-E are licensed under the terms provided by OpenAI, allowing for their free use, modification, and distribution with appropriate attribution.<\/p>\n<hr \/>\n<h4><span class=\"ez-toc-section\" id=\"CC_Licensed_Content_Included\"><\/span><strong>CC Licensed Content Included<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<ul>\n<li><strong>ASCII<\/strong><br \/>\nSource: <a href=\"https:\/\/en.wikipedia.org\/wiki\/ASCII\" target=\"_new\" rel=\"noopener\">Wikipedia<\/a><br \/>\nLicense: CC BY-SA 3.0<\/li>\n<li><strong>Federal Reserve Payments Study<\/strong><br \/>\nSource: <a href=\"https:\/\/www.federalreserve.gov\/paymentsystems\/fr-payments-study.htm\" target=\"_new\" rel=\"noopener\">Federal Reserve<\/a><br \/>\nLicense: Public Domain<\/li>\n<li><strong>OLAP 3D Cube<\/strong><br \/>\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Olap-3d-cube.png\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a><br \/>\nLicense: CC BY-SA 4.0<\/li>\n<li><strong>Frequency of Extreme Weather for Different Degrees of Global Warming<\/strong><br \/>\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:20211109_Frequency_of_extreme_weather_for_different_degrees_of_global_warming_-_bar_chart_IPCC_AR6_WG1_SPM.svg\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a><br \/>\nLicense: CC BY-SA 4.0<\/li>\n<li><strong>Backcrossing Leads to Introgression<\/strong><br \/>\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Backcrossing_leads_to_introgression.jpg\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a><br \/>\nLicense: CC BY-SA 4.0<\/li>\n<li><strong>National Oceanic and Atmospheric Administration (NOAA)<\/strong><br \/>\nSource: <a href=\"https:\/\/www.noaa.gov\/\" target=\"_new\" rel=\"noopener\">NOAA<\/a><br \/>\nLicense: Public Domain<\/li>\n<li><strong>Data-Driven vs. Data-Brokering<\/strong><br \/>\nSource: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:Data-driven_vs_data-brokering.png\" target=\"_new\" rel=\"noopener\">Wikimedia Commons<\/a><br \/>\nLicense: CC BY-SA 4.0<\/li>\n<\/ul>\n<hr \/>\n<h4><span class=\"ez-toc-section\" id=\"Other_Licensed_Content_Included\"><\/span>Other Licensed Content Included<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<ul>\n<li><strong>What is Data Mining and Why is it Important?<\/strong> Source: YouTube. License: Standard YouTube License. Retrieved from <a href=\"https:\/\/www.youtube.com\/results?search_query=what+is+data+mining+and+why+is+it+important\" target=\"_blank\" rel=\"noopener\">YouTube\u2019s Video on Data Mining<\/a>.<\/li>\n<li><strong>The Difference Between Unstructured Data and Structured Data<\/strong>. Source: AIIM Blog. License: Copyright \u00a9 2025 Association for Intelligent Information Management. All rights reserved. Retrieved from <a href=\"https:\/\/www.aiim.org\/blog\/difference-between-unstructured-and-structured-data\" target=\"_blank\" rel=\"noopener\">AIIM Blog: Difference Between Unstructured and Structured Data<\/a>.<\/li>\n<li><strong>Google Search Labs<\/strong>. Source: The Verge. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.theverge.com\/google-search-labs\" target=\"_blank\" rel=\"noopener\">The Verge\u2019s Coverage of Google Search Labs<\/a>.<\/li>\n<li><strong>Global Mobile Data Usage Forecast<\/strong>. Source: Statista. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.statista.com\/statistics\/270749\/global-mobile-data-traffic-forecast\/\" target=\"_blank\" rel=\"noopener\">Statista\u2019s Global Mobile Data Usage Forecast<\/a>.<\/li>\n<li><strong>Definition of &#8216;Schema&#8217;<\/strong>. Source: Oxford English Dictionary. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.oed.com\/view\/Entry\/172074\" target=\"_blank\" rel=\"noopener\">Oxford English Dictionary\u2019s Definition of Schema<\/a>.<\/li>\n<li><strong>Internet Screen Time Statistics 2024<\/strong>. Source: Reviews.org. Retrieved from <a href=\"https:\/\/www.reviews.org\/internet-service\/internet-screen-time-statistics\/\" target=\"_blank\" rel=\"noopener\">Reviews.org\u2019s Internet Screen Time Statistics 2024<\/a>.<\/li>\n<li><strong>Building the Data Warehouse<\/strong>. Author: Bill Inmon. Publisher: Wiley. License: Standard Copyright. Retrieved from <a href=\"https:\/\/www.wiley.com\/en-us\/Building+the+Data+Warehouse%2C+4th+Edition-p-9780764599446\" target=\"_blank\" rel=\"noopener\">Wiley\u2019s Page for Building the Data Warehouse<\/a>.<\/li>\n<li><strong>A Mathematical Theory of Communication<\/strong>. Author: Claude E. Shannon. Source: University of Illinois. Retrieved from <a href=\"https:\/\/archive.org\/details\/amathematicaltheoryofcommunication\" target=\"_blank\" rel=\"noopener\">University of Illinois\u2019 Archive of A Mathematical Theory of Communication<\/a>.<\/li>\n<li><strong>How Many Words Does the Average Person Say a Day?<\/strong>. Source: WordsRated. License: Standard Copyright. Retrieved from <a href=\"https:\/\/wordsrated.com\/how-many-words-average-person-says-per-day\/\" target=\"_blank\" rel=\"noopener\">WordsRated\u2019s Study on Average Words Spoken Per Day<\/a>.<\/li>\n<\/ul>\n<\/div>\n","protected":false},"author":2,"menu_order":6,"comment_status":"open","ping_status":"closed","template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-648","chapter","type-chapter","status-publish","hentry"],"part":3,"_links":{"self":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapters\/648","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/comments?post=648"}],"version-history":[{"count":84,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapters\/648\/revisions"}],"predecessor-version":[{"id":825,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapters\/648\/revisions\/825"}],"part":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapters\/648\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/media?parent=648"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/pressbooks\/v2\/chapter-type?post=648"
},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/contributor?post=648"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.hccfl.edu\/introtofintech\/wp-json\/wp\/v2\/license?post=648"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}