Understand the Splunk Architecture with Different Design Topologies
If you are here, we assume that you already know a little about Splunk and its components, so we are not going to repeat the basics in this blog post; there is a lot of ground to cover. However, we do urge you to read up on the basics of Splunk and its components before you proceed. If you skip that, some of the terms and concepts in this post may be confusing or easy to misunderstand.
We know that you are here to understand more about the architecture before going for the deployment. Before that, we should tell you who should read this blog post, what you will learn by the end of it, and what this blog post does not address.
Who Should Read This?
Before we go further, we should clarify that we created this blog post for Splunk architects, consultants, engineers, and administrators who care about Splunk architecture and topology design. This post is not intended for analysts and investigators.
If you belong to one of these categories, this blog post will provide you with valuable insights into Splunk architecture and various topology designs. As an enterprise architect, you will learn how to design Splunk deployments that align with your organization's requirements. Consultants specializing in Splunk will gain a deeper understanding of architecture and design best practices to better serve their clients. Splunk administrators will benefit from learning about the different components and their roles in the Splunk lifecycle. Finally, managed service providers will discover how to effectively deploy and manage Splunk as a service for their customers.
By the end of this blog post, you will have a strong foundation in Splunk architecture and be equipped to make informed decisions when designing and implementing Splunk topologies that best suit your needs.
What This Blog Post Does Not Address
While this blog post aims to provide a comprehensive understanding of Splunk architecture and topology designs, there are certain aspects that are considered out of scope. It is important to clarify these points to set the right expectations for the readers.
Deployment technologies, like operating systems & server hardware, since they are considered implementation choices
This blog post will not delve into the specifics of deployment technologies such as operating systems and server hardware. These elements are considered implementation choices and may vary depending on the organization's preferences and existing infrastructure. The focus will be on Splunk architecture and topology designs, independent of the underlying deployment technologies.
Deployment sizing involves understanding data ingest and search volumes as well as search use cases and generally does not affect the deployment architecture
Another aspect that this blog post will not cover in detail is deployment sizing. Determining the appropriate size of a Splunk deployment involves understanding factors such as data ingestion volume, search volume, and specific search use cases. While these factors are crucial for planning the capacity and resources required for a Splunk deployment, they generally do not significantly impact the overall deployment architecture. Therefore, this blog post will prioritize discussing the architectural components and topology designs rather than providing guidance on sizing the deployment.
By clarifying these out-of-scope items, readers can better understand the focus and limitations of this blog post and set their expectations accordingly.
Different Phases of the Data Pipeline or Splunk Processing Tiers
Before diving into the various Splunk topology designs, it is necessary to understand the components and the data flow or data lifecycle management of a solution before designing the architecture. Let's learn about the different phases of the Splunk Processing Tiers, also known as the Data Pipeline, and the components that work in each phase.
The Splunk Processing Tiers consist of three main phases: Data Input, Data Storage, and Data Search. Data Storage is further divided into two stages: Parsing and Indexing.
1. Data Input (Collection)
2. Data Storage
    a. Parsing
    b. Indexing
3. Data Search
Each phase plays a crucial role in processing and managing the machine data from the moment it enters the Splunk system until it is available for searching and analysis.
Input Phase
The Input phase is the entry point for machine data into the Splunk system. Splunk supports various input methods, such as files and directories, network events, and Splunk Forwarders. The most common input method is using Splunk Forwarders, which are lightweight agents installed on the data sources. These Forwarders collect data from log files, metrics, and other sources and securely forward it to the Splunk Indexers. When data enters the Input phase, Splunk breaks the raw data into 64 KB blocks, annotates each block with metadata keys such as host, source, and source type, and applies input processing rules to structure and normalize the data. Input processing rules can include tasks like timestamping, event breaking, and field extraction. These rules help ensure that the data is in a consistent format before moving to the next phase.
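To make this more concrete, here is a minimal sketch of how a Universal Forwarder might be pointed at a log file and at the indexing tier. The file path, index name, indexer host names, and port are illustrative assumptions, not values taken from this post.

```
# inputs.conf on the Universal Forwarder (illustrative path and index name)
[monitor:///var/log/messages]
sourcetype = syslog
index = os_logs
disabled = 0

# outputs.conf on the same forwarder (hypothetical indexer host names)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true
```

With two or more indexers listed in the output group, the forwarder's built-in load balancing rotates across them automatically.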
Parsing Phase
In the Parsing phase, Splunk transforms the raw data into a structured format suitable for indexing.
Splunk breaks the incoming data streams into individual events based on configurable break rules. These rules define how Splunk should split the data into discrete events, such as by newline characters or regular expressions.
Splunk identifies and assigns timestamps to each event based on the time information present in the data. It looks for common timestamp formats and extracts the relevant time information. If no timestamp is found, Splunk assigns the current system time to the event.
In some cases, a single event may span multiple lines. Splunk's line-merging process combines these lines into a single event based on predefined patterns or timeout settings. This ensures that multi-line events are properly handled and indexed.
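As a rough illustration of how these parsing controls are expressed, the snippet below shows a props.conf stanza for a hypothetical source type; the source type name, timestamp format, and values are assumptions chosen only for the example.

```
# props.conf on the parsing/indexing tier, for a hypothetical source type
[my_app:events]
# Break events on newlines instead of merging lines heuristically
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Tell Splunk where the timestamp starts and how to parse it
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
TZ = UTC
```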
Indexing Phase
After the data is parsed, it enters the Indexing phase. In this phase, Splunk Indexers process and store the events in a compressed format on disk for efficient searching and retrieval. The indexing process involves several key steps:
Index Creation: Splunk creates indexes, which are directories that store the indexed events. Each index is associated with a specific set of data and has its own configuration settings.
Event Compression: Splunk compresses the indexed events to optimize storage space and improve search performance. The compressed events are stored in proprietary Splunk data files within the indexes.
Metadata Extraction: During indexing, Splunk extracts metadata from the events, such as source, sourcetype, and host. This metadata helps in organizing and searching the data effectively.
Index Replication: Splunk can replicate indexes across multiple Indexers to ensure high availability and data redundancy. Replication allows for load balancing and prevents data loss in case of Indexer failures.
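For reference, a minimal indexes.conf stanza for a hypothetical index might look like the sketch below; the index name, size cap, and retention values are placeholders you would size for your own environment.

```
# indexes.conf on the indexers (illustrative index name and limits)
[app_logs]
homePath   = $SPLUNK_DB/app_logs/db
coldPath   = $SPLUNK_DB/app_logs/colddb
thawedPath = $SPLUNK_DB/app_logs/thaweddb
# Cap the index at roughly 500 GB and freeze events older than 90 days
maxTotalDataSizeMB     = 500000
frozenTimePeriodInSecs = 7776000
# In an indexer cluster, this lets the cluster replicate the index
repFactor = auto
```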
Search Phase
The Search phase is where users interact with the indexed data to perform searches, analysis, and visualization. Splunk Search Heads are the components responsible for handling search requests and presenting the results to users. When a user submits a search query, the Search Head distributes the query to the Indexers or other search peers. The Indexers search through their respective indexes to find the relevant events based on the search criteria. The Search Head then collects and aggregates the search results from the Indexers and presents them to the user. Splunk's search language, called the Search Processing Language (SPL), provides a powerful and flexible way to search and analyze the indexed data. Users can use SPL commands to filter events, extract fields, perform calculations, and visualize the results using charts, tables, and dashboards.
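As a small taste of SPL, the hypothetical search below filters events from an assumed index and source type, aggregates them, and sorts the result; the index, source type, and field names are made up for the example.

```
index=app_logs sourcetype=my_app:events status>=500
| stats count BY host, status
| sort - count
```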
Throughout the Data Pipeline, Splunk Forwarders, Indexers, and Search Heads work together to ensure the smooth flow of data from the input sources to the end-users. Forwarders collect and send data to the Indexers, which parse and index the events. Search Heads then retrieve the indexed data from the Indexers based on user searches and provide the results for analysis and visualization.
By understanding the Splunk Processing Tiers and the role of each component in the Data Pipeline, you can better design and architect your Splunk deployment to efficiently handle the data flow and meet your organization's requirements.
Example 1: Single Server Topology (Standalone Deployment)
In a Single Server Topology, also known as a Standalone Server Deployment, all the functionalities of Splunk are performed by a single instance. This means that data input, parsing, indexing, and searching are all handled by one Splunk server.
As illustrated in the topology diagram, the standalone deployment consists of two main tiers: the Collection Tier (Data Input) and the Search/Indexing Tier. In the Collection Tier, data is ingested into Splunk through various input methods such as Forwarders, Network Inputs, and Other Inputs. The Search/Indexing Tier represents the single Splunk instance that processes and stores the data, as well as handles search requests.
The characteristics of a Single Server Topology highlight its suitability for specific use cases. This deployment model is ideal for departmental, non-critical use cases with data onboarding volumes up to approximately 300GB/day. It is commonly used in test environments or for small enterprise log management scenarios.
One notable limitation of the Single Server Topology is the lack of high availability for search and indexing. Since all functions are performed by a single instance, there is no redundancy or failover mechanism in place. If the standalone server experiences issues or downtime, it directly impacts the entire Splunk deployment.
However, the Single Server Topology offers simplicity in management and ease of migration to a distributed deployment if the need arises. As data volumes and user requirements grow, transitioning from a standalone deployment to a distributed architecture is a straightforward process.
Example 2: Single-Site Distributed Cluster Topology (Distributed Deployment)
In contrast to the Single Server Topology, a Single-Site Distributed Cluster Topology, also known as a Distributed Deployment, is designed to handle large-scale environments with high data volume and user concurrency. This deployment model separates the Splunk architecture into distinct components: Forwarders, Indexers, and Search Heads, as depicted in the topology diagram.
Forwarders play a crucial role in the Collection Tier of the distributed deployment. They are lightweight agents installed on the data sources, such as servers, network devices, and endpoints, to collect log data from various sources. The Forwarders are responsible for efficiently gathering data and forwarding it to the Indexers for further processing.
In the Search/Indexing Tier, Indexers form the backbone of the distributed deployment. They receive the data from the Forwarders and perform the necessary processing, including parsing and indexing. The indexed data is then stored by the Indexers for later retrieval. To distribute the indexing workload and ensure data availability and redundancy, multiple Indexers can be configured in a cluster. This allows for horizontal scaling and fault tolerance, as the indexing responsibilities are shared among the Indexers.
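To give a feel for how an indexer cluster is wired together, here is a hedged sketch of the relevant server.conf stanzas; the host names, port, replication and search factors, and shared secret are placeholders, and newer Splunk releases use "manager"/"peer" terminology in place of the older "master"/"slave" values shown here.

```
# server.conf on the cluster master (cluster manager in newer releases)
[clustering]
mode = master
replication_factor = 3
search_factor = 2
pass4SymmKey = <shared-secret>

# server.conf on each indexer (cluster peer)
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://cm.example.com:8089
pass4SymmKey = <shared-secret>
```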
The Search Heads, residing in the Management Tier, serve as the user interface for searching and interacting with the indexed data. They handle search requests from users, distribute the searches to the Indexers, and consolidate the results for presentation. In a Single-Site Distributed Cluster Topology, multiple Search Heads can be deployed to handle high user concurrency and provide load balancing. This ensures that the search performance remains optimal even under heavy user loads.
The characteristics of a Single-Site Distributed Cluster Topology highlight its suitability for enterprise-level deployments. This deployment model supports data onboarding volumes up to 10TB/day, making it capable of handling large-scale data ingestion. The distributed architecture ensures high availability for both search and indexing, as the components are distributed across multiple nodes. This provides redundancy and fault tolerance, minimizing the impact of any single point of failure.
Scaling a Single-Site Distributed Cluster Topology is relatively straightforward. Additional Forwarders can be added to collect data from new sources, Indexers can be added to the cluster to handle increased indexing workload, and Search Heads can be added to accommodate higher user concurrency. This scalability allows organizations to grow their Splunk deployment as their data volumes and user requirements expand.
However, it's important to note that while a Single-Site Distributed Cluster Topology offers improved scalability and performance compared to a Single Server Topology, it also introduces additional complexity in terms of management and configuration. Proper planning and expertise are required to design, deploy, and maintain a distributed Splunk architecture effectively.
Example 3: Multi-Site Distributed Cluster Topology
The Multi-Site Distributed Cluster Topology is an advanced deployment model that extends the capabilities of the Single-Site Distributed Cluster Topology by introducing additional redundancy and fault tolerance across multiple geographic locations. This topology is designed to provide protection against site failures and ensure continuous operation of the Splunk environment.
As shown in the topology diagram, the Multi-Site Distributed Cluster Topology consists of two sites: Site A and Site N. Each site represents a distinct geographic location and contains its own set of Splunk components, including Search Heads, Indexers, and Forwarders.
One of the key characteristics of this topology is the addition of Search Head Clustering (SHC) to the search tier. In a Search Head Cluster, multiple Search Heads are configured to work together as a single logical unit. The Search Heads in the cluster share the search workload and provide high availability and failover capabilities. If one Search Head fails, the others can continue serving search requests without interruption.
To ensure optimal performance and reliability, the Multi-Site Distributed Cluster Topology requires a dedicated SHC cluster. The Search Head capacity is shared among the cluster members, and search artifacts, such as scheduled searches and dashboards, are replicated across each Search Head in the cluster. This allows for consistent and uninterrupted access to search functionality, even in the event of a site failure.
Another important consideration in a Multi-Site Distributed Cluster Topology is the WAN (Wide Area Network) latency between the sites. To maintain proper functionality and performance, the WAN latency must be less than 100 milliseconds. This low latency ensures that data replication and synchronization between the sites occur efficiently and without significant delays.
In the Collection Tier, Forwarders are deployed at each site to collect data from various sources. The Forwarders send the collected data to the Indexers in their respective sites for processing and indexing. This distributed data collection approach ensures that data is collected and processed locally, minimizing the impact of network latency.
The Indexing Tier at each site consists of an Indexer Cluster, where multiple Indexers work together to distribute the indexing workload and provide data redundancy. The Indexer Clusters at each site operate independently, storing and managing their own set of indexed data. However, they can be configured to replicate data between sites for additional data protection and disaster recovery purposes.
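As a sketch of how this site-aware replication is declared, the server.conf fragment below shows the multisite settings on the cluster master for a hypothetical two-site cluster; the site names, factor values, and secret are illustrative assumptions, and each peer additionally declares its own site in its [general] stanza.

```
# server.conf on the cluster master for an assumed two-site cluster
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2
pass4SymmKey = <shared-secret>
```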
The Management Tier in a Multi-Site Distributed Cluster Topology includes components such as Deployment Server (DS), License Master (LM), Monitoring Console (MC), and SHC Deployer (SHC-D). These components are responsible for managing and monitoring the Splunk environment across both sites. The DS handles the deployment and configuration of Splunk components, while the LM manages licensing. The MC provides centralized monitoring and troubleshooting capabilities, and the SHC-D is responsible for deploying and managing the Search Head Cluster.
Recommended Best Practices for Data Collection, Indexing, and Search Tiers
Whether it is Splunk or any other solution, your design should address these five aspects. An architectural design can never be considered complete if it ignores any one of these five pillars:
Availability
Performance
Scalability
Security
Manageability
Splunk recommends a number of best practices for the Data Collection, Indexing, and Search Tiers. Let's explore the recommended best practices for each tier.
Data Collection Tier Best Practices
Use the Universal Forwarder (UF) to forward data whenever possible: The UF is the best choice for most data collection requirements due to its small resource demand, built-in load balancing, centralized configurability, and restart capabilities. Use a heavy forwarder only when specific use cases require it.
Limit the use of intermediary forwarders: If intermediary forwarders are necessary, ensure that there are at least twice as many intermediary forwarder pipelines as indexers to maintain balanced event distribution across the indexing tier.
Secure UF traffic using SSL/TLS: Encrypting data in transit improves security, and because Splunk compresses the SSL data stream, it can also reduce the amount of data transmitted over the network.
Use the native Splunk load balancer: Utilize Splunk's built-in load balancing capabilities to distribute data evenly across the indexing tier. Avoid using network load balancers between forwarders and indexers.
Utilize Splunk Connect for Syslog (SC4S) for syslog collection: Deploy SC4S containers as close to the data sources as possible for efficient and configurable syslog data collection.
Use the HTTP Event Collector (HEC) for agentless collection: HEC provides a reliable and scalable method for collecting data from sources that cannot use a Splunk forwarder. Enable HEC on indexers or configure a dedicated HEC receiver tier using heavy forwarders.
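For illustration, enabling HEC on a receiving instance comes down to an inputs.conf stanza like the one sketched below; the token, index, source type, and host name are placeholders (in practice the token is generated from Splunk Web or the CLI rather than typed by hand).

```
# inputs.conf on the HEC receiver (indexer or heavy forwarder)
[http]
disabled = 0

[http://my_app_events]
token = 00000000-0000-0000-0000-000000000000
index = app_logs
sourcetype = my_app:json
disabled = 0

# A client can then post events to the collector endpoint, for example:
#   curl -k https://hec.example.com:8088/services/collector/event \
#        -H "Authorization: Splunk 00000000-0000-0000-0000-000000000000" \
#        -d '{"event": {"msg": "hello"}, "sourcetype": "my_app:json"}'
```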
Indexing Tier Best Practices
Enable parallel pipelines: Take advantage of available system resources by enabling ingest parallelization features, ensuring adequate I/O performance (a configuration sketch follows this list).
Consider using SSDs for hot/warm volumes and summaries: Solid-state drives (SSDs) can significantly improve search performance by removing I/O limitations.
Keep the indexing tier close to the search tier: Minimize network latency between the indexing and search tiers to enhance the user experience during searches.
Use index replication for high availability: Ensure multiple copies of every event in the indexer cluster to protect against search peer failure and meet service level agreements (SLAs).
Ensure good data onboarding hygiene: Explicitly configure data sources, including line breaking, timestamp extraction, timezone, source, source type, and host, to optimize data ingest capacity and indexing latency.
Consider configuring batch mode search parallelization: Enable search parallelization features on indexers with excess processing power to improve search performance.
Monitor for balanced data distribution: Ensure even event distribution across the indexer nodes to maintain optimal search performance and proper data retention policy enforcement.
Disable the web UI on indexers in distributed deployments: There is no need to access the web UI directly on indexers in clustered environments.
Use Splunk pre-built Technology Add-Ons: Leverage Splunk-provided add-ons for well-known data sources to ensure optimal configuration and faster time to value.
Monitor critical indexer metrics: Utilize the Splunk monitoring console to track key performance metrics, including CPU and memory utilization, and detailed metrics of internal Splunk components.
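As mentioned in the first item above, enabling ingest parallelization is typically a one-line change. The sketch below assumes the indexer has spare CPU and I/O headroom; the value shown is only an example and should be validated against your own hardware.

```
# server.conf on an indexer with spare CPU and I/O capacity (illustrative)
[general]
# Run two independent ingestion pipeline sets instead of one
parallelIngestionPipelines = 2
```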
Search Tier Best Practices
Keep the search tier close to the indexing tier: Minimize network delays between the search and indexing tiers to optimize search performance.
Utilize search head clustering for scalability: Implement search head clustering to replicate user artifacts, enable intelligent search workload scheduling, and provide high availability (see the configuration sketch after this list).
Forward all search head internal logs to the indexing tier: Store all indexed data, including search head logs, on the indexing tier to simplify management and eliminate the need for high-performing storage on the search head tier.
Consider using LDAP authentication: Implement centrally managed user identities using LDAP for simplified management and enhanced security.
Ensure sufficient CPU cores for concurrent searches: Allocate enough CPU cores to handle concurrent search needs and avoid search queuing and delays.
Utilize scheduled search time windows: Provide time windows for scheduled searches to run, helping to avoid search concurrency hotspots.
Limit the number of distinct search heads/clusters on the same indexing tier: Carefully plan the number of standalone search heads and search head clusters to prevent overloading the indexer tier with concurrent search workload.
Use an odd number of nodes when building search head clusters: Ensure that search head clusters have an odd number of nodes (3, 5, 7, etc.) to facilitate majority-based captain election and prevent split-brain scenarios during network failures.
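To show roughly what search head clustering looks like in configuration, here is a hedged sketch of the server.conf stanzas on one cluster member; the URIs, label, replication port, and secret are placeholders, and in practice members are usually initialized with the "splunk init shcluster-config" CLI command rather than by editing these files directly.

```
# server.conf on each search head cluster member (illustrative values)
[replication_port://9200]

[shclustering]
disabled = 0
mgmt_uri = https://sh1.example.com:8089
replication_factor = 3
shcluster_label = shc1
pass4SymmKey = <shared-secret>
conf_deploy_fetch_url = https://deployer.example.com:8089
```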
By following these best practices for the data collection, indexing, and search tiers, you can optimize your Splunk deployment for availability, performance, scalability, security, and reliability. Remember to regularly monitor and adjust your configuration as your data volumes and requirements evolve over time.
We hope this article helps you understand the Splunk architecture and its different design topologies. We will wrap up here for now and cover more about Splunk in upcoming articles. Please keep visiting thesecmaster.com for more technical information like this. Visit our social media pages on Facebook, Instagram, LinkedIn, Twitter, Telegram, Tumblr, & Medium and subscribe to receive updates like this.
Arun KL
Arun KL is a cybersecurity professional with 15+ years of experience in IT infrastructure, cloud security, vulnerability management, Penetration Testing, security operations, and incident response. He is adept at designing and implementing robust security solutions to safeguard systems and data. Arun holds multiple industry certifications including CCNA, CCNA Security, RHCE, CEH, and AWS Security.