May 9, 2024 | 14 min read

What is a Splunk Index? How to Create a New Splunk Index via the Web UI and CLI?



In this post, we are diving into one of the most crucial components of Splunk: the 'Indexer', where the magic of event indexing happens. If you recall the Data Input, Data Storage, and Data Search phases of the data pipeline, or the Event Processing Tiers of Splunk, we will be focusing on the Data Storage phase, where parsing and indexing of input data take place.

Throughout this article, we will answer fundamental questions about the indexing process, such as:

  • What exactly is an Index?

  • How does Splunk store indexes?

  • How to create and delete indexes using the Web UI and CLI

  • Adding and removing data from indexes

  • Taking backups of indexes

By the end of this article, you will have a solid grasp of Splunk indexes and how to manage them effectively to optimize your Splunk environment. So, let's get started on this exciting journey of exploring Splunk indexes!

What is an Index?

In Splunk, an index is a repository for the data stored on an indexer. The Splunk instance can be configured to index local and remote data, which can then be searched through a search app. Indexes live under the $SPLUNK_HOME/var/lib/splunk directory by default.

Splunk comes with multiple preconfigured indexes. The "main" index is the first index you should know about, because it is the default index where Splunk stores incoming data. However, you have the flexibility to create and specify other indexes for different data inputs, allowing you to organize your data more effectively. Scroll down if you want to know how to create a custom index; we have covered both Web UI and CLI procedures.

A few more indexes come preconfigured along with the "main" index; a quick example search follows the list below:

  1. _internal: This index stores the logs resulting from the internal processing of the Splunk instance. For example, Splunk daemon logs are stored under _internal.

  2. _audit: This index stores the audit trail logs and any other optional auditing information.

  3. _introspection: This index is used for system performance tracking, such as Splunk resource usage and data parsing performance. It contains information related to CPU and memory usage.

  4. _thefishbucket: This index contains checkpoint information for all the files that are going to be monitored for data input. This is relevant for both forwarders and indexers.

  5. main: The main index is the default index for data input and is located under the defaultdb directory.
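As a quick sanity check, you can run a search like the one below from the Search & Reporting app to confirm that the _internal index is receiving Splunk's own daemon logs. This query is only an illustrative suggestion; adjust the time range as needed:

index=_internal sourcetype=splunkd | head 10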

You might wonder why we would want to create many indexes instead of storing all data in a single index. The answer lies in the following advantages:

  1. Fast search retrieval: By segregating data types into different indexes, you can avoid putting an extra load on the Splunk instance to search through all the data when you only need to search for a specific data type, such as firewall logs.

  2. Retention management: Based on the customer's requirements and data importance, you can set different retention periods for each index. For example, you may want to keep security logs for 12 months, while web logs might only be kept for 3 months.

  3. Access control: You can control who has access to what data by setting permissions at the index level. For instance, you can grant the security group access to the security logs index while restricting access for other departments.
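To make the retention point concrete, here is a minimal indexes.conf sketch. The index names and retention values are illustrative assumptions, not part of this article's setup: security logs are kept for roughly 12 months (31536000 seconds) and web logs for roughly 3 months (7776000 seconds).

[security_logs]
homePath = $SPLUNK_DB/security_logs/db
coldPath = $SPLUNK_DB/security_logs/colddb
thawedPath = $SPLUNK_DB/security_logs/thaweddb
frozenTimePeriodInSecs = 31536000

[web_logs]
homePath = $SPLUNK_DB/web_logs/db
coldPath = $SPLUNK_DB/web_logs/colddb
thawedPath = $SPLUNK_DB/web_logs/thaweddb
frozenTimePeriodInSecs = 7776000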

How Does Splunk Store Indexes?

Now that you know what indexes are, let's dive into how Splunk actually stores these indexes. When data is fed into Splunk, the indexer processes it and stores it in a series of directories and files, collectively known as an index.

Under the hood, each index is comprised of a set of directories called "buckets." These buckets are organized by age, with the most recently indexed data stored in the "hot" bucket, slightly older data in the "warm" bucket, and the oldest data in the "cold" bucket. This age-based organization helps Splunk efficiently manage the data lifecycle and optimize performance.

Here's a quick breakdown of the bucket types:

Hot Buckets: When new data arrives, it is first written to hot buckets. These buckets are always open for writing and are optimized for fast data ingestion. Hot buckets have the naming format hot_v1_<localid>, where <localid> is an identifier assigned by the indexer. Hot buckets are stored in the $SPLUNK_HOME/var/lib/splunk/[index_name]/db/ directory.

Warm Buckets: As hot buckets reach certain thresholds (e.g., maximum size, age, or number of buckets), data is rolled over to warm buckets. Warm buckets are read-only and are optimized for searching. The naming format for warm buckets is db_<newest_time>_<oldest_time>_<localid>, where the timestamps represent the latest and earliest events within the bucket. Warm buckets share the same location as hot buckets in the directory hierarchy.

Cold Buckets: When warm buckets reach their respective thresholds, data is moved to cold buckets. Cold buckets are also read-only and are typically stored on slower, less expensive storage media. The naming format for cold buckets is the same as warm buckets, but they are stored in a separate directory called colddb. Path: $SPLUNK_HOME/var/lib/splunk/[index_name]/colddb

Thawed Buckets: When data needs to be restored from an archive, it is moved back into thawed buckets, which live in the thaweddb directory ($SPLUNK_HOME/var/lib/splunk/[index_name]/thaweddb).

Within each bucket, Splunk stores the indexed data in a compressed format, along with associated index and metadata files that facilitate fast searching and retrieval. These files include (an illustrative bucket listing follows this list):

  • rawdata/journal.gz: the compressed raw event data

  • .tsidx files: time-series index files that point into the raw data, enabling fast searching

  • .data files (such as Hosts.data, Sources.data, and SourceTypes.data): metadata summaries of the hosts, sources, and source types present in the bucket
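For illustration, the contents of a single warm bucket might look roughly like this. The bucket name and file names below are made-up examples, and the exact layout varies by Splunk version:

$SPLUNK_HOME/var/lib/splunk/main/db/db_1715212800_1715126400_3/
    rawdata/journal.gz                          <- compressed raw events
    1715126400-1715212800-123456789.tsidx       <- time-series index file
    Hosts.data  Sources.data  SourceTypes.data  <- metadata summaries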

By default, Splunk automatically manages the movement of data through the bucket lifecycle based on configurable policies. You can also tune this behavior manually through settings in the indexes.conf file.

Let's see the key factors, criteria and thresholds that influence data movement:

  1. Hot to Warm: Data rolls from hot to warm buckets when the maximum size, age, or number of hot buckets is reached.

  2. Warm to Cold: Data moves from warm to cold buckets when the maximum number of warm buckets is reached.

  3. Cold to Frozen: Data transitions from cold to frozen when the cold bucket storage limit is exceeded or the retention period is reached. By default, frozen data is deleted unless you configure an archive location (for example, with coldToFrozenDir).

The specific thresholds and criteria for data movement are defined using the following settings in indexes.conf:

  • maxHotBuckets: The maximum number of hot buckets.

  • maxHotSpanSecs: The maximum age of hot buckets in seconds.

  • maxHotIdleSecs: The maximum idle time for hot buckets before rolling to warm.

  • maxDataSize: The maximum size of a bucket.

  • maxTotalDataSizeMB: The maximum total size of an index.

  • frozenTimePeriodInSecs: The retention period for the entire index.
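Putting these settings together, a stanza along the following lines (the index name and values are assumptions for this sketch) caps an index at roughly 50 GB and freezes data after 90 days:

[firewall_logs]
homePath = $SPLUNK_DB/firewall_logs/db
coldPath = $SPLUNK_DB/firewall_logs/colddb
thawedPath = $SPLUNK_DB/firewall_logs/thaweddb
maxHotBuckets = 3
maxDataSize = auto
maxTotalDataSizeMB = 51200
frozenTimePeriodInSecs = 7776000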

How to Create and Delete Indexes Using the Web UI?

Now that you have a solid understanding of what indexes are and how Splunk stores them, let's explore how you can create and delete indexes using the Splunk Web UI. The Web UI provides a user-friendly interface for managing indexes, making it easy for you to organize your data and keep your Splunk environment tidy.

Creating an Index Using the Web UI

  1. Log in to your Splunk Web UI and navigate to "Settings" > "Indexes".

  2. Click on the "New Index" button.

  3. Enter a name for your new index (e.g., "security_logs").

  4. (Optional) Specify the index type as "Events" or "Metrics". For most use cases, "Events" is the default selection.

  5. (Optional) Set the "App" field to specify which app this index should be associated with. This helps with organizing and managing indexes.

  6. (Optional) Configure the following settings based on your requirements:

  • "Max Size": The maximum size of the entire index.

  • "Max Hot Buckets": The maximum number of hot buckets.

  • "Max Warm Buckets": The maximum number of warm buckets.

  • "Max Cold Buckets": The maximum number of cold buckets.

  • "Retention Period": The time period for which data should be retained in the index.

  7. (Optional) Specify the "Home Path", "Cold Path", and "Thawed Path" if you want to store the index data in custom locations.

  8. Click on the "Save" button to create the new index.

Your new index is now created and ready to receive data. You can start forwarding data to this index by configuring data inputs or forwarders.
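Once data starts flowing in, a quick count search such as the one below (replace security_logs with your index name; this query is only a suggested check) confirms that events are landing in the new index:

| eventcount summarize=false index=security_logs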

Note: If you don't see your index in the list, go to https://splunkhost:8000/en-US/debug/refresh to reload the configuration.

Deleting an Index Using the Web UI

  1. Log in to your Splunk Web UI and navigate to "Settings" > "Indexes".

  2. Locate the index you want to delete from the list of indexes.

  3. Click on the "Delete" button next to the index you want to remove.

  4. A confirmation dialog will appear. Click "Delete" to confirm the action.

Please note that deleting an index will permanently remove all data associated with that index. Make sure you have backed up any important data before proceeding with the deletion.

How to Create and Delete Indexes Using the CLI on Linux and Mac?

While the Splunk Web UI provides a convenient way to manage indexes, sometimes you may need to work with indexes from the command line. Whether you're automating index management tasks or simply prefer the flexibility of the CLI, Splunk has got you covered. In this section, we'll walk through the steps to create and delete indexes using the Splunk CLI on Linux and Mac operating systems.

Creating an Index

1. Connect to your Splunk instance using SSH or a terminal.

2. Navigate to the $SPLUNK_HOME/etc/apps/<app_name>/local/ directory where the indexes.conf file exists. By default, $SPLUNK_HOME is /opt/splunk on Linux and /Applications/splunk on Mac.

3. Navigate to the application under which you want to create the index. Applications are stored under $SPLUNK_HOME/etc/apps/. We are going to create the index inside the "search" app.

cd /Applications/splunk/etc/apps/search/local/

If you want to create the index under a new application, create a new directory under $SPLUNK_HOME/etc/apps/ and a local directory inside it, as shown below.
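For example, to prepare such a new app directory before adding an indexes.conf (the app name my_indexes_app below is just a placeholder):

mkdir -p $SPLUNK_HOME/etc/apps/my_indexes_app/local
cd $SPLUNK_HOME/etc/apps/my_indexes_app/local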

Note: Read How to Create Apps and Add-Ons in Splunk in this article.

  4. Edit the file named indexes.conf, or create a new one under $SPLUNK_HOME/etc/apps/<app_name>/local/, using a text editor (e.g., vi or nano):

The content of indexes.conf looks something like the example below. Each stanza represents an index. We already have two indexes, macbookpro and security_logs, in the indexes.conf file. Let's create another one.

[audit_logs]   
coldPath = $SPLUNK_DB/audit_logs/colddb
homePath = $SPLUNK_DB/audit_logs/db
maxTotalDataSizeMB = 51200
thawedPath = $SPLUNK_DB/audit_logs/thaweddb

We created another index named "audit_logs" under the "search" application. Your new index is now created and ready to receive data. You can start forwarding data to this index by configuring data inputs or forwarders.

  5. Restart Splunk or reload the configuration at https://splunkhost:8000/en-US/debug/refresh. If everything goes well, your index should be listed in the web console.
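Alternatively, you can let Splunk write the configuration for you with the add index CLI command, run from $SPLUNK_HOME/bin (you may be prompted for admin credentials). This creates the index with default paths, which you can still fine-tune later in indexes.conf:

./splunk add index audit_logs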

Deleting an Index from CLI

  1. Connect to your Splunk instance using SSH or a terminal.

  2. Navigate to the $SPLUNK_HOME/bin directory:

  3. Stop the Splunk instance: ./splunk stop

  4. Navigate to the $SPLUNK_HOME/etc/apps/<app_name>/local/ directory where the indexes.conf file exists. By default, $SPLUNK_HOME is /opt/splunk on Linux and /Applications/splunk on Mac.

  5. Remove the index stanza from the indexes.conf file.

  6. Start the Splunk instance: ./splunk start

The index and its associated data are now permanently deleted from your Splunk instance.
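If you prefer not to edit indexes.conf by hand, the CLI also offers a remove index command, run from $SPLUNK_HOME/bin. Treat this as an alternative sketch: it removes the index definition, and depending on your version you may still need to clean up the on-disk bucket directories separately.

./splunk remove index audit_logs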

How to Add and Remove Data from Indexes?

Splunk provides various methods to add and remove data from indexes. In this section, we will explore how to add data to an index using the Web UI and remove data from an index using the CLI.

Adding Data to an Index Using the Web UI

  1. Log in to your Splunk Web UI.

  2. Navigate to "Settings" > "Add Data".

  3. Choose the method you want to use to add data, such as "Upload", "Monitor", "Forward", or "TCP/UDP". For this example, let's select "Upload". We will cover the other methods in separate articles.

  4. Click on "Select File" and choose the file you want to upload.

  5. Configure the "Source Type" and "Host" fields according to your data.

  6. In the "Index" dropdown, select the index where you want to store the uploaded data.

  7. Click on the "Review" button to review your settings.

  8. If everything looks correct, click on "Submit" to start the data upload.

Splunk will now process and index the uploaded data, making it available for searching and analysis.

Removing Data from an Index Using the CLI

1. Connect to your Splunk instance using SSH or a terminal.

2. Navigate to the $SPLUNK_HOME/bin directory and stop the Splunk instance: ./splunk stop

3. To remove all event data from an index, use the splunk clean eventdata command followed by the index name. For example, to wipe the "my_index" index:

./splunk clean eventdata -index my_index

4. Splunk will display a warning summarizing what is about to be removed and prompt for confirmation. Review it and confirm to proceed (adding the -f flag skips the prompt).

5. Start the Splunk instance again: ./splunk start

Note that the CLI clean command removes all events in the specified index; it cannot target a particular source or time range. To remove only a subset of events, use the delete search command from the Web UI, as described in the next section.

Please note that removing data from an index is a permanent action and cannot be undone. Make sure to carefully review the data being removed before confirming the operation.

Removing Data from an Index using the Web UI

  1. Perform a search that matches the data you want to remove.

  2. In the search results, click on the "Event Actions" dropdown and select "Delete Events".

  3. Confirm the deletion by clicking "Delete" in the pop-up window.

In this demo, we have logs from two hosts, Linux and Mac. Let's delete the Linux events by filtering for the Linux logs.

This method is suitable for removing smaller subsets of data based on specific search criteria.
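Under the hood, this is equivalent to piping a search to the delete command. For example, assuming the Linux events are identified by a host field value of linux-server (a placeholder for this sketch):

index=my_index host=linux-server | delete

Keep in mind that delete only makes the matching events unsearchable; it does not free disk space, and it requires the can_delete role described below.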

It's important to exercise caution when removing data from indexes, as it can impact search results and any associated reports or dashboards. Always ensure that you have appropriate backups and consider the implications of data removal before proceeding.

What If You Can't Delete?

By default, no account, including admin, has the "can_delete" role. You must assign the "can_delete" role to a user before that user can delete events from the Search app.

Go to Settings -> Users -> Edit -> Assign Role -> can_delete

How to Take the Backup of Indexes?

Taking regular backups of your Splunk indexes is crucial to ensure data protection and recoverability in case of system failures, data corruption, or accidental deletions. In this section, we will walk you through the step-by-step process of taking backups of your Splunk indexes.

Step 1: Identify the Indexes to Backup

  1. Make note of the index names and their corresponding paths (e.g., $SPLUNK_DB/index_name/db). By default, $SPLUNK_DB is /opt/splunk/var/lib/splunk on Linux and /Applications/splunk/var/lib/splunk on Mac, so an index's home path is typically /opt/splunk/var/lib/splunk/[index_name]/db/.
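One convenient way to gather this list is Splunk's REST endpoint for indexes, queried from the search bar. The search below is a suggested helper and relies on the endpoint's standard fields:

| rest /services/data/indexes | table title homePath coldPath thawedPath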

Step 2: Stop the Splunk Instance

1. Connect to your Splunk instance using SSH or a terminal.

2. Navigate to the $SPLUNK_HOME/bin directory:

3. Stop the Splunk instance: Stopping the Splunk instance ensures that no data is being actively written to the indexes during the backup process.

./splunk stop

Step 3: Back Up the Index Directories

1. Use the cp command to copy the index directories to a backup location. Replace index_name with the actual name of the index you want to back up and /backup/location/ with the path where you want to store the backup:

cp -R ../var/lib/splunk/index_name /backup/location/

2. Repeat the above step for each index you want to back up.
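If you prefer compressed archives over plain copies, a tar command along these lines (adjust the paths for your environment) achieves the same result:

tar -czf /backup/location/index_name_backup.tar.gz -C $SPLUNK_HOME/var/lib/splunk index_name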

Step 4: Start the Splunk Instance

  1. After completing the backup, start the Splunk instance: This will resume normal operation of your Splunk instance.

./splunk start

Step 5: Verify the Backup

  1. Navigate to the backup location where you copied the index directories.

  2. Verify that the index directories and their contents are present and complete.

  3. You can also compare the size and timestamp of the backed-up directories with the original index directories to ensure the backup was successful.

We recommend automating the backup process by scheduling it with cron or at.
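For example, the crontab entry below would run a backup every Sunday at 2 AM. The script path /opt/splunk/scripts/backup_indexes.sh is an assumed placeholder for a script you would write yourself to stop Splunk, copy the index directories, and start Splunk again:

0 2 * * 0 /opt/splunk/scripts/backup_indexes.sh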

Wrap Up

We hope this article helps you understand the fundamentals of Splunk indexes, including what they are, how they store data, and how to manage them effectively. We covered creating and deleting indexes using both the Web UI and CLI, adding and removing data from indexes, and taking backups to ensure data protection. By understanding and implementing these concepts, you can optimize your Splunk deployment, ensure data accessibility, and maintain a robust data management strategy. Happy Splunking!

We are going to end this article here for now; we will cover more about Splunk in upcoming articles. Please keep visiting thesecmaster.com for more technical information like this.

Arun KL

Arun KL is a cybersecurity professional with 15+ years of experience in IT infrastructure, cloud security, vulnerability management, Penetration Testing, security operations, and incident response. He is adept at designing and implementing robust security solutions to safeguard systems and data. Arun holds multiple industry certifications including CCNA, CCNA Security, RHCE, CEH, and AWS Security.
