Distributed Packet Capture Using Lateral Data Processing
For customers whose storage needs (and, necessarily, processing needs) exceed 100TB, a distributed approach using our Lateral Data Processing system provides the power and capacity they need, with multiple chassis working in concert toward a common goal. It is no longer necessary to use proprietary hardware to accomplish what many smaller machines can do together. The Lateral Data Processing system relies on off-the-shelf commodity hardware to achieve exceptional performance through a larger number of chassis, each perhaps with lower processing capacity, but with higher processing capacity collectively. The benefits of this approach are clear: customers realize all the benefits of commodity hardware in terms of maintenance, support and cost, while controlling the amount of processing power and storage capacity by varying the number of chassis and the capacity per chassis.
Lateral Data Processing utilizes three types of machines in a layered approach, leveraging their aggregate storage and processing power to capture large amounts of data at high speeds and, at the same time, deliver large amounts of data to the user at high speeds, whether as reports, aggregated packet data, raw packets or data analyzed and organized in some other fashion. Adding to the appeal is that the system is expandable -- it caters to the customer’s needs now and in the future through the ability to grow, adapt and customize.
Why Terabit processing when capturing in Gigabits?
You could say it is all about how long you want to wait for results. One month encompasses over 43,000 minutes of data capture. If you have a system that searches and retrieves data at the same rate it captures, you would have to wait one month for a response to a query. To make the system usable for both capture and retrieval, you need it to process data at a speed several multiples over its capture speed. In one minute, a system that captures at 10Gbps and processes at 10Tbps can search through 1000 minutes of captured data (less than a day's worth). One way to boost this ratio is to search just the headers, which multiplies the ratio, on average, by about 30, giving you 30,000 minutes of data searchable in one minute. This is better, but you are still only covering a little over two-thirds of a month of data in one minute (and not looking into the payloads). The alternative is to use the sophisticated preprocessing capacity of the Lateral Data Processing system. By preconfiguring its 10,000 sorting buckets, you can split the data so that most searches need only touch a small subset of the total capture, dramatically increasing response speed. Aggregation and reporting features narrow the scope of searches further still, for even speedier results.
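The arithmetic behind these ratios is easy to check. Here is a minimal sketch in Python using the figures from this example (10Gbps capture, 10Tbps processing, a roughly 30x header-only factor):

```python
# Illustrative arithmetic for the capture-vs-search ratios discussed above.
# All figures come from the example in the text; the 30x header-only factor
# is the cited average, not a measured constant.

MINUTES_PER_MONTH = 30 * 24 * 60       # 43,200 minutes in a 30-day month

capture_gbps = 10                      # capture rate
process_gbps = 10_000                  # processing rate (10 Tbps)

# Minutes of captured data searchable per minute of processing:
full_search = process_gbps / capture_gbps
print(full_search)                     # 1000.0 -> less than a day's worth

# Header-only search multiplies the ratio by roughly 30:
header_search = full_search * 30
print(header_search)                   # 30000.0 minutes
print(header_search / MINUTES_PER_MONTH)  # ~0.69 -> a little over two-thirds of a month
```

This is why preprocessing into sorting buckets matters: no amount of brute-force scanning closes the gap as effectively as not having to scan most of the data in the first place.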
Using our Lateral Data Processing system you can link over 1000 chassis with one to four HDDs each and reach an aggregate storage capacity of up to 100,000 Terabytes and an aggregate data processing power of up to 10Tbps. The resilience of the system, however, relieves you of the worry that any individual storage machine (and storage machines will likely make up the bulk of the array) could bring the whole thing down. If one of the storage machines goes down or has a hard drive fail, only the data on that particular machine becomes inaccessible. The overall system continues operating normally.
From the perspective of redundancy, a distributed system using Lateral Data Processing is also more advantageous and less costly. Backing up the machines that capture and distribute the packets requires simply physically duplicating these particular machines and meshing them with the storage machines. The capture and distribute machines constitute a small fraction of the total number of machines in the system (the storage machines are by far the most numerous), meaning functional redundancy is achieved at a fraction of the cost of the whole system.
The Lateral Data Processing System
For the Lateral Data Processing system we provide the software and you source your own compatible hardware (or re-use older hardware rotating out of service). This makes it possible to literally build a supercomputer spread out among many different machines – economically – based on the robust, proven technology that we have been offering to our customers for years.
Today our Lateral Data Processing System can capture data at up to 40Gbps line rates, depending on the quantity and speed of the hardware used. For example, if you have 100 machines with four 16TB HDDs each and 8-core processors, you would have 6,400TB of overall storage, 800 processing cores and a processing speed of up to 1Tbps. This is how you achieve tremendous responsiveness, even on vast amounts of data. We generally recommend budgeting for processing capacity 1000 times larger than your peak capture rate, so if your network has 1Gbps of bandwidth utilization, you would want to plan for 1Tbps of processing capacity to provide adequate processing for both capture and retrieval tasks.
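The sizing in this example works out as a few multiplications. A quick sketch of the arithmetic, using the figures given above:

```python
# Back-of-the-envelope sizing for the 100-machine example above.
machines = 100
hdds_per_machine = 4
tb_per_hdd = 16
cores_per_machine = 8

total_storage_tb = machines * hdds_per_machine * tb_per_hdd
total_cores = machines * cores_per_machine
print(total_storage_tb, "TB")    # 6400 TB
print(total_cores, "cores")      # 800 cores

# Rule of thumb from the text: budget processing capacity at roughly
# 1000x the peak capture rate (in whatever unit you measure capture).
def recommended_processing(peak_capture_rate: float) -> float:
    return peak_capture_rate * 1000

print(recommended_processing(1))  # 1000 -> e.g. 1 unit of capture needs 1000 units of processing
```

Varying `machines`, `tb_per_hdd` and `cores_per_machine` is exactly how you scale the array up or down to meet the 1000x rule for your own network.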
The software comes on bootable USB sticks that you use for installation onto your own chassis, with no need for a keyboard or monitor (unless you need to change hardware-specific settings, e.g., in the BIOS or CMOS). You control the processing speed through your choice of processor and the number of machines in the array. Likewise with the storage capacity, with the option to use up to four hard drives in each machine. One of the amazing capabilities of the technology behind the Lateral Data Processing System is that you can take an old decommissioned machine on which you would struggle to run today's operating systems, and that same machine becomes a valuable member of your distributed packet capture network, delivering outstanding performance alongside similar chassis. This low entry cost also makes it possible to deploy multiple systems of varying sizes or configurations, whether for different locations, for different purposes, for testbeds or as alternates or backups to each other.
What Can You Do With All this Speed and Capacity?
Until now, packet capture was expensive dead weight that you had to have, just in case. The Lateral Data Processing system's aggregates and reports change that dynamic entirely: it becomes a system that helps you streamline your network and reduce inefficiencies. It saves you money, whether by revealing that you don't actually need to spend that much on bandwidth, that you have improperly routed traffic creating bottlenecks and slowing things down, or that buggy servers are flooding the network with useless traffic. This is even before you get into tasks like figuring out what firewall rules you need, which used to be a bit of a Frankenstein project, taking on a life of its own. With Lateral Data Processing, it's just a routine task, checked off the list. You no longer have to waste time figuring out how to get the data; you get it from the packets – it's all there.
The data the system delivers goes well beyond raw PCAP files, encompassing automatically generated hourly reports, reports on demand, consolidated reports, graphs, aggregates and fast packet retrieval based on specific parameters for both headers and payload, including keywords and signatures. Data may be processed both in real time and on demand, down to the level of application and device tracking, or even the tracking of a specific application on one specific device. It also goes well beyond just getting data – the system can also generate alerts based on the performance (or lack thereof) of an application or device, on conditions detected within the network traffic or on a variety of other scenarios, as well as find patterns and produce intelligence based on what it finds in the data flow. For example, you could set an alert to trigger if bandwidth utilization goes outside of a certain range, except on Mondays from 9PM to midnight, when backup is routinely in process. Dialing it up a notch, in another example the system could detect the beginning of a handshake between a server and client, start measuring the bandwidth of the session, alert if it goes outside of certain parameters and stop measuring when the end of the session is detected. With the amount of processing power and storage the Lateral Data Processing System can encompass, what you can do with it and the data is limited pretty much only by your imagination.
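The Monday-backup alert above amounts to a simple rule: alert when utilization leaves a range, unless the timestamp falls in the exception window. Here is a minimal sketch in Python; the function names, thresholds and logic are invented for illustration and do not reflect the product's actual configuration interface:

```python
from datetime import datetime

# Hypothetical sketch of the bandwidth alert described above: trigger when
# utilization leaves a configured range, except Mondays 9PM to midnight,
# when the routine backup is expected to saturate the link.
# All names and thresholds here are illustrative assumptions.

LOW_MBPS, HIGH_MBPS = 100, 800   # example acceptable utilization range

def in_backup_window(ts: datetime) -> bool:
    # Monday is weekday() == 0; the window runs 21:00 through midnight.
    return ts.weekday() == 0 and ts.hour >= 21

def should_alert(utilization_mbps: float, ts: datetime) -> bool:
    if in_backup_window(ts):
        return False
    return not (LOW_MBPS <= utilization_mbps <= HIGH_MBPS)

# During the Monday backup window, high utilization does not alert:
print(should_alert(950, datetime(2024, 1, 1, 22, 0)))  # Jan 1 2024 is a Monday -> False
print(should_alert(950, datetime(2024, 1, 2, 22, 0)))  # Tuesday at 10PM -> True
```

The session-handshake example in the text is the same idea with state added: start measuring on handshake detection, apply the rule per session, stop at session teardown.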
With Lateral Data Processing you can get full visibility over your network in one report accounting for all packets. Start with this bird's-eye-view report and then work your way down to more and more specific reports generated from the contents of packet payloads. A typical office network contains a variety of devices – switches, routers, computers, servers, network-attached storage, printers, industrial controls, smart fridges and so on. The system can immediately identify all the devices in a report, through which you can track the appearance and disappearance of devices. With the ability to see all the traffic, it can distinguish the servers from the clients – who is talking to whom and how much – in an automatic hourly report with the ability to aggregate. This is just one example: the system can go further, whether you need to confirm that your firewall is working or figure out what rules you need for it, debug inoperative or malfunctioning servers and other equipment, detect new or unauthorized devices, check compliance with updates and patching and more, all delivered in the form of reports. It saves you the time of having to crawl from packet to packet, one at a time – instead you get a report covering billions and trillions of packets in presentable form.
What's the advantage of looking at headers and payloads?
Most systems derive reports and organize packets from the headers alone. Lateral Data Processing goes a giant step further by also examining payloads. Headers are great – we examine them, too – but looking at the payload makes it possible to understand the traffic and its contents. Rather than just seeing where the packets are traveling to and from, with Lateral Data Processing you can also see what is in them. Without the ability to look into the packets, finding which of your thousand computers hasn't checked in with the AV update server and downloaded updates, or is using an outdated browser, becomes an onerous task. You could check logs by hand and hope you have a complete list of all the machines in your purview. Or, with Lateral Data Processing, you can have a system that knows all the devices present on the network, can identify whether they are clients or servers (or something else) and can tell you which clients failed to download anything from the update server. If you only have a dozen computers to manage, doing it any which way is possible; when they number in the hundreds or thousands, you need the automation that Lateral Data Processing provides.
With its ability to look into the payloads, the system derives more useful information than what you can get from the headers alone. For example, malicious software often uses a certain browser ID string. Configure a report giving you all the browser ID strings detected mapped out to specific IP or MAC addresses. In another example, say you want to find out how many SSL 3.0 clients (not servers) have been active in the past 24 hours. Configure a report for that purpose and off it goes. Better yet, let the system check for it continuously and you’ll get up-to-the-minute rolling reports to check whenever you like. You can even graph the data to get a picture of how it changed over time. Same with checking compliance with updates and patching and a wide variety of other tasks too onerous or impossible to do by hand. With every report configured, you get yet another view of your network. Everyone has different circumstances and needs, and Lateral Data Processing can get the data to address each one.
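As a rough illustration of the browser-ID report, here is a minimal Python sketch. The record fields and sample data are invented for the example; in practice the system's payload inspection would supply the extracted values:

```python
from collections import defaultdict

# Hypothetical sketch of the browser-ID report described above. The schema
# below is illustrative, not the product's actual data model: assume the
# system has already extracted each packet's source IP and browser ID
# (User-Agent) string from the payload.
records = [
    {"src_ip": "10.0.0.5", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"src_ip": "10.0.0.7", "user_agent": "EvilBot/1.2"},
    {"src_ip": "10.0.0.5", "user_agent": "EvilBot/1.2"},
]

# Map each browser ID string to the set of addresses seen using it:
report = defaultdict(set)
for r in records:
    report[r["user_agent"]].add(r["src_ip"])

for agent in sorted(report):
    print(agent, "->", sorted(report[agent]))
```

The same grouping pattern, keyed on a different extracted field, covers the SSL 3.0 client example: filter records to the protocol version of interest, group by client address, and roll the result into a rolling 24-hour report.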
And you can have the system do all of this and more simultaneously in real time.
The Lateral Data Processing System processes the data in real time, as it comes in, not just now and then when needed to examine a specific chunk of data to extract files. This opens up an entirely new way for users to manage their networks in real time, with real alerts, looking for real patterns as they are developing, not after the fact. It is “actionary” data processing as opposed to reactionary.
This is invaluable when debugging and troubleshooting technical problems, especially those you could solve if only you could catch the moment that one particular packet with a certain payload appears. With Lateral Data Processing you can figure out, in real time, whether a certain pattern in the packets precipitates a problem (or cascade of problems) in the network.
Another aspect that sets Lateral Data Processing apart from other systems is configuration, or, rather, how you configure it. This system, the result of years of development, was purpose-built and designed for results, with no finicky telnet access, cryptic commands or myriad settings whose purpose or effect may not be entirely clear. Command lines may be the only option when you are trying to squeeze performance out of an under-powered appliance, or out of a system of disparate parts glued together by code from distant off-shore locations. When you need results, however, it is better to focus on getting those results than to first take a course and learn a library of commands just to implement your ideas within the restrictions of an under-powered appliance. With our system, even if you don't configure a task the most efficient way, that's ok: you have more than ample processing power to make up for it. Better yet, we provide the service of economically configuring those tasks for you, so you can focus on quick results.
Since Lateral Data Processing is a database-driven system it can be configured for multi-user environments as well, with tiered access control, user management and different interfaces for different types of users.
The bottom line is, the Lateral Data Processing System is complex in what it can do, but not in how you use it. Its distributed nature combined with the comprehensive, sophisticated technology powering it takes a giant leap from a cluster of appliances working together to a large (and easily growable) quantity of machines processing data in concert and lifting the boundaries of what you can do.
Cybersecurity & Artificial Intelligence
You may have noticed that, unlike many others, we haven't talked much about cybersecurity or artificial intelligence. The reason is simple: the Lateral Data Processing system is for everyday use, and people who use packet capture are most often wrangling with a technical issue rather than hunting down a malware signature. That is not to say that Lateral Data Processing isn't applicable to the security side of things – it is. It's just that most problems, technical or security, boil down to something not working right, and for both you need to go to the packets: what they contain, where they originate and where they go. Philosophically speaking you could say technical issues ARE security issues and vice versa, and in practical terms they are, too. Before you can protect an asset, you need to make sure it is working properly in the first place. Only when you have it working properly can you identify when it isn't. This is true today and will be true in the future, and with Lateral Data Processing you have a system for both today and tomorrow, even if you don't yet know all that you will need it for. You can address current and future problems predictably: you know the cost and time needed to expand the system, so you can set up another ten machines when and where you need them.
Regarding artificial intelligence, we aren't going to insult you by suggesting that a computer is smarter than you. It's not; all it can do is process data faster than you (and sometimes only in specific, very narrow circumstances). The system is a tool for you to use, not the other way around. (And do you really want a computer thinking for you? Look at the very mixed success of putting them in cars.) Catchphrase "artificial intelligence" or not, the Lateral Data Processing System enhances your ability to get things done. When a system takes so long to implement a solution that by the time you get results they are no longer relevant, that system is not working for you.
When you have a system with scalable processing power and scalable storage that can examine the whole packet, including payload, aggregate data and process it – and do it very quickly – you have the elements that give you the ability to find the answers you are looking for. The only question is one of configuration.
Ready to get your own Lateral Data Processing System, or just want to learn more about how it can work for you? Contact us at sales@ipcopper.com and tell us what you need to do.
Deployment of a Lateral Data Processing System generally takes the following three steps:
- You engage us to assess your capacity and storage needs and learn how you want to use the system. During this time we will also work with you to settle on the parameters of the hardware you would like to use and pre-test it for compatibility.
- You get a working prototype on the chosen hardware.
- We work with you to adjust how the system processes data (to meet your stated network wants or needs), finalize the user interface and implement user management features.
Deployment to your hardware is easy. You will receive a bootable USB stick from us to configure one machine from which you will generate the USB sticks you need to boot up the other machines. As mentioned above, no keyboard or monitor is required, unless you have to make changes to your hardware’s BIOS or CMOS.
Once your system is deployed we will work with you over a certain period of time to answer questions about configuration and provide limited adjustments within the scope of the project. Even if you find later that you need a particular feature or way of processing data that was not included in the project originally, we can work with you to come up with a solution using your currently deployed system (as mentioned above, the system is scalable and flexible – it can grow with you size-wise, speed-wise and function-wise). Since this is our own system (not an amalgamation of off-shored bits and pieces of code), we are more than able to quickly make those adjustments and changes, which could bring large increases in efficiency.