What have we learned from SolarWinds/FireEye?

SolarWinds/FireEye has shown how we keep failing at basic day-to-day tasks. Backups, monitoring and logs have been three of the most important items on any system administrator's list since the beginning, and they should be the highest priority in any infrastructure.

As I have already said countless times:

“Sooner or later they will enter your network, be it through human error, a zero-day bug or some other reason.”

And this is not “the problem” (depending on the situation, it is even acceptable). “The problem” is when we lack the ability to identify the cause (logs and monitoring), recover (backups), fix it and move on.

But with the SolarWinds incident, many companies realized that their backup/logging policy was inadequate, because in many cases they only kept backups and logs for 30 days.

The million dollar question is:

What happened to standards such as NIST/ISO, which recommend keeping all of this for a period of 12 to 60 months?

Well, the problem often comes down to cost ($$$), since keeping the backups and/or logs of every server in the “cloud” for such a long period means a very large bill every month. Therefore, many companies simply accept the provider’s default backup policy and think, “If we have a problem today, we restore yesterday’s backup and that’s it.”

They ignore (or are unaware) that a threat can sit inside the company for a long time without being detected, causing small but continuous damage (cases of hackers or internal employees deleting one database record per day are not rare).
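To make the 30-day problem concrete, here is a minimal Python sketch that checks whether any retained backup predates the moment an intruder first got in. It assumes one backup per day, a simple “keep the last N days” policy, and a hypothetical 90-day undetected intrusion; the function names and dates are illustrative and not tied to any backup product.

```python
from datetime import date, timedelta

def retained_backups(today, retention_days):
    """Dates of daily backups still kept under a simple 'keep the last N days' policy."""
    return [today - timedelta(days=i) for i in range(retention_days)]

def clean_restore_points(backups, compromise_date):
    """Backups taken strictly before the (estimated) start of the compromise."""
    return [b for b in backups if b < compromise_date]

# Hypothetical scenario: the intrusion started 90 days before it was detected.
today = date(2021, 1, 15)
compromise = today - timedelta(days=90)

short_policy = retained_backups(today, 30)   # a typical provider default
long_policy = retained_backups(today, 365)   # 12 months of dailies

print(len(clean_restore_points(short_policy, compromise)))  # 0
print(len(clean_restore_points(long_policy, compromise)))   # 274
```

With 30 days of retention every surviving backup already contains the intruder's changes; with 12 months there are still clean restore points to compare against and recover from.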

Solutions:

Solutions to this have always existed, in great variety, but to recap what I have already said several times elsewhere:

Logging:

All servers and devices must have synchronized clocks (NTP).
All logs are sent to a central server.
Log rotation must be configured on the central server.
Every system that generates any kind of log must be logged.
Access, errors, dumps, updates, reboots, etc. are logged.
Access to the central server is strictly restricted, including for system administrators.
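As a minimal sketch of “all logs are sent to a central server”, Python's standard `logging.handlers.SysLogHandler` can forward application events over syslog (UDP here). The collector address `127.0.0.1:514` and the logger name `sshd-audit` are placeholder assumptions; in practice the address would be your central log server, which is where rotation and access restrictions are enforced. Timestamps are written in UTC so events from different hosts line up, which only helps if their clocks are NTP-synchronized.

```python
import logging
import logging.handlers
import socket
import time

# Placeholder collector address (assumption): point this at your central log server.
CENTRAL_LOG_ADDRESS = ("127.0.0.1", 514)

def build_central_logger(name, address=CENTRAL_LOG_ADDRESS):
    """Return a logger that forwards every record to a remote syslog collector."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.handlers.SysLogHandler(
            address=address,
            facility=logging.handlers.SysLogHandler.LOG_AUTH,  # auth facility for access events
            socktype=socket.SOCK_DGRAM,  # UDP syslog; SOCK_STREAM would use TCP
        )
        formatter = logging.Formatter(
            "%(asctime)sZ %(name)s: %(levelname)s %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%S",
        )
        formatter.converter = time.gmtime  # render timestamps in UTC
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

log = build_central_logger("sshd-audit")
log.info("login accepted for user=alice from 10.0.0.5")  # hypothetical event
```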

Backups:

Nowadays, with virtualization and deduplication, it is possible to back up entire VMs in a few minutes using very little disk space. This is very useful because you don’t need to worry about whether a user saved a file in the correct path. On the other hand, backup software supports Full, Differential and Incremental backups, which reduce the disk space and time needed to make backups in environments without virtualization.
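The practical difference between the three backup types shows up at restore time. The Python sketch below uses a simplified model (an assumption, not any specific product's behavior): an incremental holds the changes since the previous backup of any kind, and a differential holds the changes since the last full. It returns the backup sets you would need to restore to a given day.

```python
def restore_chain(backups, target_day):
    """Backups needed to restore the state as of target_day.

    backups: list of (day, kind) tuples sorted by day, kind in {"full", "diff", "inc"}.
    """
    available = [b for b in backups if b[0] <= target_day]
    fulls = [i for i, (_, kind) in enumerate(available) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before target_day")
    full_idx = fulls[-1]  # the most recent full backup is the base of the chain
    chain = [available[full_idx]]
    for day, kind in available[full_idx + 1:]:
        if kind == "diff":
            # A differential contains everything since the full, so it replaces
            # any earlier differentials or incrementals in the chain.
            chain = [available[full_idx], (day, kind)]
        elif kind == "inc":
            # An incremental only contains changes since the previous backup.
            chain.append((day, kind))
    return chain

weekly_inc = [(0, "full")] + [(d, "inc") for d in range(1, 7)]
weekly_diff = [(0, "full")] + [(d, "diff") for d in range(1, 7)]
print(restore_chain(weekly_inc, 5))   # the full plus five incrementals
print(restore_chain(weekly_diff, 5))  # [(0, 'full'), (5, 'diff')]
```

Incrementals are the smallest and fastest to take but lengthen the restore chain (and each link is one more thing that can fail); differentials grow until the next full but keep a restore to two sets.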

Note: In my case, I use both at the same time: one to back up entire VMs (OS and data) and the other to back up files (/etc, /home, /var). Because, for paranoid people like me, there is no such thing as too many backup systems! :-D

So, the basics:

Synchronization and replication AREN’T backups.
If you have a virtualized environment, use a tool that backs up entire VMs (OS, configuration and data).
If you have physical servers, use backup software with a “base backup” capability to back up the OS, and regular backups (Full, Differential and Incremental) for configuration and data.
Plan your backups with the future in mind. Maybe you don’t need to store backups for two years now, but it will be nice when auditors come and ask whether you have that data and you can say yes.
Set up a cold backup (LTO; yes, it still exists), because in some situations the problem can affect every online system and you could lose both the primary and the backup systems.
Your backup system needs to be physically distant from your primary systems (fire, earthquakes, theft, etc.).
Access to backup servers is strictly restricted, including for system administrators.
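One common way to plan retention with the future in mind, without storing every daily copy forever, is a grandfather-father-son (GFS) rotation. This is my illustration of the idea, not something prescribed above: the Python sketch keeps the last 14 dailies, the last 8 Sunday backups as weeklies, and the first backup of each month for 24 months. All three counts are hypothetical parameters to tune against your own audit or regulatory requirements.

```python
from datetime import date, timedelta

def _last(items, n):
    """Last n items of a list (empty when n is 0, unlike items[-0:])."""
    return items[-n:] if n > 0 else []

def gfs_keep(backup_dates, today, daily=14, weekly=8, monthly=24):
    """Return the set of backup dates to retain under a simple GFS policy."""
    dates = sorted(d for d in backup_dates if d <= today)
    keep = set(_last(dates, daily))                   # sons: most recent daily backups
    sundays = [d for d in dates if d.weekday() == 6]  # weekday(): Monday is 0, Sunday is 6
    keep.update(_last(sundays, weekly))               # fathers: most recent Sunday backups
    first_of_month = {}
    for d in dates:  # dates are sorted, so the first backup seen in each month wins
        first_of_month.setdefault((d.year, d.month), d)
    months = sorted(first_of_month)
    keep.update(first_of_month[m] for m in _last(months, monthly))  # grandfathers
    return keep

today = date(2023, 12, 31)
history = [today - timedelta(days=i) for i in range(3 * 365)]  # three years of dailies
kept = gfs_keep(history, today)
print(f"{len(history)} backups on disk, {len(kept)} retained")
print("oldest retained:", min(kept))
```

Whatever GFS drops from disk is a natural candidate for the LTO cold copy rather than outright deletion.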

Monitoring:

As with logging, the idea here is to monitor everything, such as:

CPU, RAM, processes and updates on the server
Access (origin, destination, period, errors, dumps)
Disk (SMART), hardware (BIOS, iLO/DRAC/IPMI, power, reboots)
Temperature, humidity, light level and door (open/closed) in the server room/DC
UPS (temperature, load, charge, voltage)
Backup systems (number of successes/errors)
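At its core, monitoring compares readings against limits and raises an alert when one is crossed. The minimal Python sketch below does that for a few of the items listed; the metric names and threshold values are hypothetical examples, and in a real deployment the readings would come from your monitoring tool rather than a dictionary. A metric with no reading is reported too, because a silent sensor is itself worth investigating.

```python
# Hypothetical limits: (metric name, comparison, limit). Tune to your own environment.
THRESHOLDS = [
    ("cpu_percent", ">", 90.0),
    ("ram_percent", ">", 90.0),
    ("disk_reallocated_sectors", ">", 0),  # SMART: any reallocated sector is worth a look
    ("room_temperature_c", ">", 27.0),
    ("room_humidity_percent", ">", 60.0),
    ("ups_charge_percent", "<", 50.0),
    ("backup_failures_24h", ">", 0),
]

def check(readings):
    """Return alert strings for readings that cross their limits or are missing."""
    alerts = []
    for name, op, limit in THRESHOLDS:
        if name not in readings:
            alerts.append(f"{name}: no data")
            continue
        value = readings[name]
        crossed = value > limit if op == ">" else value < limit
        if crossed:
            alerts.append(f"{name}: {value} {op} {limit}")
    return alerts

sample = {"cpu_percent": 35.0, "ram_percent": 72.5, "disk_reallocated_sectors": 0,
          "room_temperature_c": 29.5, "room_humidity_percent": 45.0,
          "ups_charge_percent": 100.0}
for alert in check(sample):
    print(alert)
# room_temperature_c: 29.5 > 27.0
# backup_failures_24h: no data
```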

Final comments:

A cheap and easy alternative to avoid cloud costs is to have a local environment (inside the company) that receives and stores all the historical backups/logs on disk storage and later on LTO (disk-to-tape backups).
Sometimes the backups and logs are almost more important than the live data (video or phone recordings, for example), and both the size of the backups and the time they take are very large, so a good alternative is to build a dedicated network (switches and NICs) for backup and logging.
Don’t forget to schedule and actually “execute” tests of your backup and recovery plan to verify that recovery will work when you need it.
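A restore test only means something if you check what came back. As a minimal sketch (the function names and directory layout are hypothetical), the Python code below compares SHA-256 hashes of every file in a source tree against a tree restored from backup and reports files that are missing or differ. The demo uses a plain copy as a stand-in for a real restore; scheduling the test and restoring into a scratch location are left to your backup tool and scheduler.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_hashes(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    root = Path(root)
    hashes = {}
    for path in root.rglob("*"):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
                    h.update(chunk)
            hashes[path.relative_to(root).as_posix()] = h.hexdigest()
    return hashes

def verify_restore(original_dir, restored_dir):
    """Return (missing, corrupted): files absent from the restore, and files whose content differs."""
    original = file_hashes(original_dir)
    restored = file_hashes(restored_dir)
    missing = sorted(p for p in original if p not in restored)
    corrupted = sorted(p for p in original if p in restored and original[p] != restored[p])
    return missing, corrupted

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "etc")
        src.mkdir()
        (src / "hosts").write_text("127.0.0.1 localhost\n")
        (src / "fstab").write_text("/dev/sda1 / ext4 defaults 0 1\n")
        restored = Path(tmp, "restored")
        shutil.copytree(src, restored)             # stand-in for a real restore
        (restored / "fstab").write_text("oops\n")  # simulate a damaged file
        print(verify_restore(src, restored))       # ([], ['fstab'])
```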

Sooner or later they will enter our network.