Questions tagged [system-reliability]
23
questions
2
votes
2
answers
195
views
When should I be worried about time-of-check to time-of-use vulnerabilities during database queries?
I am having difficulty understanding when I should be worried about TOCTOU vulnerabilities and how to avoid them. Yes, we can use database transactions, but there are different levels of ...
3
votes
1
answer
386
views
Defining SLI / SLO for ETL and Reporting Application
All,
We've just started on our SRE journey and are trying to define SLIs / SLOs for our application.
It is an ETL application where 1. feeds (e.g. start of day, end of day data feeds) come from various ...
0
votes
1
answer
242
views
What is the crux of the difference between N-version programming and a self-monitoring architecture?
Source: https://cs.ccsu.edu/~stan/classes/CS410/Notes16/11-ReliabilityEngineering.html
This is the self-monitoring architecture. Here, computations are carried out across 2 channels; if they both provide the same ...
0
votes
2
answers
309
views
Building a program that truly deletes everything
We all know that if we delete a file, the operating system recycles it but doesn't actually delete it. It just removes it from the directory indexes, and until the data is needed and overwritten, ...
-2
votes
1
answer
156
views
Thoughts on Google Cloud App Engine Reliability
I have been developing an app that will require a cron task every minute.
We are handling our cron tasks with Spring Boot Scheduling. However, I am a little worried about the following question:
...
2
votes
0
answers
579
views
Running a high availability PostgreSQL cluster on native AWS services only
Backstory: I am unable to use RDS, as I need to install cartridges in my PostgreSQL instances.
I have been trying to pin down an architecture for PostgreSQL running on EC2 instances for a few days. ...
-1
votes
2
answers
4k
views
Testing can detect the presence of errors but not the absence of errors, why? [closed]
I hear and see this statement in almost every academic book related to software engineering:
Testing can detect the presence of errors and not the absence of errors.
But I do not understand it clearly. ...
3
votes
3
answers
269
views
How to prevent bugs in business-level configurations with similar discipline as in source code?
We have a system that allows our clients to coordinate people (shoppers) so that they can deliver groceries within 45 minutes of the order creation.
Each client has a set of stores where the ...
-2
votes
2
answers
132
views
How do I ensure my product is correct the first time? [closed]
I am working on a product that will not be able to be updated once released. Furthermore, if the product malfunctions, the results may include death, serious bodily harm, or major financial setbacks. ...
0
votes
0
answers
53
views
Do logs enhance availability of service for a well monitored application?
I used to work for teams that built software-as-a-service applications. Our requirements regarding production were often the same:
A complex service (web application, database, daemons, typically) ...
0
votes
1
answer
154
views
How to ensure that every log event will be delivered to Graylog
In our applications we traditionally log events locally into log files. As our applications are distributed across multiple server instances, searching for particular events is complicated and ...
0
votes
3
answers
1k
views
Dealing with master failure in a master-slave DB setup
I'm learning about system design for the first time and am really intrigued by reliability. Given a setup where you have a master that replicates and writes data through to a slave, how do you persist/...
7
votes
3
answers
9k
views
Best practices for Heartbeat in distributed systems
In the past, our system had an external data provider (call it the source) sending regular heartbeats to a Java application (call it the client). If the heartbeat failed, the system shut itself down (to ...
1
vote
1
answer
578
views
Reliability for FTP Server
We have an FTP server implemented. The manager wants to add reliability to it. He wants me to write incoming streams into some fast and reliable system (like HBase or Redis) before writing them to ...
-6
votes
1
answer
129
views
Reliable installer [closed]
I have just brutally terminated the Skype installer, which now gives me error 1603 whenever I try to restart the installation. This brings me back to an issue that has bothered me for a long time: how do you ...