
Questions tagged [system-reliability]

The tag has no usage guidance.

2 votes
2 answers
195 views

When should I be worried about time-of-check to time-of-use (TOCTOU) vulnerabilities during database queries?

I am having difficulty understanding when I should be worried about TOCTOU vulnerabilities and how to avoid them, because yes, we can use database transactions, but there are different levels of ...
Alessandro
3 votes
1 answer
386 views

Defining SLI / SLO for ETL and Reporting Application

All, we've just started on our SRE journey and are trying to define SLIs/SLOs for our application. It is an ETL application where 1. feeds (e.g. start-of-day and end-of-day data feeds) come from various ...
Ravi Parekh
0 votes
1 answer
242 views

What is the crux of the difference between N-version programming and self-monitoring architecture?

Source: https://cs.ccsu.edu/~stan/classes/CS410/Notes16/11-ReliabilityEngineering.html This is a self-monitoring architecture. Here computations are carried out across two channels; if they both provide the same ...
cuajiu
0 votes
2 answers
309 views

Building a program that truly deletes everything

We all know that if we delete a file, the operating system marks its space for reuse but doesn't actually erase it. It just removes it from the directory indexes, and until the data is needed and overwritten, ...
VJZ
-2 votes
1 answer
156 views

Thoughts on Google Cloud App Engine Reliability

I have been developing an app that will require a cron task every minute. We are handling our cron tasks with Spring Boot scheduling. However, I am a little worried about the following question: ...
Juan
2 votes
0 answers
579 views

Running a high availability PostgreSQL cluster on native AWS services only

Backstory: I am unable to use RDS, as I need to install cartridges in my PostgreSQL instances. I have been trying to pin down an architecture for PostgreSQL running on EC2 instances for a few days. ...
tjwoon
-1 votes
2 answers
4k views

Testing can detect the presence of errors but not their absence. Why? [closed]

I hear and see this statement in almost every academic book on software engineering: "Testing can detect the presence of errors and not the absence of errors." But I don't understand it. ...
Deepam Gupta
3 votes
3 answers
269 views

How to prevent bugs in business-level configurations with the same discipline as in source code?

We have a system that allows our clients to coordinate people (shoppers) so that they can deliver groceries within 45 minutes of order creation. Each client has a set of stores where the ...
Mauricio Rondon
-2 votes
2 answers
132 views

How do I ensure my product is correct the first time? [closed]

I am working on a product that cannot be updated once released. Furthermore, if the product malfunctions, the results may include death, serious bodily harm, or major financial setbacks. ...
Demi
0 votes
0 answers
53 views

Do logs enhance availability of service for a well monitored application?

I used to work on teams that built software-as-a-service applications. Our production requirements were often the same: a complex service (web application, database, daemons, typically) ...
Diane M
0 votes
1 answer
154 views

How to ensure that every log event is delivered to Graylog

In our applications we traditionally log events locally to log files. As our applications are distributed across multiple server instances, searching for particular events is complicated and ...
Tomazz
0 votes
3 answers
1k views

Dealing with master failure in a master-slave DB setup

I'm learning about system design for the first time and am really intrigued by reliability. Given a setup where you have a master that replicates and writes data through to a slave, how do you persist/...
jonnyd42
7 votes
3 answers
9k views

Best practices for Heartbeat in distributed systems

In the past, our system had an external data provider (call it source) sending regular heartbeats to a Java application (call it client). If the heartbeat failed, the system shut itself down (to ...
senseiwu
1 vote
1 answer
578 views

Reliability for FTP Server

We have implemented an FTP server. The manager wants to add reliability to it. He wants me to write incoming streams into some fast and reliable system (like HBase or Redis) before writing them to ...
vakarami
-6 votes
1 answer
129 views

Reliable installer [closed]

I have just brutally terminated the Skype installer, which now gives me error 1603 whenever I try to restart the installation. This brings me back to an issue that has bothered me for a long time: how do you ...
Val
