Everybody knows the lock keyword, but how is it implemented? What are its performance characteristics? Gael Fraiteur scratches the surface of multithreaded programming in .NET and goes deep through the Windows Kernel down to CPU microarchitecture.
The document discusses bringing concurrency to Ruby. It begins by defining concurrency and parallelism, noting that both are needed but platforms only enable parallelism if jobs can split into concurrent tasks. It reviews concurrency and parallelism in popular Ruby platforms like MRI, JRuby, and Rubinius. The document outlines four rules for concurrency and discusses techniques like immutable data, locking, atomics, and specialized collections for mutable data. It highlights libraries that provide high-level concurrency abstractions like Celluloid for actors and Sidekiq for background jobs.
Mon, August 22, 2:00pm – 2:30pm Youtube: https://youtu.be/OlJZMHLTfuc Abstract: Spur is the new memory manager for the Cog virtual machine used by Pharo, Newspeak and Squeak. It features a two-generation scavenger garbage collector with an adaptive tenuring policy, lazy become, (transparent) segmented memory, a new 64-bit-compatible object format, ephemerons, pinned objects, and a class table, among others. If you're a high-level application developer or a programming amateur rather than a VM expert, but you're interested in understanding these concepts and their impact on your day-to-day development, this talk is for you. Bio: Guille Polito is a research engineer at CNRS, France. A Pharoer since 2010, he has participated actively in the Pharo open source community for several years. He currently works on the modularization of Pharo, where he does software archeology, refactoring, and library rewriting, and participates in the Virtual Machine development.
This document discusses a presentation on practical Windows kernel exploitation. It covers the basics of kernel exploitation, common vulnerability classes like write-what-where and use-after-free, techniques for executing code, mitigation technologies, writing Windows kernel exploits for Metasploit, and improving reliability. The speaker works at SecureState researching and developing kernel exploits and is an open source contributor to projects like Metasploit.
This document discusses techniques used by Patchguard, a mechanism in Windows 8.1 and 10 that protects the kernel from modifications. Patchguard uses code obfuscation, anti-debugging tricks, and periodic checksum validation to prevent unauthorized kernel patches. The document outlines various approaches that could be used to bypass Patchguard such as patching the kernel image, hooking functions, modifying checkers, or descheduling the context verification processes used by Patchguard. It provides details on specific functions and methods involved in Patchguard's context verification and suggests ways these could be descheduled to bypass the mechanism.
This document summarizes a presentation about Python threads and the Global Interpreter Lock (GIL). It discusses how the GIL provides synchronization for Python's memory management but prevents true concurrency. It describes how the GIL is implemented and how it is managed through "checks" and tick counts. It also covers how the new GIL in Python 3.2 aims to improve fairness on multicore systems through a timeout mechanism. Finally, it discusses alternatives like Jython, multiprocessing, and Stackless Python that can better utilize multiple CPU cores.
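As a small illustration of the timeout mechanism mentioned above: CPython 3.2 and later expose the interval after which a running thread is asked to release the GIL through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below only reads, changes, and restores that value. The 0.01 s setting is an arbitrary value chosen for illustration, not a recommendation from the presentation.

```python
import sys

# Read the current GIL switch interval in seconds (CPython 3.2+).
original = sys.getswitchinterval()

# Request a longer interval; 0.01 s is an arbitrary illustrative value.
sys.setswitchinterval(0.01)
adjusted = sys.getswitchinterval()

# Restore the original setting so the rest of the program is unaffected.
sys.setswitchinterval(original)
restored = sys.getswitchinterval()
```

A longer interval means fewer forced GIL hand-offs between CPU-bound threads, at the cost of slower response from threads waiting to run.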
This document discusses the pitfalls and limits of dynamic malware analysis. It summarizes that dynamic analysis aims to observe malware execution but is challenging due to evasion techniques. Several problems are outlined, including the difficulty of scalability, isolation, and stealth when analyzing malware. The document also discusses issues with using debuggers, emulators, and hypervisor introspection for dynamic analysis. It notes that complete stealth is not feasible and that halting and evasion problems cannot be fully solved.
EMET is Microsoft's tool to make exploits more difficult by adding protections like DEP and ASLR. However, the document outlines how an attacker could bypass EMET protections and disable them using return-oriented programming. The attacker identifies a vulnerability in Firebird that provides enough space for payload. Despite EMET protections, the attacker is able to craft a 19 gadget ROP chain to dynamically resolve EMET's base address and modify configuration offsets to disable protections. A demonstration shows it working to exploit a system with EMET installed. Later EMET versions addressed this by storing configuration in randomized memory.
The document discusses how JRuby pushes the Java platform further by implementing custom core classes like Array, Hash, String, and IO to match Ruby's behavior exactly. It also describes how JRuby uses libraries like ByteList, Joni, Java Native Runtime, and FFI to provide Ruby-like regular expressions, native I/O, and OS-level features on the JVM. These custom implementations and libraries allow JRuby to overcome challenges like dynamic typing and provide a full-fledged Ruby environment atop the Java Virtual Machine.
The winning entry titled "Most competitive" leverages complex Ruby tricks to obfuscate a program that evaluates itself. It uses techniques like dynamically generating code strings, manipulating character encodings, and exploiting edge cases in Ruby's parsing and evaluation rules. The goal is to demonstrate both the robustness of Ruby interpreters in running such esoteric code, as well as uncover subtle aspects of Ruby's specification and implementation. The judges awarded it high honors for achieving the contest's goals of producing transcendental, imbroglio code.
A basic introduction to Rust. Rust is a modern systems programming language that takes a different approach from other recent systems programming languages to three problems: memory safety without GC, abstraction without overhead, and concurrency without data races.
XCon 2014 => http://xcon.xfocus.org/ In the past it was quite common to exploit heap and pool manager vulnerabilities by attacking their internal linked structures. Memory management has since improved a lot, and attacking the heap this way is now largely ineffective. These techniques still come in handy, however, when we look at the linked structures spread throughout the kernel, which are unfortunately not hardened enough. In this presentation we examine the power of these vulnerabilities through the well-known example CVE-2013-3660, showing a bypass of the 'lazy' assertions on _LIST_ENTRY, presenting the exploitation after-party, and teleporting to the kernel.
The document describes the SAYEH processor, which is designed for educational purposes. It has a minimal hardware configuration that still supports a sufficient set of operations. The processor has a 16-bit instruction set architecture and uses a register file, program counter, arithmetic logic unit, and controller to process instructions. The controller implements eleven states to control the fetch, decode, execute, and halt operations of the processor.
This document discusses efficient logging in multithreaded C++ servers. It describes the muduo logging library which can log over 1 million messages per second with low latency. The key aspects are an efficient LogStream frontend, asynchronous backend using double buffering to pass log messages from threads to a log writer thread without blocking, and writing to local files for performance and reliability.
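The double-buffering idea described above can be sketched in a few lines. This is a simplified Python illustration, not muduo's C++ API: the class name `DoubleBufferedLog`, its methods, and the `sink` callable are all hypothetical, and `flush()` is called by hand here where a real backend would use a dedicated writer thread.

```python
import threading


class DoubleBufferedLog:
    """Hypothetical sketch: producers fill one buffer while a writer drains the other."""

    def __init__(self, sink):
        self._sink = sink              # any callable that accepts a list of lines
        self._current = []             # buffer that producer threads append to
        self._lock = threading.Lock()

    def append(self, line):
        # Producers hold the lock only long enough to append one line.
        with self._lock:
            self._current.append(line)

    def flush(self):
        # Swap in an empty buffer under the lock, then write outside it,
        # so slow I/O never blocks the producers.
        with self._lock:
            full, self._current = self._current, []
        if full:
            self._sink(full)
        return len(full)


written = []
log = DoubleBufferedLog(written.extend)


def worker(worker_id):
    for i in range(100):
        log.append(f"worker {worker_id} message {i}")


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

flushed = log.flush()
```

The key design choice is that the lock protects only the buffer swap and the append, never the write to the sink.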
The document describes the design of an analog CMOS-based chip that implements an Interval Type-2 Fuzzy Logic Controller (IT2 FLC). The chip takes a realization approach that averages the outputs of two underlying Type-1 Fuzzy Logic Systems (T1 FLSs). The chip has been designed and simulated using a 180nm CMOS technology. It is designed to have two inputs, one output, and nine tunable fuzzy rules. Simulation results show the chip can operate at 20 million fuzzy logic operations per second while consuming 20mW of power.
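The two simulation figures above imply an energy cost per operation. The short calculation below assumes the quoted power is the average drawn while the chip runs at the quoted throughput, which the summary does not state explicitly.

```python
power_watts = 20e-3      # 20 mW, from the simulation results above
ops_per_second = 20e6    # 20 million fuzzy logic operations per second

# Energy per operation = power / throughput.
energy_per_op_joules = power_watts / ops_per_second
energy_per_op_nanojoules = energy_per_op_joules * 1e9
```

Under that assumption the chip spends about 1 nJ per fuzzy logic operation.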
IC Mask Design - The Experts in all aspects of IC Layout. DAC Conference, Anaheim, June 2010 IC Layout Acceleration Tool - HiPer DevGen
1. Transistor size has reduced from 180nm in 1999 to 32nm in 2009 through advances in silicon technology and new manufacturing processes every 2 years. 2. FinFET transistors provide a 37% performance increase at low voltages and 50% power reduction at constant performance through their 3D structure. 3. Graphene is predicted to replace semiconductors for further transistor shrinkage beyond 2030 due to its 2D structure, high conductivity, and low cost. 4. Quantum computing, which is still in development, will use quantum properties of single atoms to represent data and perform certain algorithms not possible with classical computers.
This document discusses analog integrated circuit layout techniques. It covers topics like design rules, unit component design, boundary condition matching using common centroid layout, and reducing mismatches. Specific circuit elements discussed include MOS transistors, capacitors, and resistors. Layout techniques to reduce parasitic capacitances and handle over-etching errors are presented. Fingered device layouts and calculating parasitic capacitances of MOS devices are also summarized.
This document discusses important considerations for analog integrated circuit layout and the CMOS fabrication process. It covers topics like MOS transistor operation, analog signal characteristics, CMOS fabrication steps, layout techniques for minimizing noise and mismatches, and avoiding latch-up issues. The key goals of analog layout include matching devices, minimizing parasitic capacitance and resistance, isolating analog and digital sections, and using guard rings and decoupling capacitors.
The document discusses the shift to 3D integrated circuit structures and the manufacturing and process control challenges involved. It describes how 3D NAND flash memory uses a vertically stacked structure to increase density in a cost-effective manner. Implementing FinFET transistors also builds vertically by using fin-shaped gates on three sides to improve performance. Significant challenges include precise control over multiple thin film depositions and complex etch processes needed for these 3D structures. Advanced metrology and inspection is required to monitor critical dimensions, material properties, defects and other parameters in three dimensions.
The document provides instructions for completing a lab on designing an inverter using Cadence Virtuoso. The lab objectives are to: 1. Create a library called "myDesignLib" and attach it to the gpdk180 technology library. 2. Build a schematic for an inverter cell and generate its symbol. 3. Create a test design called "Inverter_Test" that instantiates the inverter, and run analog simulations and parametric analysis on the design using Spectre.
The document describes the full custom IC design implementation of a low power priority encoder. It discusses priority encoders, their applications such as in keyboard encoding and interrupt requests. It then covers the full custom design flow used for the priority encoder implementation including schematic capture, simulation, layout design, design rule checking and post-layout simulation. Circuit designs for logic gates and priority encoders of different sizes are presented along with their Cadence simulation results.
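For readers unfamiliar with the function being implemented, a behavioral model of a priority encoder may help. This is a minimal sketch and not the circuit from the document: the function name is hypothetical, and giving the highest-numbered input priority is an assumption, since the summary does not say which convention the design uses.

```python
def priority_encode(bits):
    """Return (index, valid) for the highest-numbered asserted input.

    `bits` is a sequence of 0/1 values where bits[i] is input line i.
    Highest-index-wins is an assumed convention for this sketch.
    """
    for index in range(len(bits) - 1, -1, -1):
        if bits[index]:
            return index, True
    # No input asserted: the index is meaningless, so valid is False.
    return 0, False


# Inputs 0 and 2 are asserted; input 2 wins.
result = priority_encode([1, 0, 1, 0])
idle = priority_encode([0, 0, 0, 0])
```

In an interrupt controller, `index` would select the request to service and `valid` would signal that at least one request is pending.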
This document discusses the design of an ATM simulator software project. It describes the iterative development process, including initial requirements, use case modeling, class modeling, and state diagrams. The first iteration focuses on a basic ATM engine and console interface that supports withdrawal and balance inquiry transactions without a graphical user interface or bank integration. Subsequent iterations will expand functionality and improve the user interface.
This document compares the layout design of a CMOS AND gate using two approaches: fully automatic and semicustom. In the fully automatic approach, the AND gate schematic is developed in DSCH and compiled in MICROWIND to automatically generate the layout. This layout consumes 43.7 μm² of area and 3.1 μW of power. In the semicustom approach, the layout is manually designed in MICROWIND for area optimization. This layout consumes only 11.2 μm² of area while consuming similar power as the automatic design. Simulation results show that the semicustom layout reduces area consumption significantly compared to the fully automatic layout, though it may consume slightly more power.
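To put the two area figures above in proportion, the saving can be computed directly. The variable names are illustrative; only the two area values come from the summary.

```python
auto_area_um2 = 43.7         # fully automatic layout
semicustom_area_um2 = 11.2   # manually optimized layout

# Fraction of the automatic layout's area that the semicustom layout saves.
reduction = (auto_area_um2 - semicustom_area_um2) / auto_area_um2

# How many times larger the automatic layout is.
ratio = auto_area_um2 / semicustom_area_um2
```

The semicustom layout is roughly 74% smaller, or about 3.9 times more compact than the automatically generated one.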
This document summarizes the key principles of VLSI design methodologies discussed in a lecture. It covers four phases in chip creation and how design complexity outpaces productivity. It then discusses how tools and methodologies address this by using abstraction and constraints to reduce complexity and increase productivity. The principles of structured design techniques like hierarchy, regularity, modularity, and locality are explained as ways to decompose and organize a design.
The document describes an experiment to generate and simulate a CMOS inverter circuit layout using the Microwind CAD tool. The key steps are: 1. Select a foundry process and design the nMOS transistor by adding n-well, n+ diffusion, polysilicon, and metal contacts. 2. Design the pMOS transistor by adding an n-well, p+ diffusion, polysilicon, and contacts. 3. Interconnect the pMOS and nMOS transistors to form an inverter, connecting inputs, outputs, and power terminals. 4. Perform DRC checks and post-layout simulation to verify the inverter's transfer characteristics.
The document discusses the history and development of transistors from their invention in 1947 to modern 3D transistors. It describes how Moore's Law of transistor scaling led to the development of 3D tri-gate transistors to overcome limitations of planar transistors. The document explains how 3D transistors provide better performance than planar transistors through conducting channels on three sides of a vertical fin structure. It discusses the construction, operation, benefits and challenges of integrating 3D transistors into mainstream manufacturing.
This document describes formal verification of a pipelined CISC microprocessor modeled after the Intel IA32 instruction set using the UCLID term-level verifier. The objective was to understand UCLID's strengths and weaknesses for modeling hardware designs and the verification process. A pipelined Y86 processor implementation from a textbook was verified against its sequential reference model. The control logic was automatically translated to UCLID format. Modularity and automation were emphasized to maintain model fidelity during verification.
This document discusses RISC processors and compares them to CISC processors. It covers the history of RISC, including the development of RISC concepts in the 1970s. The key differences between RISC and CISC are that RISC uses fixed-length instructions that perform in one clock cycle, while CISC has variable-length instructions that may take multiple cycles. The document also outlines RISC design principles like simple instructions, register-to-register operations, and large register sets. Examples of popular RISC architectures like MIPS, SPARC, and ARM are provided.
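Fixed-length instructions make decoding a matter of slicing fixed bit fields. The sketch below decodes a 32-bit MIPS R-type word; the function name is hypothetical, and the example word `0x012A4020` is the standard encoding of `add $t0, $t1, $t2`. It is an illustration of the principle, not material from the document.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fixed fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, ALU function
    }


# add $t0, $t1, $t2  ($t0 = r8, $t1 = r9, $t2 = r10)
fields = decode_r_type(0x012A4020)
```

Because every field sits at a known position, the decoder needs no knowledge of instruction length before it starts, which is the property variable-length CISC encodings lack.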
The document discusses layout design rules for integrated circuits. It provides guidelines for feature sizes and spacings to ensure fabricated circuits meet intended designs. This includes minimum line widths, separations between layers, and allowances for misalignment. The document also notes two key checks that must be completed to validate a mask design: a design rule check to verify rules are followed, and circuit extraction to confirm masks produce the correct interconnected circuit.
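A design rule check of the kind described above can be sketched for two common rules, minimum width and minimum spacing, on axis-aligned rectangles of a single layer. The rule values, function names, and sample shapes are all hypothetical, since real values come from the foundry's rule deck, and circuit extraction is not covered here.

```python
from itertools import combinations

# Hypothetical rule values in arbitrary layout units.
MIN_WIDTH = 3
MIN_SPACING = 2


def width_violations(rects):
    """Return rectangles whose smaller side is below MIN_WIDTH.

    Each rectangle is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    return [r for r in rects if min(r[2] - r[0], r[3] - r[1]) < MIN_WIDTH]


def spacing_violations(rects):
    """Return pairs of same-layer rectangles closer than MIN_SPACING."""
    bad = []
    for a, b in combinations(rects, 2):
        # Gap along each axis; zero when the projections overlap.
        dx = max(a[0] - b[2], b[0] - a[2], 0)
        dy = max(a[1] - b[3], b[1] - a[3], 0)
        gap = (dx * dx + dy * dy) ** 0.5
        # A gap of 0 means the shapes touch or overlap and form one shape.
        if 0 < gap < MIN_SPACING:
            bad.append((a, b))
    return bad


metal = [(0, 0, 10, 3), (0, 4, 10, 7), (12, 0, 14, 7)]
too_narrow = width_violations(metal)
too_close = spacing_violations(metal)
```

Here the third rectangle is only 2 units wide, and the first two are separated by a 1-unit gap, so each rule reports one violation.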
The document discusses 3D transistors, which employ a single gate stacked on top of two vertical gates to allow three times the surface area for electron flow without increasing gate size. This overcomes issues with further scaling planar transistors. 3D transistors provide fully depleted operation and tighter channel control through conducting channels on three sides of a vertical fin. This enables high drive currents and improved switching performance. 3D transistors can operate at lower voltages than planar transistors, reducing power consumption by over 50% while maintaining or improving performance. They will allow continued transistor scaling per Moore's Law and are needed for future generations of chips.
The report covers digital and analog design flow procedures in detail, including operation, simulation, and synthesized mapped output, along with full-custom schematic and layout design using the Cadence Virtuoso and Encounter tools.
Flash devices introduced a sudden shift in the performance profile of direct attached storage. With IOPS rates orders of magnitude higher than rotating storage, it became clear that Linux needed a redesign of its storage stack to properly support and get the most out of these new devices. This talk will detail the architecture of blk-mq, the redesign of the core of the Linux storage stack, and the later set of changes made to adapt the SCSI stack to this new queuing model. Early results of running Facebook infrastructure production workloads on top of the new stack will also be shared. Jens Axboe, Facebook
Digital Forensics and Incident Response (DFIR) for IT systems has been around quite a while, but what about Industrial Control Systems (ICS)? This talk will explore the basics of DFIR for embedded devices used in critical infrastructure, such as Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), and controllers. If these are compromised or even have a misoperation, we will show what files, firmware, memory dumps, physical conditions, and other data can be analyzed in embedded systems to determine the root cause. This talk will show examples of what forensics data to collect, and how, from two popular RTUs that are used in electric substations: the General Electric D20MX and the Schweitzer Engineering Labs SEL-3530 RTAC. This talk will not cover Windows or *nix-based devices such as Human Machine Interfaces (HMIs) or gateways.
The talk will discuss using D for a large-scale distributed project implementing a primary storage system with strict performance and resource requirements. It will go over the pros and cons of using D and compare our experience with previous similar projects implemented in C and in C++ with Python. We are an experienced group of system programmers implementing a large software-based storage system. We have leveraged D-specific features to make sure we have a very sound infrastructure to use. Some of the things we did were previously either impossible or impractical using only C or C++, forcing us to use Python for code generation. On the other hand, some aspects of D make it more difficult to handle than the other options. I will present what we really like, and what we've learned to live with.
This document summarizes a meeting about accelerating SQLite with OpenCL on ARM SoCs. It discusses porting the Clover OpenCL implementation ("Shamrock") to ARM and newer LLVM versions to enable CPU-only OpenCL. Initial SQLite performance testing on ARM showed potential gains from parallelizing operations. The group's goals are to run SQLite's operations on the GPU using OpenCL kernels to achieve significant performance improvements over the CPU. Their current status and next steps involve porting the Khronos OpenCL conformance tests to ARM and updating Shamrock and SQLite to newer OpenCL specifications.
The document outlines an agenda for a training on practical firmware reversing and exploit development for AVR-based embedded devices. The training will cover: (1) an introduction to the AVR architecture through an example; (2) pre-exploitation techniques; (3) exploitation and building return-oriented programming (ROP) chains; and (4) post-exploitation tricks. It provides background information on the AVR architecture, which is used widely in embedded and IoT devices, and discusses features like memory organization, registers, interrupts, and assembly instructions. Development tools for AVR like Atmel Studio, AVR-GCC, and debuggers are also briefly mentioned.