[232] Optimizing Deep Learning Inference with TensorRT

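Step 1: Convert the TF model to TRT format
Step 2: Create the model parser
Step 3: Specify the input/output layer information
Step 4: Optimize the model and build the runtime engine
Step 5: Save the engine to a file
Step 6: Load the engine from the file
Step 7: Run inference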

PReLUPlugin::PReLUPlugin(const Weights *weights, int nbWeights) {
    // Deep-copy the slope weights so the plugin does not depend on the caller's buffer.
    mWeights = weights[0];
    mWeights.values = malloc(mWeights.count * type2size(mWeights.type));
    memcpy(const_cast<void *>(mWeights.values), weights[0].values, mWeights.count * type2size(mWeights.type));
}
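The constructor above deep-copies the slope weights into a buffer the plugin owns. This is presumably so the plugin does not depend on the lifetime of the caller's `Weights` array. The talk does not show `type2size`, so this sketch assumes it returns the byte width of each `DataType`.

To compile without TensorRT, the sketch uses stand-in `DataType`/`Weights` types whose field names mirror `nvinfer1::Weights`. It returns a `std::vector` instead of `malloc`ed memory, so ownership is automatic. `copyWeights` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Stand-ins for nvinfer1::DataType / nvinfer1::Weights so the sketch compiles
// without TensorRT; the field names mirror the real struct.
enum class DataType { kFLOAT, kHALF, kINT8, kINT32 };

struct Weights {
    DataType type;
    const void* values;
    int64_t count;
};

// Hypothetical helper: bytes per element for each data type.
inline std::size_t type2size(DataType t) {
    switch (t) {
        case DataType::kFLOAT: return 4;
        case DataType::kHALF:  return 2;
        case DataType::kINT8:  return 1;
        case DataType::kINT32: return 4;
    }
    throw std::invalid_argument("unknown DataType");
}

// Deep-copies the weight payload into storage owned by the returned vector,
// so later changes to (or freeing of) the source buffer do not affect it.
inline std::vector<unsigned char> copyWeights(const Weights& w) {
    const std::size_t bytes = static_cast<std::size_t>(w.count) * type2size(w.type);
    std::vector<unsigned char> owned(bytes);
    if (bytes > 0) std::memcpy(owned.data(), w.values, bytes);
    return owned;
}
```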

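int PReLUPlugin::enqueue(int batchSize, const void *const *inputs, void **outputs, void *workspace,
                         cudaStream_t stream) {
    const float zerof{0.0f};
    const __half zeroh = fp16::__float2half(0.0f);
    // Arguments: total element count, channels, H * W (elements per channel plane),
    // and div_factor (C when one slope is shared by all channels, otherwise 1).
    if (mWeights.type == DataType::kFLOAT) {
        CHECK(Forward_gpu<float>(batchSize * mNbInputCount, mNbInputChannels,
                                 mNbInputHeight * mNbInputWidth, reinterpret_cast<const float *>(mDeviceKernel),
                                 reinterpret_cast<const float *>(inputs[0]), reinterpret_cast<float *>(outputs[0]),
                                 zerof, mChannelShared ? mNbInputChannels : 1, stream));
    } else { // DataType::kHALF
        CHECK(Forward_gpu<__half>(batchSize * mNbInputCount, mNbInputChannels,
                                  mNbInputHeight * mNbInputWidth, reinterpret_cast<const __half *>(mDeviceKernel),
                                  reinterpret_cast<const __half *>(inputs[0]), reinterpret_cast<__half *>(outputs[0]),
                                  zeroh, mChannelShared ? mNbInputChannels : 1, stream));
    }
    return 0;
}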
template <typename Ftype>
__global__ void PReLUForward(const int n, const int channels, const int dim, const Ftype* slope_data,
                             const Ftype* in, Ftype* out, const Ftype zero, const int div_factor) {
    // Caffe's CUDA_KERNEL_LOOP: grid-stride loop over all n elements.
    CUDA_KERNEL_LOOP(index, n) {
        int c = (index / dim) % channels / div_factor;  // slope index for this element
        out[index] = (in[index] > zero) ? in[index] : in[index] * slope_data[c];
    }
}
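The kernel's indexing can be checked on the host with a CPU reference. This sketch applies the same formula, `c = (index / dim) % channels / div_factor`, to an NCHW buffer. It is a testing aid, not part of the talk. `preluReference` is a hypothetical name, and the layout is assumed to be NCHW as in Caffe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for PReLUForward using the same index math.
// Assumes NCHW layout: dim = H * W elements per channel plane.
// div_factor == channels when one slope is shared by all channels, else 1.
std::vector<float> preluReference(const std::vector<float>& in,
                                  const std::vector<float>& slope,
                                  int channels, int dim, int div_factor) {
    std::vector<float> out(in.size());
    for (std::size_t index = 0; index < in.size(); ++index) {
        const int c = static_cast<int>(index / dim) % channels / div_factor;
        out[index] = in[index] > 0.0f ? in[index] : in[index] * slope[c];
    }
    return out;
}
```

With `div_factor = channels`, every element maps to slope 0. That is how the shared-slope case needs only a single weight.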
template <typename Ftype>
cudaError_t Forward_gpu(const int count, const int channels, const int dim, const Ftype* mDeviceKernel,
                        const Ftype* bottom_data, Ftype* top_data, const Ftype zero, const int div_factor,
                        const cudaStream_t stream) {
    // One thread per element, rounded up to whole blocks, launched on the caller's stream.
    PReLUForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS, 0, stream>>>(
        count, channels, dim, mDeviceKernel, bottom_data, top_data, zero, div_factor);
    return cudaGetLastError();
}
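`CAFFE_GET_BLOCKS` and `CUDA_KERNEL_LOOP` come from Caffe. There, the first is a ceiling division of the element count by the threads per block, and the second is a grid-stride loop. This sketch simulates both on the CPU to show that every element is processed exactly once, even when fewer blocks are launched than needed.

The thread count of 512 is an assumption: recent BVLC Caffe uses 512, but check your version. `getBlocks` and `simulateGridStride` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Assumed stand-in for CAFFE_CUDA_NUM_THREADS.
constexpr int kNumThreads = 512;

// Ceiling division: enough blocks so that blocks * kNumThreads >= n.
inline int getBlocks(int n) { return (n + kNumThreads - 1) / kNumThreads; }

// Simulates a grid-stride loop on the CPU: each (block, thread) pair starts
// at its global id and advances by the total thread count.
// Returns how many times each element index was visited.
std::vector<int> simulateGridStride(int n, int blocks, int threadsPerBlock) {
    std::vector<int> visits(n, 0);
    const int stride = blocks * threadsPerBlock;
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threadsPerBlock; ++t)
            for (int i = b * threadsPerBlock + t; i < n; i += stride)
                ++visits[i];
    return visits;
}
```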

IPluginExt *PReLUPlugin::clone() const {
    return new PReLUPlugin(&mWeights, 1);
}

IPlugin *PluginFactory::createPlugin(const char *layerName, const Weights *weights, int nbWeights) {
    // Called by the Caffe parser for each layer this factory claims as a plugin.
    return new PReLUPlugin(weights, nbWeights);
}
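The Caffe parser only routes a layer to `createPlugin` if the factory claims it. In the TensorRT 5-era API, `nvcaffeparser1::IPluginFactory` declares `isPlugin(const char*)`, and `IPluginFactoryExt` adds `isPluginExt(const char*)`. The talk does not show how its factory recognizes PReLU layers, so both the name-prefix rule below and `isPReLULayer` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical naming convention: any Caffe layer whose name begins with
// "prelu" is handled by PReLUPlugin. A real factory could return this from
// isPlugin()/isPluginExt(); the prefix is an assumption, not from the talk.
inline bool isPReLULayer(const char* layerName) {
    static const char kPrefix[] = "prelu";
    return layerName != nullptr &&
           std::strncmp(layerName, kPrefix, sizeof(kPrefix) - 1) == 0;
}
```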
PluginFactory parserPluginFactory;
parser->setPluginFactoryExt(&parserPluginFactory);
const IBlobNameToTensor *blobNameToTensor =
    parser->parse(gParams.deployFile.c_str(),  // caffe deploy file
                  gParams.modelFile.c_str(),   // caffe model file
                  *network,                    // network definition that the parser will populate
                  gParams.fp16 ? DataType::kHALF : DataType::kFLOAT);
builder->setMaxWorkspaceSize(size_t(gParams.workspaceSize) << 20);
ICudaEngine* engine = builder->buildCudaEngine(*network);
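The workspace limit is shifted left by 20 bits. This suggests `gParams.workspaceSize` is given in MiB, which is an assumption because `gParams` is not shown. `setMaxWorkspaceSize` takes a byte count, so the sketch isolates that conversion.

As in the original, the value is cast to `size_t` before the shift. On 64-bit platforms this keeps large values from overflowing a 32-bit `int`. `mibToBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Converts MiB to bytes: shifting left by 20 multiplies by 2^20 (1,048,576).
// Widening to size_t first avoids overflowing a 32-bit int.
constexpr std::size_t mibToBytes(int mib) {
    return static_cast<std::size_t>(mib) << 20;
}
```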
void PReLUPlugin::serialize(void *buffer) {
    char *d = static_cast<char *>(buffer), *a = d;
    write(d, mNbInputChannels); write(d, mNbInputHeight); write(d, mNbInputWidth); write(d, mNbInputCount);
    write(d, mChannelShared); write(d, mWeights.count); write(d, mWeights.type);
    convertAndCopyToBuffer(d, mWeights);
    assert(d == a + getSerializationSize());
}
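`write()` is called on the slide but not defined. A plausible version copies each field's raw bytes with `memcpy` and advances the cursor. The header then has no padding, and its size is simply the sum of the field sizes, which is what lets an assertion like `d == a + getSerializationSize()` hold.

This sketch assumes that behavior. It renames the helper `writeValue` to avoid clashing with POSIX `write`. `writeHeader` and `kHeaderSize` are hypothetical, and `int32_t` stands in for `nvinfer1::DataType`, assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the write() helper: copies the raw bytes of
// `value` to the cursor and advances it by sizeof(T).
template <typename T>
void writeValue(char*& cursor, const T& value) {
    std::memcpy(cursor, &value, sizeof(T));
    cursor += sizeof(T);
}

// Writes the same header fields, in the same order, as PReLUPlugin::serialize
// (the weight payload is omitted) and returns the number of bytes written.
std::size_t writeHeader(char* buffer, int channels, int height, int width,
                        int inputCount, bool channelShared,
                        int64_t weightCount, int32_t weightType) {
    char* d = buffer;
    writeValue(d, channels);
    writeValue(d, height);
    writeValue(d, width);
    writeValue(d, inputCount);
    writeValue(d, channelShared);
    writeValue(d, weightCount);
    writeValue(d, weightType);
    return static_cast<std::size_t>(d - buffer);
}

// Fields are packed with memcpy, so there is no padding between them.
constexpr std::size_t kHeaderSize =
    4 * sizeof(int) + sizeof(bool) + sizeof(int64_t) + sizeof(int32_t);
```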

PReLUPlugin::PReLUPlugin(const void *data, size_t length) {
    const char *d = static_cast<const char *>(data), *a = d;
    // Read the fields back in exactly the order serialize() wrote them.
    read<int>(d, mNbInputChannels); read<int>(d, mNbInputHeight); read<int>(d, mNbInputWidth);
    read<int>(d, mNbInputCount); read<bool>(d, mChannelShared); read<int64_t>(d, mWeights.count);
    read<DataType>(d, mWeights.type);
    // Host copy of the weights; deserializeToDevice then copies them to GPU memory
    // and (judging by the final assert) advances d past the payload.
    mWeights.values = malloc(mWeights.count * type2size(mWeights.type));
    memcpy(const_cast<void *>(mWeights.values), d, mWeights.count * type2size(mWeights.type));
    deserializeToDevice(d, mDeviceKernel, mWeights.count * type2size(mWeights.type));
    assert(d == a + length);
}
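The deserializing constructor must read fields in exactly the order `serialize()` wrote them. The matching `read<T>()` helper is also not shown. This sketch assumes it mirrors `write()` (memcpy, then advance). It also adds a bounds check, which the original lacks, so a truncated plan fails loudly instead of reading past the buffer.

`readValue`, `PReLUHeader` and `readHeader` are hypothetical names, and the weight payload is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for read<T>(): copies sizeof(T) bytes into `value`
// and advances the cursor; throws if fewer than sizeof(T) bytes remain.
template <typename T>
void readValue(const char*& cursor, const char* end, T& value) {
    if (static_cast<std::size_t>(end - cursor) < sizeof(T))
        throw std::runtime_error("serialized plugin data is truncated");
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
}

struct PReLUHeader {
    int channels = 0, height = 0, width = 0, inputCount = 0;
    bool channelShared = false;
    int64_t weightCount = 0;
    int32_t weightType = 0;  // stand-in for nvinfer1::DataType
};

// Reads the header fields in serialize() order and returns the bytes consumed,
// which the caller can compare with the expected length (cf. d == a + length).
std::size_t readHeader(const void* data, std::size_t length, PReLUHeader& h) {
    const char* a = static_cast<const char*>(data);
    const char* d = a;
    const char* end = a + length;
    readValue(d, end, h.channels);
    readValue(d, end, h.height);
    readValue(d, end, h.width);
    readValue(d, end, h.inputCount);
    readValue(d, end, h.channelShared);
    readValue(d, end, h.weightCount);
    readValue(d, end, h.weightType);
    return static_cast<std::size_t>(d - a);
}
```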
IPlugin *PluginFactory::createPlugin(const char *layerName, const void *serialData, size_t serialLength) {
    // Called by the runtime for each plugin layer while deserializing the engine plan.
    return new PReLUPlugin(serialData, serialLength);
}
PluginFactory pluginFactory;
engine = infer->deserializeCudaEngine(trt_plan_file, size, &pluginFactory);
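Steps 5 and 6 (saving the engine to a file and reading it back) are not shown in code. This is a minimal sketch using only standard file streams. It assumes the bytes come from `engine->serialize()`, an `IHostMemory` exposing `data()` and `size()`. It also assumes the loaded buffer's `data()`/`size()` are what get passed to `deserializeCudaEngine` as `trt_plan_file` and `size`. `savePlan` and `loadPlan` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Step 5: write a serialized engine blob to disk as raw bytes.
void savePlan(const std::string& path, const void* data, std::size_t size) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Step 6: read the whole file back into memory.
std::vector<char> loadPlan(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The plan embeds each plugin's serialized fields, so the same `PluginFactory` must be supplied when the file is deserialized.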
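IExecutionContext* context = engine->createExecutionContext();
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// Copy the input to the device, run the engine, and copy the result back, all on one stream.
cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_SIZE * sizeof(float),
                cudaMemcpyHostToDevice, stream);
context->enqueue(batchSize, &buffers[0], stream, nullptr);
cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float),
                cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);  // wait until the output copy has finished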
