SlideShare a Scribd company logo
"Why is there no artificial intelligence
yet?" Or, analysis of CNTK tool kit from
Microsoft Research.
Author: Svyatoslav Razmyslov
Date: 02.02.2016
Microsoft have given open access to the source code of a tool kit that is used in the company to speed
up the development of artificial intelligence: Computational Network Toolkit is now available at Github.
The developers had to create their own custom solution, because the existing tools did not work fast
enough.
Let's have a look at the analysis results of the source code of this project, as done by our static code
analyzer.
Introduction
Computational Network Toolkit (CNTK) is a set of tools for designing and projecting nets of different
types, that can be used for image detection, speech recognition, text analysis, and much more.
PVS-Studio is a static analyzer for bug detection in the source code of programs, written in C, C++ and
C#. The PVS-Studio tool is made for developers of contemporary applications, and integrates into the
Visual Studio environments of 2010-2015.
Preparing an article on an open source project check, we can only report , of course, on a limited
number of all warnings issued by the analyzer, therefore we recommend the authors of the project run
the analyzer on their code themselves, and study the complete analysis results. We also provide
developers of open source projects with a temporary key.
I should say right away that there weren't many bugs found, which was as expected. Having checked
several Microsoft projects, we can say that their code really is of very high quality. But we shouldn't
forget that the benefit of a static code analyzer is in its regular use, not random checks.
Those misprints...
Typos are a very unpleasant thing. They've penetrated social networks, books, online publications, and
even TV. In simple texts they can be eliminated by means of the spell check functions in text editors; in
programming it can be done with the help of static code analyzers.
V501 There are identical sub-expressions '!Input(0)->HasMBLayout()' to the left and to the right of the
'||' operator. trainingnodes.h 1416
virtual void Validate(bool isFinalValidationPass) override
{
....
if (isFinalValidationPass &&
!(Input(0)->GetSampleMatrixNumRows() ==
Input(2)->GetSampleMatrixNumRows() &&
(Input(0)->GetMBLayout() ==
Input(2)->GetMBLayout() ||
!Input(0)->HasMBLayout() || // <=
!Input(0)->HasMBLayout()))) // <=
{
LogicError(..., NodeName().c_str(),OperationName().c_str());
}
....
}
Formatting of this fragment is modified for clarity. Only after that it became apparent that there are two
similar "! Input (0)-> HasMBLayout () "checks in the condition. Most likely, it is impossible to use an
element with index '2' in one of the cases.
V501 There are identical sub-expressions to the left and to the right of the '-' operator: i0 - i0
ssematrix.h 564
void assignpatch(const ssematrixbase &patch,
const size_t i0,
const size_t i1,
const size_t j0,
const size_t j1)
{
....
for (size_t j = j0; j < j1; j++)
{
const float *pcol = &patch(i0 - i0, j - j0); // <=
float *qcol = &us(i0, j);
const size_t colbytes = (i1 - i0) * sizeof(*pcol);
memcpy(qcol, pcol, colbytes);
}
....
}
Because of the misprint, the condition "i0-i0" is always equal to zero. Perhaps "i1-i0" or "j-i1" or
something else was meant here. Developers should definitely recheck this place.
V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw
runtime_error(FOO); simplenetworkbuilder.cpp 1578
template <class ElemType>
ComputationNetworkPtr SimpleNetworkBuilder<ElemType>::
BuildNetworkFromDbnFile(const std::wstring& dbnModelFileName)
{
....
if (this->m_outputLayerSize >= 0)
outputLayerSize = this->m_outputLayerSize;
else if (m_layerSizes.size() > 0)
m_layerSizes[m_layerSizes.size() - 1];
else
std::runtime_error("Output layer size must be..."); // <=
....
}
The error, is that the keyword 'throw' was accidentally forgotten. As a result, this code does not
generate an exception in case of an error. The correct code variant should be:
....
else
throw std::runtime_error("Output layer size must be...");
....
Working with files
V739 EOF should not be compared with a value of the 'char' type. The 'c' should be of the 'int' type.
fileutil.cpp 852
string fgetstring(FILE* f)
{
string res;
for (;;)
{
char c = (char) fgetc(f); // <=
if (c == EOF) // <=
RuntimeError("error reading .... 0: %s", strerror(errno));
if (c == 0)
break;
res.push_back(c);
}
return res;
}
The analyzer detected that the EOF constant is compared with a variable of 'char' type. This shows that
some symbols will be processed incorrectly.
Let's look at the way EOF is declared:
#define EOF (-1)
As you can see, the EOF is nothing more than '-1 ' of 'int' type. Fgetc() function returns a value of 'int'
type. Specifically, it can return a number from 0 to 255 or -1 (EOF). The values read are placed into a
variable of 'char' type. Because of this, a symbol with the 0xFF (255) value turns into -1, and then
handled in the same way as the file end (EOF).
Users that use Extended ASCII Codes, may encounter an error when one of the symbols of their
alphabet is handled incorrectly by the program.
For example in the Windows 1251 code page, the last letter of Russian alphabet has the 0xFF code, and
so is interpreted by the program as the end-of-file character.
Correct code fragment:
int c = fgetc(f);
if (c == EOF)
RuntimeError(....);
V547 Expression 'val[0] == 0xEF' is always false. The value range of char type: [-128, 127]. file.cpp 462
bool File::IsUnicodeBOM(bool skip)
{
....
else if (m_options & fileOptionsText)
{
char val[3];
file.ReadString(val, 3);
found = (val[0] == 0xEF && val[1] == 0xBB && val[2] == 0xBF);
}
// restore pointer if no BOM or we aren't skipping it
if (!found || !skip)
{
SetPosition(pos);
}
....
}
By default the 'char' type has a range of values equal to [-127;127]. Using the compilation flag /J, we can
specify to the compiler to use the range [0; 255]. But there is no such flag for this source file, and so this
code will never determine that this file contains BOM.
Working with memory
V595 The 'm_rowIndices' pointer was utilized before it was verified against nullptr. Check lines: 171,
175. libsvmbinaryreader.cpp 171
template <class ElemType>
void SparseBinaryMatrix<ElemType>::ResizeArrays(size_t newNNz)
{
....
if (m_nnz > 0)
{
memcpy(rowIndices, m_rowIndices, sizeof(int32_t)....); // <=
memcpy(values, this->m_values, sizeof(ElemType)....); // <=
}
if (m_rowIndices != nullptr)
{
// free(m_rowIndices);
CUDAPageLockedMemAllocator::Free(this->m_rowIndices, ....);
}
if (this->m_values != nullptr)
{
// free(this->m_values);
CUDAPageLockedMemAllocator::Free(this->m_values, ....);
}
....
}
The analyzer detected integer dereference of a null pointer.
If there is a comparison with null in the code, when at an earlier time this pointer was used without a
check, then this code is suspicious, and so, could be dangerous.
The memcpy() function copies the bytes located at "m_rowIndices" and "m_values", at the same time
there is dereference of this pointer and in the given code it can potentially be equal to zero.
V510 The 'sprintf_s' function is not expected to receive class-type variable as third actual argument.
binaryfile.cpp 501
const std::wstring& GetName()
{
return m_name;
}
Section* Section::ReadSection(....)
{
....
char message[256];
sprintf_s(message,"Invalid header in file %ls, in header %sn",
m_file->GetName(), section->GetName()); // <=
RuntimeError(message);
....
}
Only POD types can serve as actual parameters of sprint_s() function. POD is an abbreviation of "Plain
Old Data", which can be interpreted as "Simple data in C style".
"std::wstring" doesn't belong to POD types. Instead of a pointer, the content of the object will go to the
stack. This code will lead to some rubbish in the buffer or a program crash.
Correct variant:
sprintf_s(message,"Invalid header in file %ls, in header %sn",
m_file->GetName().c_str(), section->GetName().c_str());
V630 The 'malloc' function is used to allocate memory for an array of objects which are classes
containing constructors. latticeforwardbackward.cpp 912
void lattice::forwardbackwardalign()
{
....
aligninfo *refinfo;
unsigned short *refalign;
refinfo = (aligninfo *) malloc(sizeof(aligninfo) * 1); // <=
refalign = (unsigned short *) malloc(sizeof(....) * framenum);
array_ref<aligninfo> refunits(refinfo, 1);
array_ref<unsigned short> refedgealignmentsj(....);
....
}
In this code fragment, the analyzer detected an incorrect allocation of dynamic memory for a structure
of "aligninfo" type. The thing here, is that there are constructors in the structure definition, but the
constructor won't be called with this method of memory allocation. Also the destructor won't be called
during the freeing of the memory with the help of free() function.
Here you can see a code fragment with the description of "aligninfo" type.
struct aligninfo // phonetic alignment
{
unsigned int unit : 19; // triphone index
unsigned int frames : 11; // duration in frames
unsigned int unused : 1; // (for future use)
unsigned int last : 1; // set for last entry
aligninfo(size_t punit, size_t pframes)
: unit((unsigned int) punit),
frames((unsigned int) pframes), unused(0), last(0)
{
checkoverflow(unit, punit, "aligninfo::unit");
checkoverflow(frames, pframes, "aligninfo::frames");
}
aligninfo() // [v-hansu] initialize to impossible values
{
#ifdef INITIAL_STRANGE
unit = unsigned int(-1);
frames = unsigned int(-1);
unused = unsigned int(-1);
last = unsigned int(-1);
#endif
}
template <class IDMAP>
void updateunit(const IDMAP& idmap /*[unit] -> new unit*/)
{
const size_t mappedunit = idmap[unit];
unit = (unsigned int) mappedunit;
checkoverflow(unit, mappedunit, "aligninfo::unit");
}
};
Correct variant:
aligninfo *refinfo = new aligninfo();
And of course, you will need to call the 'delete' operator to free the memory.
V599 The virtual destructor is not present, although the 'IDataWriter' class contains virtual functions.
datawriter.cpp 47
IDataWriter<ElemType>* m_dataWriter;
....
template <class ElemType>
void DataWriter<ElemType>::Destroy()
{
delete m_dataWriter; // <= V599 warning
m_dataWriter = NULL;
}
The analyzer warning shows that a base type of the object to be destroyed has no virtual destructor. In
this case the destruction of the object of the derived class will cause undefined behavior of the program.
In practice this can lead to memory leaks, and to a situation wherein other resources will not be
released. Let's try to understand what caused this warning to appear.
template <class ElemType>
class DATAWRITER_API IDataWriter
{
public:
typedef std::string LabelType;
typedef unsigned int LabelIdType;
virtual void Init(....) = 0;
virtual void Init(....) = 0;
virtual void Destroy() = 0;
virtual void GetSections(....) = 0;
virtual bool SaveData(....) = 0;
virtual void SaveMapping(....) = 0;
};
This is a definition of a base class, as we can see, it has virtual functions, but a virtual destructor is
missing.
m_dataWriter = new HTKMLFWriter<ElemType>();
Thus the memory is allocated for the object of the derived class "HTKMLFWriter". It's description:
template <class ElemType>
class HTKMLFWriter : public IDataWriter<ElemType>
{
private:
std::vector<size_t> outputDims;
std::vector<std::vector<std::wstring>> outputFiles;
std::vector<size_t> udims;
std::map<std::wstring, size_t> outputNameToIdMap;
std::map<std::wstring, size_t> outputNameToDimMap;
std::map<std::wstring, size_t> outputNameToTypeMap;
unsigned int sampPeriod;
size_t outputFileIndex;
void Save(std::wstring& outputFile, ....);
ElemType* m_tempArray;
size_t m_tempArraySize;
....
};
Due to the missing virtual destructor in the base class, this object will not be properly destroyed. For
outputDims, outputFiles objects the destructors will also not be called. However, in general it is
impossible to predict all the aftereffects, this is why we use the term "undefined behavior".
Miscellaneous errors.
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a
lower priority than the '|' operator. sequenceparser.h 338
enum SequenceFlags
{
seqFlagNull = 0,
seqFlagLineBreak = 1, // line break on the parsed line
seqFlagEmptyLine = 2, // empty line
seqFlagStartLabel = 4,
seqFlagStopLabel = 8
};
long Parse(....)
{
....
// sequence state machine variables
bool m_beginSequence;
bool m_endSequence;
....
if (seqPos)
{
SequencePosition sequencePos(numbers->size(), labels->size(),
m_beginSequence ? seqFlagStartLabel : 0 | m_endSequence ?
seqFlagStopLabel : 0 | seqFlagLineBreak);
// add a sequence element to the list
seqPos->push_back(sequencePos);
sequencePositionLast = sequencePos;
}
// end of sequence determines record separation
if (m_endSequence)
recordCount = (long) labels->size();
....
}
The priority of a ternary operator ':?' is lower than of a bitwise OR '|' operator. Lets' have a closer look
at the fragment containing an error:
0 | m_endSequence ? seqFlagStopLabel : 0 | seqFlagLineBreak
It is expected that only bitwise operations with specified flags will be executed, however, because of an
unexpected execution order, "0 | m_endSequence" will be executed first, instead of "m_endSequence ?
seqFlagStopLabel : 0 | seqFlagLineBreak".
In fact, this is an interesting case. In spite of the error, the code works correctly. Bitwise OR with 0
doesn't affect anything.
Nevertheless, it is better to correct the error.
There are two more similar fragments:
 V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator
has a lower priority than the '|' operator. sequenceparser.h 433
 V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator
has a lower priority than the '|' operator. sequenceparser.h 598
V530 The return value of function 'size' is required to be utilized. basics.h 428
// TODO: merge this with todouble(const char*) above
static inline double todouble(const std::string& s)
{
s.size(); // just used to remove the unreferenced warning
double value = 0.0;
....
}
There is no error here, which we can see in the comment, but this example is given here for two
reasons:
Firstly, to disable a compiler warning, there is an UNREFERENCED_PARAMETER macro, whose name
makes it clear that the parameter of the function is not used deliberately:
#define UNREFERENCED_PARAMETER(P) (P)
static inline double todouble(const std::string& s)
{
UNREFERENCED_PARAMETER(s);
....
}
Secondly, we would like to show another compiler warning that most likely indicates an error.
V530 The return value of function 'empty' is required to be utilized. utterancesourcemulti.h 340
template <class UTTREF>
std::vector<shiftedvector<....>>getclassids(const UTTREF &uttref)
{
std::vector<shiftedvector<....>> allclassids;
allclassids.empty(); // <=
....
}
There is no point in non-use of the result of the empty() function.
Perhaps the vector was to be cleared with the clear() function.
A similar fragment:
 V530 The return value of function 'empty' is required to be utilized. utterancesourcemulti.h 364
V688 The 'm_file' local variable possesses the same name as one of the class members, which can result
in confusion. sequencereader.cpp 552
template <class ElemType>
class SequenceReader : public IDataReader<ElemType>
{
protected:
bool m_idx2clsRead;
bool m_clsinfoRead;
bool m_idx2probRead;
std::wstring m_file; // <=
....
}
template <class ElemType>
template <class ConfigRecordType>
void SequenceReader<ElemType>::InitFromConfig(....)
{
....
std::wstring m_file = readerConfig(L"file"); // <=
if (m_traceLevel > 0)
{
fprintf(stderr, "....", m_file.c_str());
}
....
}
Using variables of the same name in the class, class functions, and class parameters, is a very bad
programming style. For instance: was the variable declaration "std::wstring m_file =
readerConfig(L"file");" supposed to be here, or was it added temporarily for debugging and then, was
left forgotten?
Developers should also review the following fragments:
 V688 The 'm_file' local variable possesses the same name as one of the class members, which
can result in a confusion. sequencereader.cpp 1554
 V688 The 'm_mbStartSample' function argument possesses the same name as one of the class
members, which can result in a confusion. sequencereader.cpp 2062
 V688 The 'm_file' local variable possesses the same name as one of the class members, which
can result in a confusion. lusequencereader.cpp 417
Conclusion:
Computational Network Toolkit (CNTK), being a relatively small project, turned out to be a pretty
interesting piece of software. As the CNTK project has just recently been made open, we are looking
forward to seeing new ideas for its usage, and of course, other open source projects by Microsoft.

More Related Content

"Why is there no artificial intelligence yet?" Or, analysis of CNTK tool kit from Microsoft Research.

  • 1. "Why is there no artificial intelligence yet?" Or, analysis of CNTK tool kit from Microsoft Research. Author: Svyatoslav Razmyslov Date: 02.02.2016 Microsoft have given open access to the source code of a tool kit that is used in the company to speed up the development of artificial intelligence: Computational Network Toolkit is now available at Github. The developers had to create their own custom solution, because the existing tools did not work fast enough. Let's have a look at the analysis results of the source code of this project, as done by our static code analyzer. Introduction Computational Network Toolkit (CNTK) is a set of tools for designing and projecting nets of different types, that can be used for image detection, speech recognition, text analysis, and much more. PVS-Studio is a static analyzer for bug detection in the source code of programs, written in C, C++ and C#. The PVS-Studio tool is made for developers of contemporary applications, and integrates into the Visual Studio environments of 2010-2015. Preparing an article on an open source project check, we can only report , of course, on a limited number of all warnings issued by the analyzer, therefore we recommend the authors of the project run the analyzer on their code themselves, and study the complete analysis results. We also provide developers of open source projects with a temporary key. I should say right away that there weren't many bugs found, which was as expected. Having checked several Microsoft projects, we can say that their code really is of very high quality. But we shouldn't forget that the benefit of a static code analyzer is in its regular use, not random checks.
  • 2. Those misprints... Typos are a very unpleasant thing. They've penetrated social networks, books, online publications, and even TV. In simple texts they can be eliminated by means of the spell check functions in text editors; in programming it can be done with the help of static code analyzers. V501 There are identical sub-expressions '!Input(0)->HasMBLayout()' to the left and to the right of the '||' operator. trainingnodes.h 1416 virtual void Validate(bool isFinalValidationPass) override { .... if (isFinalValidationPass && !(Input(0)->GetSampleMatrixNumRows() == Input(2)->GetSampleMatrixNumRows() && (Input(0)->GetMBLayout() == Input(2)->GetMBLayout() || !Input(0)->HasMBLayout() || // <= !Input(0)->HasMBLayout()))) // <= { LogicError(..., NodeName().c_str(),OperationName().c_str()); } .... } Formatting of this fragment is modified for clarity. Only after that it became apparent that there are two similar "! Input (0)-> HasMBLayout () "checks in the condition. Most likely, it is impossible to use an element with index '2' in one of the cases. V501 There are identical sub-expressions to the left and to the right of the '-' operator: i0 - i0 ssematrix.h 564 void assignpatch(const ssematrixbase &patch, const size_t i0, const size_t i1, const size_t j0, const size_t j1) { .... for (size_t j = j0; j < j1; j++)
  • 3. { const float *pcol = &patch(i0 - i0, j - j0); // <= float *qcol = &us(i0, j); const size_t colbytes = (i1 - i0) * sizeof(*pcol); memcpy(qcol, pcol, colbytes); } .... } Because of the misprint, the condition "i0-i0" is always equal to zero. Perhaps "i1-i0" or "j-i1" or something else was meant here. Developers should definitely recheck this place. V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); simplenetworkbuilder.cpp 1578 template <class ElemType> ComputationNetworkPtr SimpleNetworkBuilder<ElemType>:: BuildNetworkFromDbnFile(const std::wstring& dbnModelFileName) { .... if (this->m_outputLayerSize >= 0) outputLayerSize = this->m_outputLayerSize; else if (m_layerSizes.size() > 0) m_layerSizes[m_layerSizes.size() - 1]; else std::runtime_error("Output layer size must be..."); // <= .... } The error, is that the keyword 'throw' was accidentally forgotten. As a result, this code does not generate an exception in case of an error. The correct code variant should be: .... else throw std::runtime_error("Output layer size must be..."); ....
  • 4. Working with files V739 EOF should not be compared with a value of the 'char' type. The 'c' should be of the 'int' type. fileutil.cpp 852 string fgetstring(FILE* f) { string res; for (;;) { char c = (char) fgetc(f); // <= if (c == EOF) // <= RuntimeError("error reading .... 0: %s", strerror(errno)); if (c == 0) break; res.push_back(c); } return res; } The analyzer detected that the EOF constant is compared with a variable of 'char' type. This shows that some symbols will be processed incorrectly. Let's look at the way EOF is declared: #define EOF (-1) As you can see, the EOF is nothing more than '-1 ' of 'int' type. Fgetc() function returns a value of 'int' type. Specifically, it can return a number from 0 to 255 or -1 (EOF). The values read are placed into a variable of 'char' type. Because of this, a symbol with the 0xFF (255) value turns into -1, and then handled in the same way as the file end (EOF). Users that use Extended ASCII Codes, may encounter an error when one of the symbols of their alphabet is handled incorrectly by the program.
  • 5. For example in the Windows 1251 code page, the last letter of Russian alphabet has the 0xFF code, and so is interpreted by the program as the end-of-file character. Correct code fragment: int c = fgetc(f); if (c == EOF) RuntimeError(....); V547 Expression 'val[0] == 0xEF' is always false. The value range of char type: [-128, 127]. file.cpp 462 bool File::IsUnicodeBOM(bool skip) { .... else if (m_options & fileOptionsText) { char val[3]; file.ReadString(val, 3); found = (val[0] == 0xEF && val[1] == 0xBB && val[2] == 0xBF); } // restore pointer if no BOM or we aren't skipping it if (!found || !skip) { SetPosition(pos); } .... } By default the 'char' type has a range of values equal to [-127;127]. Using the compilation flag /J, we can specify to the compiler to use the range [0; 255]. But there is no such flag for this source file, and so this code will never determine that this file contains BOM. Working with memory V595 The 'm_rowIndices' pointer was utilized before it was verified against nullptr. Check lines: 171, 175. libsvmbinaryreader.cpp 171 template <class ElemType> void SparseBinaryMatrix<ElemType>::ResizeArrays(size_t newNNz) { .... if (m_nnz > 0)
  • 6. { memcpy(rowIndices, m_rowIndices, sizeof(int32_t)....); // <= memcpy(values, this->m_values, sizeof(ElemType)....); // <= } if (m_rowIndices != nullptr) { // free(m_rowIndices); CUDAPageLockedMemAllocator::Free(this->m_rowIndices, ....); } if (this->m_values != nullptr) { // free(this->m_values); CUDAPageLockedMemAllocator::Free(this->m_values, ....); } .... } The analyzer detected integer dereference of a null pointer. If there is a comparison with null in the code, when at an earlier time this pointer was used without a check, then this code is suspicious, and so, could be dangerous. The memcpy() function copies the bytes located at "m_rowIndices" and "m_values", at the same time there is dereference of this pointer and in the given code it can potentially be equal to zero. V510 The 'sprintf_s' function is not expected to receive class-type variable as third actual argument. binaryfile.cpp 501 const std::wstring& GetName() { return m_name; } Section* Section::ReadSection(....) { .... char message[256]; sprintf_s(message,"Invalid header in file %ls, in header %sn", m_file->GetName(), section->GetName()); // <= RuntimeError(message); .... } Only POD types can serve as actual parameters of sprint_s() function. POD is an abbreviation of "Plain Old Data", which can be interpreted as "Simple data in C style". "std::wstring" doesn't belong to POD types. Instead of a pointer, the content of the object will go to the stack. This code will lead to some rubbish in the buffer or a program crash. Correct variant: sprintf_s(message,"Invalid header in file %ls, in header %sn", m_file->GetName().c_str(), section->GetName().c_str()); V630 The 'malloc' function is used to allocate memory for an array of objects which are classes containing constructors. latticeforwardbackward.cpp 912
  • 7. void lattice::forwardbackwardalign() { .... aligninfo *refinfo; unsigned short *refalign; refinfo = (aligninfo *) malloc(sizeof(aligninfo) * 1); // <= refalign = (unsigned short *) malloc(sizeof(....) * framenum); array_ref<aligninfo> refunits(refinfo, 1); array_ref<unsigned short> refedgealignmentsj(....); .... } In this code fragment, the analyzer detected an incorrect allocation of dynamic memory for a structure of "aligninfo" type. The thing here, is that there are constructors in the structure definition, but the constructor won't be called with this method of memory allocation. Also the destructor won't be called during the freeing of the memory with the help of free() function. Here you can see a code fragment with the description of "aligninfo" type. struct aligninfo // phonetic alignment { unsigned int unit : 19; // triphone index unsigned int frames : 11; // duration in frames unsigned int unused : 1; // (for future use) unsigned int last : 1; // set for last entry aligninfo(size_t punit, size_t pframes) : unit((unsigned int) punit), frames((unsigned int) pframes), unused(0), last(0) { checkoverflow(unit, punit, "aligninfo::unit"); checkoverflow(frames, pframes, "aligninfo::frames"); } aligninfo() // [v-hansu] initialize to impossible values { #ifdef INITIAL_STRANGE unit = unsigned int(-1); frames = unsigned int(-1); unused = unsigned int(-1); last = unsigned int(-1); #endif } template <class IDMAP> void updateunit(const IDMAP& idmap /*[unit] -> new unit*/) { const size_t mappedunit = idmap[unit]; unit = (unsigned int) mappedunit; checkoverflow(unit, mappedunit, "aligninfo::unit"); } }; Correct variant: aligninfo *refinfo = new aligninfo();
  • 8. And of course, you will need to call the 'delete' operator to free the memory. V599 The virtual destructor is not present, although the 'IDataWriter' class contains virtual functions. datawriter.cpp 47 IDataWriter<ElemType>* m_dataWriter; .... template <class ElemType> void DataWriter<ElemType>::Destroy() { delete m_dataWriter; // <= V599 warning m_dataWriter = NULL; } The analyzer warning shows that a base type of the object to be destroyed has no virtual destructor. In this case the destruction of the object of the derived class will cause undefined behavior of the program. In practice this can lead to memory leaks, and to a situation wherein other resources will not be released. Let's try to understand what caused this warning to appear. template <class ElemType> class DATAWRITER_API IDataWriter { public: typedef std::string LabelType; typedef unsigned int LabelIdType; virtual void Init(....) = 0; virtual void Init(....) = 0; virtual void Destroy() = 0; virtual void GetSections(....) = 0; virtual bool SaveData(....) = 0; virtual void SaveMapping(....) = 0; }; This is a definition of a base class, as we can see, it has virtual functions, but a virtual destructor is missing. m_dataWriter = new HTKMLFWriter<ElemType>(); Thus the memory is allocated for the object of the derived class "HTKMLFWriter". It's description: template <class ElemType> class HTKMLFWriter : public IDataWriter<ElemType> { private: std::vector<size_t> outputDims; std::vector<std::vector<std::wstring>> outputFiles; std::vector<size_t> udims; std::map<std::wstring, size_t> outputNameToIdMap; std::map<std::wstring, size_t> outputNameToDimMap; std::map<std::wstring, size_t> outputNameToTypeMap; unsigned int sampPeriod; size_t outputFileIndex; void Save(std::wstring& outputFile, ....); ElemType* m_tempArray;
  • 9. size_t m_tempArraySize; .... }; Due to the missing virtual destructor in the base class, this object will not be properly destroyed. For outputDims, outputFiles objects the destructors will also not be called. However, in general it is impossible to predict all the aftereffects, this is why we use the term "undefined behavior". Miscellaneous errors. V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. sequenceparser.h 338 enum SequenceFlags { seqFlagNull = 0, seqFlagLineBreak = 1, // line break on the parsed line seqFlagEmptyLine = 2, // empty line seqFlagStartLabel = 4, seqFlagStopLabel = 8 }; long Parse(....) { .... // sequence state machine variables bool m_beginSequence; bool m_endSequence; .... if (seqPos) { SequencePosition sequencePos(numbers->size(), labels->size(), m_beginSequence ? seqFlagStartLabel : 0 | m_endSequence ?
  • 10. seqFlagStopLabel : 0 | seqFlagLineBreak); // add a sequence element to the list seqPos->push_back(sequencePos); sequencePositionLast = sequencePos; } // end of sequence determines record separation if (m_endSequence) recordCount = (long) labels->size(); .... } The priority of a ternary operator ':?' is lower than of a bitwise OR '|' operator. Lets' have a closer look at the fragment containing an error: 0 | m_endSequence ? seqFlagStopLabel : 0 | seqFlagLineBreak It is expected that only bitwise operations with specified flags will be executed, however, because of an unexpected execution order, "0 | m_endSequence" will be executed first, instead of "m_endSequence ? seqFlagStopLabel : 0 | seqFlagLineBreak". In fact, this is an interesting case. In spite of the error, the code works correctly. Bitwise OR with 0 doesn't affect anything. Nevertheless, it is better to correct the error. There are two more similar fragments:  V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. sequenceparser.h 433  V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. sequenceparser.h 598 V530 The return value of function 'size' is required to be utilized. basics.h 428 // TODO: merge this with todouble(const char*) above static inline double todouble(const std::string& s) { s.size(); // just used to remove the unreferenced warning double value = 0.0; .... } There is no error here, which we can see in the comment, but this example is given here for two reasons: Firstly, to disable a compiler warning, there is an UNREFERENCED_PARAMETER macro, whose name makes it clear that the parameter of the function is not used deliberately: #define UNREFERENCED_PARAMETER(P) (P) static inline double todouble(const std::string& s) { UNREFERENCED_PARAMETER(s); .... }
  • 11. Secondly, we would like to show another compiler warning that most likely indicates an error. V530 The return value of function 'empty' is required to be utilized. utterancesourcemulti.h 340 template <class UTTREF> std::vector<shiftedvector<....>>getclassids(const UTTREF &uttref) { std::vector<shiftedvector<....>> allclassids; allclassids.empty(); // <= .... } There is no point in non-use of the result of the empty() function. Perhaps the vector was to be cleared with the clear() function. A similar fragment:  V530 The return value of function 'empty' is required to be utilized. utterancesourcemulti.h 364 V688 The 'm_file' local variable possesses the same name as one of the class members, which can result in confusion. sequencereader.cpp 552 template <class ElemType> class SequenceReader : public IDataReader<ElemType> { protected: bool m_idx2clsRead; bool m_clsinfoRead; bool m_idx2probRead; std::wstring m_file; // <= .... } template <class ElemType> template <class ConfigRecordType> void SequenceReader<ElemType>::InitFromConfig(....) { .... std::wstring m_file = readerConfig(L"file"); // <= if (m_traceLevel > 0) { fprintf(stderr, "....", m_file.c_str()); } .... } Using variables of the same name in the class, class functions, and class parameters, is a very bad programming style. For instance: was the variable declaration "std::wstring m_file = readerConfig(L"file");" supposed to be here, or was it added temporarily for debugging and then, was left forgotten? Developers should also review the following fragments:
  • 12.  V688 The 'm_file' local variable possesses the same name as one of the class members, which can result in a confusion. sequencereader.cpp 1554  V688 The 'm_mbStartSample' function argument possesses the same name as one of the class members, which can result in a confusion. sequencereader.cpp 2062  V688 The 'm_file' local variable possesses the same name as one of the class members, which can result in a confusion. lusequencereader.cpp 417 Conclusion: Computational Network Toolkit (CNTK), being a relatively small project, turned out to be a pretty interesting piece of software. As the CNTK project has just recently been made open, we are looking forward to seeing new ideas for its usage, and of course, other open source projects by Microsoft.