Difference between revisions of "Cpp/MemorySafety1"

From PeerFreedom Wiki

Revision as of 13:24, 2 August 2020


This article is about how to read and write memory "as bytes". For the related but separate subject of converting integer values, see cpp/IntegerSafety1.

Summary

Summary on buffers

  • Allocate memory (buffers) using arrays of: unsigned char / std::byte (C++17) / enum_byte stdpfp::byte (C++14)
auto * buf = new std::byte[1024]; // C++17
auto * buf = new stdpfp::byte[1024]; // C++14
auto * buf = new unsigned char[1024]; // allowed too
  • Access the stored value of an object (to read/write) using pointers or references to "membyte" types, especially unsigned char (C++14) / std::byte (C++17); but NOT through an enum_byte like stdpfp::byte.
std::byte * p = buf+10;  *p = std::byte{0xFF}; // C++17; std::byte has no implicit conversion from int
stdpfp::byte * p = buf+10; /* DO NOT DEREFERENCE IT!  *p = ... ; not allowed! */  // C++14
unsigned char * p = buf+10;  *p = 0xFF; // C/C++
// using just "char" is also allowed but probably no reason for it when working with bytes
  • Work on individual single-byte values, after a static_cast, with: stdpfp::byte (C++14) / std::byte (C++17) / unsigned char (always allowed).
unsigned char * p = buf+10;
std::byte value { *p };  value &= std::byte{0xF0};  *p = static_cast<unsigned char>(value); // C++17
stdpfp::byte value ( *p );  value &= 0xF0;  *p = static_cast<unsigned char>(value); // C++14
unsigned char value ( *p );  value &= 0xF0;  *p = static_cast<unsigned char>(value); // allowed too

This results from two rules: we must access data through the special "membyte" types [1], and we access the memory as *(P+i) for the i-th byte [2].
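The three summary points above can be put together in one small C++17 sketch. The helper names mask_high_nibble and demo_buffer are made up for this illustration, and std::make_unique is used in place of the raw new[] only so that the buffer is released automatically; only std::byte is shown, since stdpfp::byte comes from the pfp-cpp library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: keeps only the high nibble of the byte at `index`
// and returns the new stored value as an unsigned integer.
unsigned mask_high_nibble(std::byte* buf, std::size_t size, std::size_t index) {
    assert(index < size);            // stay inside the buffer
    std::byte* p = buf + index;      // pointer to a membyte type (C++17)
    std::byte value{*p};             // work on a single-byte copy
    value &= std::byte{0xF0};        // std::byte operators need a std::byte operand
    *p = value;                      // write back through the membyte pointer
    return std::to_integer<unsigned>(*p);
}

// Allocates a 1024-byte buffer, stores 0xFF at offset 10, then masks it.
unsigned demo_buffer() {
    constexpr std::size_t size = 1024;
    auto buf = std::make_unique<std::byte[]>(size); // zero-initialized array
    buf[10] = std::byte{0xFF};
    return mask_high_nibble(buf.get(), size, 10);
}
```

Calling demo_buffer() leaves 0xF0 in the byte at offset 10 and returns that value.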

Longer example showing proposed correct use: https://wandbox.org/permlink/aPPgQTSjDl0WmHjB = https://git.peerfreedom.org/blacksmith/cpp-snipets/-/blob/master/byte-memory-enum-2.cpp

Summary of other objects

To work with other trivially-copyable objects as bytes (to read/inspect them, or to write them), first copy them into a byte buffer using std::memcpy [3], then read that buffer as bytes (and std::memcpy it back into the object if you are writing).

But in many cases accessing storage this way is not the best way to serialize/de-serialize.
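A minimal sketch of the copy-out, edit, copy-back sequence described above. The Pixel type and the set_byte function are hypothetical and exist only for this example; any trivially-copyable type could take their place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical example type; any trivially-copyable type works the same way.
struct Pixel {
    std::uint8_t r, g, b, a;
};
static_assert(std::is_trivially_copyable<Pixel>::value, "memcpy requires this");

// Copies the object into a byte buffer, edits one byte, copies it back.
Pixel set_byte(Pixel px, std::size_t index, unsigned char new_value) {
    unsigned char bytes[sizeof(Pixel)];
    assert(index < sizeof bytes);           // stay inside the object
    std::memcpy(bytes, &px, sizeof px);     // object -> byte buffer
    bytes[index] = new_value;               // work on the buffer only
    std::memcpy(&px, bytes, sizeof px);     // byte buffer -> object (write back)
    return px;
}
```

For example, set_byte(Pixel{1, 2, 3, 4}, offsetof(Pixel, g), 9) returns a Pixel whose g member is 9 and whose other members are unchanged.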

Levels

An overview of which byte-like type we use at which level.

Creation: use new stdpfp::byte[];  // or std::byte in C++17
Holding buffer: use stdpfp::byte* m_buff;  // or std::byte in C++17
Passing buffers to other functions: use void func( stdpfp::byte* ) // or std::byte in C++17

Passing to C-style function like sodium: reinterpret_cast to unsigned char*
Working yourself on memory: reinterpret_cast to unsigned char* ptr; Use it - *ptr=xxx;
Working on 1-byte values: you can use stdpfp::byte value;  value = *ptr; value &= 0xFF;
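The levels above could be sketched as follows in C++17. This is only an illustration under assumptions: the Buffer class and the c_style_fill function are invented here (c_style_fill stands in for a real C API that takes unsigned char*), std::byte is used instead of stdpfp::byte, and a std::unique_ptr holds the buffer instead of a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for a C-style API that takes unsigned char*.
// Hypothetical; it just fills the buffer with the pattern 0, 1, 2, ...
extern "C" void c_style_fill(unsigned char* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) out[i] = static_cast<unsigned char>(i);
}

class Buffer {
public:
    explicit Buffer(std::size_t size)
        : m_size(size), m_buff(new std::byte[size]{}) {}    // creation

    std::byte* data() { return m_buff.get(); }               // passing around
    std::size_t size() const { return m_size; }

    // Passing to a C-style function: reinterpret_cast to unsigned char*.
    void fill() { c_style_fill(reinterpret_cast<unsigned char*>(data()), m_size); }

    // Working on a 1-byte value: read it, mask it, return it as an integer.
    unsigned low_nibble(std::size_t i) const {
        assert(i < m_size);
        std::byte value = m_buff[i];
        value &= std::byte{0x0F};
        return std::to_integer<unsigned>(value);
    }

private:
    std::size_t m_size;
    std::unique_ptr<std::byte[]> m_buff;                     // holding the buffer
};
```

After Buffer b(32); b.fill(); the byte at index 0x1A holds 0x1A, so b.low_nibble(0x1A) gives 0x0A.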

Warnings

When you create a stdpfp::byte*, remember that you are NOT allowed to use it directly (access the stored memory, dereference it); you must first reinterpret_cast it to unsigned char* (or std::byte* in C++17, or possibly plain char*, but that is discouraged).

The same warning applies to references, not only to pointers.

Our definitions

  • membyte - is our name for a type T that is like a byte, and is a special "magical type": it can potentially be used to read any memory (the values of other objects).

These are: unsigned char, char, std::byte (since C++17). NOT signed char, and NOT enum nor stdpfp::byte. In the pfp-cpp library, the test whether T meets this concept is stdpfp::is_bytemem<T>

  • byteequiv - is our name for a type that is also like a byte, but does not necessarily allow reading memory the way the magical membyte does. Such a type can represent 256 values like unsigned char (under the assumption that chars are 8-bit, which is true in practice on all normal platforms; the standard itself only guarantees CHAR_BIT >= 8).

In the pfp-cpp library, the test whether T meets this concept is stdpfp::is_byteequiv<T>

(These are our names invented here, not names from the C++ Standard)
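To make the two definitions more concrete, here is one possible way to express them as C++17 type traits. This is NOT the implementation from the pfp-cpp library: the namespace sketch and every name in it are invented for this example, and the byteequiv test (size 1, integral or enum, not bool) is only one reading of the definition above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch { // hypothetical namespace, NOT the pfp-cpp library

template <class T, class U>
constexpr bool same = std::is_same<std::remove_cv_t<T>, U>::value;

// membyte: types through which the storage of any object may be accessed.
template <class T>
struct is_membyte : std::integral_constant<bool,
    (same<T, unsigned char> || same<T, char> || same<T, std::byte>)> {};

// byteequiv (one possible reading): a byte-sized integer or enum, except bool.
template <class T>
struct is_byteequiv : std::integral_constant<bool,
    (sizeof(T) == 1 &&
     (std::is_enum<T>::value ||
      (std::is_integral<T>::value && !same<T, bool>)))> {};

// Example use: refuse at compile time to read through a non-membyte pointer.
template <class T>
unsigned read_first(const T* p) {
    static_assert(is_membyte<T>::value, "access memory only through a membyte type");
    assert(p != nullptr);
    return static_cast<unsigned>(*p);
}

} // namespace sketch
```

With these traits, signed char is a byteequiv but not a membyte, so sketch::read_first on a signed char* fails to compile, while unsigned char* and std::byte* are accepted.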

Accessing byte buffer

#include <cstddef>
#include <iostream>
int main() {
    char buf1[2] = { '\3', '\10' }; // octal escapes: \10 is 8.
    // but in real code better use buffers of unsigned char.
    {
        auto & buf = buf1 ;
        auto P = reinterpret_cast<unsigned char*>( & buf[0] );
        for (std::size_t i=0; i<sizeof(buf); ++i) {
            unsigned char thebyte = * (P+i) ;
            std::cout << static_cast<unsigned int>( thebyte ) << " ";
        }
        std::cout << "\n";
    }
}

The code is also on https://wandbox.org/permlink/3X3n2iD5UvAAQNqF and its output is: 3 8.

To inspect (or write into) the memory of a byte buffer (defined below), we can reinterpret it as a pointer P to any membyte type and then read *(P+i) for the i-th byte. The buffer must be an array, as defined in dcl.array, of a type X that should be a byteequiv, or at least should have sizeof(X)==sizeof(char).

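  • if type X were too big, that is sizeof(X) > sizeof(char), then the byte index i would no longer match the element index: *(P+i) reads the i-th byte of the buffer, not the i-th element, so a loop over the element count covers only part of the memory
  • no type X can have a smaller size than char, since sizeof(char) is 1 and the size of every complete object type is at least 1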
  • therefore, we can use this method to inspect (and also write into) all combinations of following:
    • dynamic array allocated by new[], a c-array, a vector<X> and array<X> by accessing its .data(), and all "byte buffers" from various libraries that are allocated as one of methods described here (so e.g. also boost::asio buffers) ...
    • ... consisting of elements of type X where this type is byteequiv: so X can be any char (plain, unsigned, and also signed char), byte
      • type X could be other trivially copyable type of same size as char, assuming the used container does not treat it in unusual way
      • type X could also be bool, but then it can not be used with e.g. vector, as vector<bool> does not provide .data() (it packs the bools, so they are not individually addressable)
      • if type X is bigger than a byte, read the individual elements as objects (see the chapter "Accessing any object")
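As an illustration of the combinations listed above, the sketch below reads the bytes of a vector<char> and of an array<unsigned char, N> through the same unsigned char pointer. The function names sum_bytes, sum_vector and sum_array are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the bytes of a buffer whose element type X is byte-sized.
template <class X>
unsigned sum_bytes(const X* data, std::size_t count) {
    static_assert(sizeof(X) == sizeof(char), "element type must be byte-sized");
    assert(data != nullptr || count == 0);
    auto P = reinterpret_cast<const unsigned char*>(data); // membyte pointer
    unsigned sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += *(P + i); // i-th byte
    return sum;
}

unsigned sum_vector(const std::vector<char>& v) {
    return sum_bytes(v.data(), v.size());
}

unsigned sum_array(const std::array<unsigned char, 3>& a) {
    return sum_bytes(a.data(), a.size());
}
```

For a vector<char> holding '\3' and '\10' the result is 11, matching the two bytes printed by the program above.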

For some examples:

  • read dynamic array[] of char, by reading its bytes as char - ok
  • read dynamic array[] of char, by reading its bytes as unsigned char - ok
  • read dynamic array[] of char, by reading its bytes as signed char - this is NOT ALLOWED
  • read dynamic array[] of signed char, by reading its bytes as signed char - allowed but just because no conversion happens at all, same type
  • read dynamic array[] of unsigned char, by reading its bytes as char - ok
  • read dynamic array[] of signed char, by reading its bytes as char - ok, but implementation-defined what the resulting bytes will be exactly
  • read dynamic array[] of signed char, by reading its bytes as unsigned char - ok, but implementation-defined what the resulting bytes will be exactly
  • read dynamic array[] of signed char, by reading its bytes as std::byte - ok, but implementation-defined what the resulting bytes will be exactly
  • all of the above examples are the same for other container types; to give more examples:
  • read c-array of char, by reading its bytes as char - ok
  • read c-array of char, by reading its bytes as unsigned char - ok
  • read c-array of char, by reading its bytes as std::byte - ok
  • read .data of vector<char>, by reading its bytes as char - ok
  • read .data of array<char,N>, by reading its bytes as char - ok

This is legal because converting a pointer from T1 to T2 can be legal (http://eel.is/c++draft/expr.reinterpret.cast#7), and: (TODO) (converting a pointer to any type into a pointer to a byte-like type is legal).

Accessing any object

If we want to access something other than a byte buffer:

To inspect the memory of any (trivially-copyable) type T, we shall first copy its memory aside using std::memcpy, and then read the resulting memory as objects of any membyte type, reading *(P+i) for the i-th byte as described in the chapter on accessing a byte buffer.

  • However, the result of reading/writing objects this way is often implementation-defined (e.g. endianness for all integers bigger than char, and sign representation for all signed integers), so usually you do not want to use this to serialize/deserialize objects (unless they are clearly defined objects, e.g. bit fields).

  • The memcpy is required to be sure you are fully compatible with the C++ standard (although some say the standard is not 100% clear on this); some implementations (compilers/platforms) might allow skipping this copy.
  • Reading and writing the memory of objects that are not trivially-copyable is generally UB; do not do it.
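Because the bytes obtained through std::memcpy depend on the platform (endianness in particular), serialization is often written in terms of values instead. The following is a sketch of that alternative, not something taken from this article's linked code; store_le32 and load_le32 are hypothetical names, and little-endian output order is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes `value` as 4 bytes, least significant byte first,
// regardless of the byte order of the platform.
void store_le32(std::uint32_t value, unsigned char* out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}

// Reads back 4 bytes written by store_le32.
std::uint32_t load_le32(const unsigned char* in) {
    assert(in != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}
```

Storing 0x11223344 this way always produces the bytes 0x44 0x33 0x22 0x11, whereas the bytes copied out by std::memcpy would differ between little-endian and big-endian machines.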

Converting between types

  • [1] http://eel.is/c++draft/basic.lval#11.3 - we must read as "a char, unsigned char, or std::byte type" (but not as signed char!); C++14 https://timsong-cpp.github.io/cppwp/n4140/basic.lval#10.8
  • [2] http://eel.is/c++draft/expr.add#4 - to access the i-th element of an array; see http://eel.is/c++draft/dcl.array for what an array is
  • [3] http://eel.is/c++draft/basic.types#2 - we can read any trivially-copyable type T after we std::memcpy it