One has shape (N, 1) and one has shape (N,). Always carrying a trailing 1 is a pain in many settings; Matlab forces that trailing 1 on everything, and it's a PITA.
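A minimal NumPy sketch of the friction: the same three numbers, two incompatible shapes, and the trailing 1 silently changes what "elementwise" means.

```python
import numpy as np

col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1): the "n-by-1" version
flat = np.array([1.0, 2.0, 3.0])       # shape (3,):  a plain 1-D vector

print(col.shape, flat.shape)  # (3, 1) (3,)

# The trailing 1 changes semantics: an "elementwise" op broadcasts
# a (3, 1) against a (3,) into a (3, 3) matrix instead of erroring out.
print((col - flat).shape)  # (3, 3)

# Equality of "the same" vector is no longer elementwise either:
print((col == flat).shape)  # (3, 3), not (3,)
```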
There are many reasons the "-by-1" doesn't make sense.
* If a vector is an n-by-1 array (a "matrix"), why not an n-by-1-by-1-by-1-by-1?
* What is a row vector in something like a function space? Is sin(t) a row or column vector?
* You _can_ think of a matrix as acting "to the right" (on a column vector) and "to the left" (on a row vector), but this muddies the notion of a matrix as a linear function acting on a thing, since it can now act on two kinds of things. It's much more consistent to define a matrix acting on a vector, m(x), independent of "direction", and then define the transformation that lets you go the other way (take the transpose of m, which is a whole new matrix).
* Are physical quantities like velocity a row or column vector? Convention dictates they're column vectors, so in what scenario would it become a row vector? If I have a velocity, v1, that's a row vector and another, v2, that's a column vector, is it appropriate to calculate the projection of v1 onto v2? Should the projection operation be "aware" of the orientations? If all vectors are the same ("column vectors"), this isn't a problem -- you can always define the projection operation without special cases.
* In matlab specifically, if you have an n-by-1 array `a` and a 1-by-n array `b` with the same elements, then `a(i) == b(i)` but `a` and `b` are not the same thing.
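The projection point above can be sketched in NumPy: if every vector is just a 1-D array, projection needs no row/column special cases (the `project` function name is illustrative).

```python
import numpy as np

def project(u, v):
    """Orthogonal projection of u onto v; both are plain 1-D vectors,
    so there is no row-vs-column bookkeeping to get wrong."""
    return (np.dot(u, v) / np.dot(v, v)) * v

v1 = np.array([1.0, 0.0])
v2 = np.array([1.0, 1.0])
print(project(v1, v2))  # [0.5 0.5]
```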
Sadly you can't synthesize any old piece of C++: there are a ridiculous number of constraints, and you need to write the code in a very specific way.
Also, there's not necessarily much of a performance gain; some tasks are poor candidates for hardware implementation (as a general rule, you need a lot of parallelism to make it worth it). As an overly simple example, implementing RSA in hardware and hoping for a significant speedup doesn't make much sense: there isn't really much parallelism to exploit, and RSA is usually only used to encrypt keys. Something like AES or SHA, on the other hand, might benefit from a good hardware implementation, because there is far more parallelism to be had and they are used to encrypt much larger amounts of data.
To add even more complexity, the compilers can be obscenely finicky with optimizations.
So, it's less that you're writing C++ and synthesizing it, and more like you're maintaining a VHDL codebase that happens to be presented/edited "through" C++.
That's a fair description. Even so, with VHDL being rather verbose you might prefer the "C++ skin on VHDL" version for some things, especially algorithmic things.
Definitely, some code is much more readable/clean in C++, while still compiling to reasonable VHDL. Also, templates let you create fairly complex blocks programmatically at compile time; Verilog doesn't have the same metaprogramming facilities that C++ offers.
The technology has been around for a while, but as the other commenter noted "caveats" covers a whole lot of annoying/frustrating things. It's getting better, though.