The ASIC flow is way more involved than the FPGA flow. You need to think about all sorts of other things: what pads you want on your chip, how you are going to generate clocks, floor planning, power distribution, test, and a whole raft of other issues. Going from RTL to GDS2 for even a simple chip would take three months' work as an absolute minimum, and that's just to get it ready to send to the fab. You then have a whole lot of work when it comes back (you need to have boards designed for testing the chip).
While this is true, this work is typically done by a different person than the one designing the RTL, and it takes 1-2 months at most if there are minimal changes to the overall physical design. We are able to iterate on designs about once every 6 months with a team of ~10 people.
In my day job as an FPGA developer I use entirely open source tools for simulation and verification (the only tool we pay for is Vivado, but you have to pay for that in order to build for the FPGA). The simulator is Icarus Verilog and it does the job pretty well. I use cocotb for the testbenches.
I disagree about for loops. You actually end up using these quite a lot in VHDL/Verilog (with an understanding of what logic you are going to end up with) if you want to do the same operation on multiple things:
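// Module name and default parameter value are illustrative only.
module multi_mult #(
    parameter NUM_OF_MULTIPLIERS = 4
) (
    input      [NUM_OF_MULTIPLIERS*32-1:0] a_in,
    input      [NUM_OF_MULTIPLIERS*32-1:0] b_in,
    output reg [NUM_OF_MULTIPLIERS*64-1:0] mult_out  // reg: assigned in an always block
);
    integer i;
    reg [31:0] tmp_a, tmp_b;
    reg [63:0] tmp_mult;

    // Combinational logic: the loop is unrolled into NUM_OF_MULTIPLIERS multipliers
    always @(*) begin
        mult_out = {(NUM_OF_MULTIPLIERS*64){1'b0}};
        for (i = 0; i < NUM_OF_MULTIPLIERS; i = i + 1) begin
            tmp_a    = a_in >> (i*32);   // 32-bit operand for lane i
            tmp_b    = b_in >> (i*32);
            tmp_mult = tmp_a * tmp_b;    // 32x32 -> 64-bit product
            mult_out = mult_out | (tmp_mult << (i*64));
        end
    end
endmodule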
Would give you NUM_OF_MULTIPLIERS multipliers. If you wrote each multiply out, it would be more code and also wouldn't allow you to parametrize the code.
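If you want to check a block like this in simulation, a bit-accurate software model of the packed buses is handy. Here is a minimal Python sketch, assuming lane i sits in bits [32*i +: 32] of each input and bits [64*i +: 64] of the output, as in the Verilog above. The function name mult_model is made up for illustration; something like it could act as the expected-value model in a cocotb test.

```python
def mult_model(a_in: int, b_in: int, num_of_multipliers: int) -> int:
    """Bit-accurate model of the packed multiplier array (illustrative name)."""
    mask32 = (1 << 32) - 1
    mult_out = 0
    for i in range(num_of_multipliers):
        tmp_a = (a_in >> (i * 32)) & mask32      # lane i of a_in
        tmp_b = (b_in >> (i * 32)) & mask32      # lane i of b_in
        mult_out |= (tmp_a * tmp_b) << (i * 64)  # 64-bit product into lane i
    return mult_out


if __name__ == "__main__":
    # Two lanes: 3*5 in lane 0 and 7*2 in lane 1
    a = (7 << 32) | 3
    b = (2 << 32) | 5
    print(hex(mult_model(a, b, 2)))
```

The Python loop mirrors the HDL loop line for line, which makes it easy to keep the two in step when the lane widths change.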
The key is that for loops are essentially pre-processor macros (like C) so they must have a fixed number of iterations known at compile time. So yes, you have a for loop, but it's very different to what you expect from a for loop in software.
Yes, the key is that loops are always unrolled so the number of iterations (number of copies of the hardware) is fixed. But whether the output of each iteration is used or not can be entirely dynamic, potentially resulting in something very similar to a loop in software.
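To make that concrete, here is a rough Python model (not HDL) of a find-first-set-bit block written the way the unrolled logic behaves. It is only a sketch under my own assumptions: WIDTH and first_set_bit are illustrative names. Every one of the WIDTH iterations always "exists", and a data-dependent select decides which iteration's result is kept, which is what gives the effect of a software loop with an early break.

```python
from typing import Tuple

WIDTH = 8  # fixed up front: the number of copies of the hardware


def first_set_bit(vec: int) -> Tuple[bool, int]:
    """Return (found, index) of the lowest set bit within WIDTH bits."""
    found = False
    index = 0
    for i in range(WIDTH):                 # always WIDTH iterations, never a break
        bit = (vec >> i) & 1
        take = (bit == 1) and not found    # mux select: only the first set bit wins
        index = i if take else index       # later iterations leave the result alone
        found = found or (bit == 1)
    return found, index


if __name__ == "__main__":
    print(first_set_bit(0b00101000))
```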
I use Icarus Verilog at work for a fairly complex trading system on an FPGA, so I disagree that FOSS simulators are almost totally useless. It supports most SystemVerilog features and works well with cocotb. Being open source also means I don't have to worry about license usage (which has always been a problem with ModelSim). I managed quite well abstracting the Xilinx IP (most things, like RAMs, can be inferred in the code), and things like PCIe, transceivers and DDR4 have well-defined interfaces, so they are easy to model in straight Verilog.
I use it exclusively. I know of other trading firms that use Verilator too. To be honest, no matter how big the company or how deep the pockets, there's still going to be a finite number of Questa licenses available. Icarus Verilog lets you farm out simulations to anywhere and run as many in parallel as you like, which would not be possible with Questa (you would eventually run out of licenses). Also, I think Icarus Verilog actually works pretty well: it covers enough of the SystemVerilog syntax to be useful for RTL, and with cocotb I don't need access to the SystemVerilog testbench features.
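As a rough illustration of farming out simulations, here is a small Python sketch that runs a set of commands in parallel and collects the exit codes. It is not my actual setup: run_all and the job names are made up, and the commands are placeholders that just run the Python interpreter so the sketch works anywhere. In practice each entry would be a simulator invocation, for example vvp on a compiled test image.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def run_all(jobs: Dict[str, List[str]], max_workers: int = 4) -> Dict[str, int]:
    """Run each command in parallel and return {job name: exit code}."""

    def run_one(item: Tuple[str, List[str]]) -> Tuple[str, int]:
        name, cmd = item
        # Each job is an independent OS process, so no license server is involved
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        return name, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, jobs.items()))


if __name__ == "__main__":
    # Placeholder commands; a real entry might look like
    # ["vvp", "sim_build/test_a.vvp"] (hypothetical path).
    jobs = {
        "passing_test": [sys.executable, "-c", "raise SystemExit(0)"],
        "failing_test": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    print(run_all(jobs))
```

Threads are enough here because the work happens in the child processes. The only limit on max_workers is how many cores or machines you have.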
I disagree with this. You don't necessarily need a whole team of people and massive amounts of cash to do FPGA development, and you don't necessarily need expensive tools. For my current company I created a complete FPGA-based trading system from scratch on my own with free tools (apart from Vivado, which I just used to turn my RTL into an actual design I could put onto the FPGA board). The board I used cost around £2k and the Vivado tools were £4k, although if I were going to do it again, it appears you can just pay for your usage of Vivado in the cloud (Nimbix has machines with the Vivado suite on them). The cost to the company for this is pretty much my salary + the board costs.
I've had a different experience to this. I've worked on ASICs for over ten years and have had experience with nearly all aspects of the design flow (from RTL all the way to GDS2 at one point or another). I've taped out probably 20+ chips (although I've been concentrating on FPGAs for the last three years). Every chip that I've taped out has had extensive FPGA prototyping done on the design. This is in a variety of different areas too (Bluetooth, GPUs, CPUs, video pipelines, etc). You can just get a hell of a lot more cycles through an FPGA prototyping system than you can through an RTL sim, and when you are spending a lot of money on the ASIC masks, etc, you want to have a chance to soak test it first.
My experience agrees with yours. Many big-budget teams use a hardware emulator like the Palladium XP or the similar Synopsys device. Both are built from FPGAs.
Hardware emulators are expensive, but a single mask respin at 7, 10, or 16nm is even more expensive.
There is a distinction between hardware emulators and FPGAs.
Though hardware emulators such as Palladiums may use FPGAs inside them, they don't work the same way in terms of validation. The two tools are very different to use.
I think you can treat 60-70% of the FPGA design flow as open source. For example, I am developing a system using PCIe, 10GBASE-T and some logic to send and receive network packets, and to design the HDL and test it I am using predominantly two open source tools (Icarus Verilog and cocotb). I just use the FPGA P&R tools for building the design once I am satisfied it works. You can also run these tools on the command line quite easily and automate most of the process (they all use Tcl for scripting up the flow). Sure, there are a few FPGA-specific interfaces you have to deal with (transceivers, DDR4, PCIe hard IP), but you can pretty much treat these as black boxes and write your tests to target the interfaces in and out of the logic. Also, for things like transceivers, the interface is really not that different between Xilinx and Altera (I treat them as a black box that generates 32 bits every 322 MHz cycle for 10GBASE-T).
The flow, to my mind, is not that dissimilar to a traditional software development flow. I have simulation tests and test cases, I use continuous integration to run tests every time something is committed, every time I build the FPGA with the P&R tools I kick off hardware tests automatically, etc.
In actual fact, for Xilinx-based FPGAs this is quite straightforward. An example for 10GBASE-R:
You can get Xilinx's PCS/PMA core for 10GBASE-R Ethernet for free from Vivado and stick one of the MACs from OpenCores on the end of it (probably this: http://opencores.org/project,xge_ll_mac). I used it for prototyping and it seems to work (before creating my own PCS/PMA block and MAC to cut down the latency).
Once you have that, you need to deal with the Ethernet frames streaming through the FPGA, probably 64 bits at a time at 156 MHz for 10G, so you need to pull out the fields you are interested in (like MAC addresses, IP addresses, etc). You can buffer the incoming packet into a FIFO whilst waiting for the fields you want to filter on. Once you have all your fields you can decide whether you want to pass the packet through to the TX side or not (I usually read the packet out of the FIFO either way and just hold valid low for packets I don't want to send).
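The filtering steps above can be sketched as a small software model. This is Python rather than HDL and is only an illustration under stated assumptions: untagged Ethernet II frames carrying IPv4, filtering on destination IP only, and made-up names (to_beats, filter_frame). For simplicity it buffers the whole frame before deciding, whereas real logic could decide as soon as the beat containing the last header field has arrived.

```python
from collections import deque
from typing import List, Set, Tuple

BEAT_BYTES = 8       # 64-bit datapath
HEADER_BYTES = 34    # 14-byte Ethernet header + IPv4 header up to the dst address


def to_beats(frame: bytes) -> List[bytes]:
    """Split a frame into 8-byte beats, as it would arrive on a 64-bit bus."""
    return [frame[i:i + BEAT_BYTES] for i in range(0, len(frame), BEAT_BYTES)]


def filter_frame(frame: bytes, allowed_dst_ips: Set[bytes]) -> List[Tuple[bool, bytes]]:
    """Return the (valid, beat) pairs read out of the FIFO for one frame."""
    fifo = deque()
    header = b""
    for beat in to_beats(frame):
        fifo.append(beat)                  # buffer while the header fields arrive
        if len(header) < HEADER_BYTES:
            header += beat
    ethertype = header[12:14]              # bytes 12-13 of the frame
    dst_ip = header[30:34]                 # IPv4 destination address
    keep = (len(header) >= HEADER_BYTES
            and ethertype == b"\x08\x00"   # IPv4
            and dst_ip in allowed_dst_ips)
    out = []
    while fifo:                            # drain either way; valid low = dropped
        out.append((keep, fifo.popleft()))
    return out


if __name__ == "__main__":
    ip_header = bytes(12) + bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2])
    frame = bytes(12) + b"\x08\x00" + ip_header + b"payload!"
    for valid, beat in filter_frame(frame, {bytes([10, 0, 0, 2])}):
        print(valid, beat.hex())
```

Draining the FIFO on every frame, wanted or not, keeps the read side identical in both cases. The only thing the filter decision changes is the valid flag.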
Xilinx provide software drivers and IP for PCIe DMA and memory-mapped interfaces. These are fairly easy to integrate (probably not the best for latency though; I've developed my own, but I have a specific use case: low latency, and I don't care about bandwidth).
Nowadays, I don't believe you need a paid-for simulator like Questa, VCS, etc. In my day job I develop Verilog for FPGAs using Icarus Verilog (an open source simulator), which works fine for fairly large real-world designs and supports quite a lot of SystemVerilog too. I also use cocotb for testing my code.