I just got it to install git and clone (the non-existent) repo https://github.com/openai/assistant, and am now browsing its own interpretation of a repo with a lot of Python code, including directories like “training”, “output”, “parsing”, and files with content like this:
import json
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

import numpy as np

from openai_secret_manager import get_secrets
from assistant.constants import MAX_OUTPUT_LENGTH
from assistant.utils.string_utils import strip_html
from assistant.utils.text_utils import split_text_into_lines


class Output:
    def __init__(
        self,
        generated_text: str,
        response: Optional[Dict[str, Any]] = None,
        score: Optional[float] = None,
    ):
        self.generated_text = generated_text
        self.response = response or {}
        self.score = score
On a side note it feels like each command takes longer to process than the previous - almost like it is re-doing everything for each command (and that is how it keeps state).
>On a side note it feels like each command takes longer to process than the previous - almost like it is re-doing everything for each command (and that is how it keeps state).
That's because it's probably redoing everything.
But that's probably to keep the implementation simple. They are probably just appending the new input and re-running the whole network.
The typical data dependency structure in a transformer architecture is the following:
output_t0   output_t1   output_t2   output_t3   | output_t4
feat_L4_t0  feat_L4_t1  feat_L4_t2  feat_L4_t3  | feat_L4_t4
feat_L3_t0  feat_L3_t1  feat_L3_t2  feat_L3_t3  | feat_L3_t4
feat_L2_t0  feat_L2_t1  feat_L2_t2  feat_L2_t3  | feat_L2_t4
feat_L1_t0  feat_L1_t1  feat_L1_t2  feat_L1_t3  | feat_L1_t4
input_t0    input_t1    input_t2    input_t3    | input_t4
The features at layer Li at time tj depend only on the features of layer L(i-1) at times t <= tj.
If you append a new input at the next time t4 and recompute everything from scratch, it doesn't change any feature values for times t < t4.
To compute the features and output at time t4, you need the values from all previous times at all layers.
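That prefix invariance is easy to check with a toy causal self-attention layer. This is a minimal NumPy sketch of my own (nothing from the actual model): appending a new timestep and recomputing from scratch leaves the features of all earlier timesteps unchanged.

```python
# Toy single-layer causal self-attention: recomputing over a longer
# sequence does not change the features of earlier positions.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_attention(x):
    # x: (T, d). Position t may attend only to positions <= t.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                    # (T, T)
    future = np.triu(np.ones_like(scores), k=1)      # mask out t' > t
    scores = np.where(future.astype(bool), -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

x4 = rng.normal(size=(4, d))                         # inputs t0..t3
x5 = np.vstack([x4, rng.normal(size=(1, d))])        # append input t4

out4 = causal_attention(x4)
out5 = causal_attention(x5)
assert np.allclose(out4, out5[:4])                   # t0..t3 unchanged
```

This is exactly why naive "recompute everything" works at all: the appended timestep cannot alter any past feature.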
The alternative to recomputing would be to preserve the previously generated features and incrementally build the last chunk by stitching it onto the previous features. If you have your AI assistant running locally, that's something you can do, but when you are serving plenty of different sessions, you will quickly run out of memory.
With simple transformers, the time horizon used to be limited because the attention was scaling quadratically (in compute), but they are probably using an attention that scales in O(n*log(n)), something like the Reformer, which allows them to handle very long sequences cheaply and probably explains the boost in performance compared to previous GPTs.
GPT-3 cannot run on a hobbyist-level GPU yet. That's the difference (compared to Stable Diffusion, which could run on a 2070 even with a not-so-carefully-written PyTorch implementation), and the reason why I believe that while ChatGPT is awesome and has made more people aware of what LLMs can do today, this is not a moment like what happened with diffusion models.
What makes you say this? Rerunning the whole thing, which it appears they're doing, avoids the need to hold onto state, so memory is not used. In other words, they're not having this problem because they're not doing it that way.
If you generate only a single timestep, you can recompute layer by layer during inference; you don't need to preserve the features of the previous layers, since each layer depends only on the layer immediately below. So your memory needs don't depend on the number of layers.
But in a standard transformer architecture you typically generate multiple timesteps by feeding each output back in as the input for the next timestep, so you need to preserve all the features to avoid recomputing them at every timestep. Then your memory again depends on the number of layers of your network.
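The caching trade-off can be sketched in a few lines (a toy of my own, not production code): keep the per-layer keys/values for the past, so each new timestep computes only one position, at the cost of memory that grows with sequence length times number of layers per session.

```python
# Toy incremental decoding with a per-layer key/value cache: each step
# computes only the newest position, but the cache grows with t * layers.
import numpy as np

rng = np.random.default_rng(1)
d, n_layers = 8, 3
# Scale weights by 1/sqrt(d) to keep activations well-behaved.
params = [tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
          for _ in range(n_layers)]

def step(x_t, cache):
    """Advance one timestep. cache[i] holds (K, V) for layer i."""
    h = x_t                                        # (1, d)
    for i, (Wq, Wk, Wv) in enumerate(params):
        q, k, v = h @ Wq, h @ Wk, h @ Wv
        K = np.vstack([cache[i][0], k]) if cache[i] else k
        V = np.vstack([cache[i][1], v]) if cache[i] else v
        cache[i] = (K, V)                          # memory grows with t
        s = q @ K.T / np.sqrt(d)                   # (1, t)
        w = np.exp(s - s.max())
        h = (w / w.sum()) @ V                      # attend over all past
    return h, cache

cache = [None] * n_layers
for t in range(5):                                 # generate 5 timesteps
    out, cache = step(rng.normal(size=(1, d)), cache)
# After 5 steps the cache holds 5 keys/values in every layer:
assert all(K.shape == (5, d) for K, V in cache)
```

Multiply that cache by thousands of concurrent sessions and the memory pressure the parent comment describes becomes clear.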
But if you are memory constrained, you can modify your architecture a little (and the training procedure) to put yourself back in the first situation, where you only generate a single timestep: use the transformer to extract a fixed-size context vector per layer summarizing all of the past (including your most recent input prompt), and use another transformer to generate the words in sequence based on that context vector.
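A hypothetical sketch of that memory-bounded idea follows. The comment suggests a learned extraction; here a simple running mean stands in for it, purely to illustrate why memory per session stays fixed regardless of sequence length.

```python
# Hypothetical sketch: compress all past features at each layer into one
# fixed-size context vector per layer (here via a running mean, standing
# in for a learned extractor), so per-session memory is O(layers * d).
import numpy as np

d, n_layers = 8, 3

def update_context(context, new_feats, t):
    """context: (n_layers, d) running summaries of timesteps 1..t-1.
    new_feats: (n_layers, d) features of the newest timestep.
    A running mean keeps the summary the same size at every t."""
    return context + (new_feats - context) / t

rng = np.random.default_rng(2)
context = np.zeros((n_layers, d))
for t in range(1, 101):                 # 100 timesteps, constant memory
    context = update_context(context, rng.normal(size=(n_layers, d)), t)

assert context.shape == (n_layers, d)   # size independent of sequence length
```

A decoder conditioned only on `context` would then generate the next chunk without ever touching the full per-timestep feature history.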
In my experience, you can get it to change its mind by troubleshooting the connectivity issues. E.g. if you use dig to get the IP and then tell curl to use that IP instead of doing a DNS lookup, it works for me.
I did `curl icanhazip.com` and it spit out the "local" private IP. I told ChatGPT that icanhazip would never do that, and it revised the answer to 37.48.80.166, which is an IP owned by LeaseWeb.
OK, fair enough! But it would be interesting to add a link to the real Internet in the next release. Sadly, the model’s global state is not immediately updated; there are snapshots… but I think it would be interesting to watch it conversing in real time here on Hacker News.
Why do you think this? I don't think there's any reason it would be able to reproduce its own code. It's never seen it so it's not in the weights, and it doesn't have that type of reflection so it can't look it up dynamically.
ChatGPT output:
"I am not sure which specific programming languages or libraries were used to train my language model, as I do not have access to that information. Language models are typically trained using a combination of various programming languages and tools, and the specific technologies that are used can vary depending on the specific model and the research team that developed it. I am a large language model trained by OpenAI, and I use artificial intelligence (AI) and natural language processing (NLP) techniques to generate responses to text-based queries."
Can’t help you with the keys or ID (yet), but I exclusively use the stored cards on my Apple Watch for payment. It is so reliable (in Norway) that I haven’t brought my wallet on normal days in 2+ years.
Even on vacation in Northern Europe (Belgium, Netherlands, France, Germany) and on a business trip to the US (California+Texas) this year, I very rarely had to use the physical cards. NFC just works. Everywhere.
I still bring the cards on important occasions or when going further than a normal drive, though - a testament to the fact that the day you’re longing for is not _quite_ here yet.
This one? I remember it as an earnest description of the difficulties the WSL team had with the speed of NTFS - and I think it was one of the reasons for the switch to virtualisation in WSL2.
My takeaway from that comment is that there are some important performance issues that apply generally to all filesystems on Windows. Maybe we can partially test whether that's the case by playing with WSL1 on ReFS, ExFAT (if that's even supported, given its limited permissions support), or ZFS, once OpenZFS on Windows stabilizes a bit.
When it comes to the problem of targeting, one interesting and promising technology is photochemical internalisation [https://en.m.wikipedia.org/wiki/Photochemical_internalizatio...], where you put the mRNA inside photosensitive molecules (rather than lipids) and then shine light on the tissue/organ where you want the mRNA delivered. Where activated by the light, the molecules enter the cells, dissolve, and deliver the mRNA.
The Norwegian company PCI Biotech has a tech they call fimaNAc for doing this with naked mRNA.
What happens to the unactivated mRNA in that case? I was under the impression that it was typically the actual use of the mRNA by the ribosomes that broke it down, but I could be off base there.
Sure, it isn't destroyed immediately and the mRNA is used multiple times, but that's still ultimately the ribosomes damaging the mRNA through use. My question is about how it breaks down if there aren't ribosomes involved (i.e. if the capsules above aren't opened because they weren't exposed to the light trigger).
I've had 920, 930 and I'm now using a 950 as my main phone: The hardware is superb, and I really like the OS (running Insider Preview Slow Ring). There has been a steady stream of Windows 10 Mobile Insider Preview updates throughout the last year, bringing both new features and stability.
In user interaction and interface consistency it is now much closer to iOS (or what iOS tries to be) than Android is.
That being said, it is pretty obvious it is a minuscule platform; apps are often lagging behind their iOS/Android counterparts, and there are some obvious ones missing (like Snapchat and Pokemon Go).
It is kind of sad, really; I think it would be healthy to have more than two major players, and Windows 10 users would probably feel quite at home in Windows 10 Mobile.
To the people suggesting ELK: I just want to ask if you have actually used it in production. Like for real bughunting and investigating support requests?
As much as we absolutely love ElasticSearch for our other indexing needs, we find it quite hard to get the LK part of the stack to deliver as promised. Kibana may serve up nice graphs and charts, but when you need to drill down into a large amount of log data, we often feel like we're losing both overview _and_ detail.
It might very well be that we are to blame, and that we are just doing it wrong (tm) - but I would love to hear how other people are leveraging the ELK stack in production environments?
We use ElasticSearch and Kibana in production for real bughunting and support requests. Logstash was too frustrating to deal with so we wrote our own simple wrapper around an open source ElasticSearch client library to log ourselves.
We log every request (everything but the body, usually) and response. If an error occurs, it's logged as part of the request. We can practically replay actions taken by users and easily drill down to the exact requests pertaining to an error.
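A minimal sketch of that kind of wrapper (my own illustration, not the commenter's actual code): build one document per request/response pair, attach any error to the same document, and ship it to ElasticSearch. The field names and index name here are made up; the `es.index(...)` call is left as a comment since it needs a running cluster.

```python
# Sketch of a per-request log document: the error travels with its
# request, so drilling down to failing requests is a single query.
import datetime
import json

def build_log_doc(method, path, status, duration_ms, error=None):
    doc = {
        "@timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "method": method,
        "path": path,                  # body deliberately omitted
        "status": status,
        "duration_ms": duration_ms,
    }
    if error is not None:
        doc["error"] = repr(error)     # attach the error to the request
    return doc

doc = build_log_doc("GET", "/api/users/42", 500, 12.3,
                    error=KeyError("user_id"))
print(json.dumps(doc, indent=2))

# With the official Python client this would be shipped with something
# like (hypothetical index name):
#   es.index(index="requests-2016.01", document=doc)
```

Keeping the document flat like this also makes Kibana filtering (`status:500 AND path:"/api/users/42"`) straightforward.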