arkohut's comments

Thanks for sharing that. They are quite similar, but I found that this project focuses on searching text. Pensieve also enables VLM integration, which can search content that contains no text.


I gave the project a bad name, "memos", but there is already a great open source project named "memos". So I quickly changed the name to "Pensieve". Sorry for the confusion...


Herein lies the paradox: I want a tool that helps me record more information, but I don’t want this information to be easily exposed to others or used as evidence against me for things I’ve done. Yet, there are moments when I genuinely need to share this information—to prove what I have or haven’t done. The critical bottom line, however, is that the records must remain untampered. If I could alter them at will, their value and meaning would be entirely lost.
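
One common way to make records tamper-evident is a hash chain, where each record stores the hash of the one before it. The sketch below is only an illustration of that idea and not something the project is stated to implement; the function names `append_record` and `verify_chain` and the record layout are hypothetical.

```python
import hashlib
import json


def append_record(chain, payload):
    """Append payload to the chain, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    record = {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
    chain.append(record)
    return record


def verify_chain(chain):
    """Return True only if every record still matches its stored hash and link."""
    prev_hash = "0" * 64
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        body = json.dumps({"prev": prev_hash, "payload": record["payload"]}, sort_keys=True)
        if hashlib.sha256(body.encode("utf-8")).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

Note that a chain like this only detects edits if the most recent hash is also kept somewhere the owner cannot quietly rewrite; otherwise the whole chain could simply be recomputed.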


> but I don’t want this information to be easily exposed to others or used as evidence against me for things I’ve done

Even if you wrote it in a diary it could still be used as evidence against you. The only untouchable place is your mind.


In fact, this project is indeed very compute-intensive, but that is not Python’s fault. The main reason lies in the use of several machine learning models:

  1. OCR model
  2. Embedding model
  3. VLM model (optional)
I’ve tried many optimization approaches to ensure it doesn’t affect daily usage, though this comes at the cost of reduced search performance.


It is just a tool... Delete the records if you want to forget. And although I don’t have a photographic memory, there are still some things I can never forget, whether they’re good or bad.


Thanks for the advice. I will work on this feature.


Let me share a scenario I personally encountered. One day, I came across an introduction to a Star Wars animation on a video recommendation website. I was drawn to the fancy cover image, which briefly flashed by in the homepage slideshow. However, I got busy with other things and only had time to search for it later that evening. Without this project, that information would have been almost impossible to find again because, as part of the website’s recommendation system, the content changes every time you refresh the page. But this time was different: I was able to retrieve the content by searching for the keyword “Star War” and narrowing the search by time range.

Of course, I know that such a feature might seem trivial. Some things are simply forgotten, and that’s fine. But what if it is a more important clue, like a website bug that only triggers under narrow conditions and is hard to reproduce?
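
The keyword-plus-time-range lookup from the Star Wars scenario can be sketched roughly as follows. This is a minimal illustration using SQLite substring matching, not the project's actual index; the table name `screenshots`, its columns, and the sample rows are all made up for the example.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory table of (ISO-8601 timestamp, OCR text) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE screenshots (captured_at TEXT, ocr_text TEXT)")
    conn.executemany("INSERT INTO screenshots VALUES (?, ?)", rows)
    return conn


def search(conn, keyword, start, end):
    """Return rows whose text contains keyword and whose timestamp is in [start, end]."""
    query = (
        "SELECT captured_at, ocr_text FROM screenshots "
        "WHERE ocr_text LIKE ? AND captured_at BETWEEN ? AND ? "
        "ORDER BY captured_at"
    )
    return conn.execute(query, (f"%{keyword}%", start, end)).fetchall()


# Hypothetical sample data: two captures on one day, one on a later day.
conn = build_index([
    ("2024-11-01T09:15:00", "Star Wars: new animated series"),
    ("2024-11-01T20:30:00", "Quarterly budget spreadsheet"),
    ("2024-11-03T10:00:00", "Star Wars merchandise"),
])
hits = search(conn, "Star War", "2024-11-01T00:00:00", "2024-11-01T23:59:59")
```

Because the match is a substring match, a partial keyword such as "Star War" still finds text containing "Star Wars", and the time range filters out the later capture.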


Sorry for the confusion. I gave the project a bad name, "memos", but there is already a great open source project named "memos". So I quickly changed the name to "Pensieve".


Thanks for the advice. I will do more work on this part. Currently I am using a package named "ocrmac", and it helps a lot.


Memos is a privacy-focused passive recording project. It can automatically record screen content, build intelligent indices, and provide a convenient web interface to retrieve historical records.

This project draws heavily from two other projects: one called Rewind and another called Windows Recall. However, unlike both of them, Memos allows you to have complete control over your data, avoiding the transfer of data to untrusted data centers.


> avoiding the transfer of data to untrusted data centers

In short order, this will create a large corpus of unsecured local data.

Is the user expected to secure the data independently?

Do Recall/Rewind help the user to filter recorded data for retention or deletion?


Rewind and Recall also store similar data locally, though maybe not only locally. Both also allow data deletion, and they can retain only the most recent data based on time.
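
A time-based retention rule of that general kind can be sketched in a few lines. This is only an assumption about what "retain the most recent data" could look like, not how Rewind or Recall actually do it; `prune_old_records` and the `captured_at` field are hypothetical names.

```python
from datetime import datetime, timedelta


def prune_old_records(records, now, keep_days):
    """Keep only records captured within the last keep_days days before now."""
    cutoff = now - timedelta(days=keep_days)
    return [r for r in records if r["captured_at"] >= cutoff]
```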


Rewind and Recall are 2 separate projects and 2 separate installers. I use Rewind and I have several outbound network monitoring apps as well as local disk monitoring apps. Rewind does not send data offsite.

Rewind does glitch sometimes, specifically with audio recording, which is extremely annoying. You go back to an area where you thought you had audio notes only to find you didn’t - even though you had audio recording turned on the whole time. It has something to do with meeting detection, which is silly because disk space is cheap; just auto-record. I do like the concept of an open source version and I will look into this.


Thanks to the PR debacle, Recall now encrypts the data in a VM: https://www.windowscentral.com/software-apps/windows-11/wind...


If this is very important, I suppose I will implement encryption for stored data in future versions.

However, I still have a question about this: it seems that many hard disks are already encrypted. After all, I also store a large number of personal photos, documents, bills, and other important information on my computer, and I haven’t meticulously encrypted all this data again. Should I be doing that?


It’s a question of risk.

Full disk encryption targets a different threat model - disk encryption protects against someone grabbing your computer.

Writing into an encrypted blob on disk adds a layer of protection against bad actors exfiltrating data by running code on the laptop.

Overall I really am amazed that this sort of thing is now possible and appreciate a privacy-aware / local compute and storage version of it!


This is odd. Why would you secure this piece of data and leave everything else open?

Surely you encrypt your disk rather than trying to secure this one app? I mean, there’s far more valuable stuff on your machine than anything this app could possibly store.


Things that are fleetingly on-screen are not commonly stored on disk forever. That changes with these apps.


You can't have it both ways. You can either own your data and secure it yourself or you can entrust it to someone else and hope they don't leak it (they will). A lot of the data is already stored in your computer anyways, such as your browser history.


"large corpus of unsecured local data": is this much worse than an unencrypted Outlook mailbox (pst or ost)? Or offline files from your Dropbox/GDrive/etc? Or your browser profile?

I guess it's worse in the sense that it also records audio, but a large corpus of information is already at risk on an insecure or compromised device.


I don’t record audio because I believe this is already a built-in feature in many meeting software applications.

