I want a thing that I think should surely exist, but I can't find it.
I want a LAMP-stack (or similar) web app that's open source, that I run on my own server,
that I can use as a proxy server in any browser,
that gives me some sort of "save this" button (via injection to the HTML or better a bookmarklet),
that tells the app to save a copy of what it just proxy-served me into my saved files database,
which can handle documents that are HTML pages and PDF (and PPT and DOC and JPG and PNG and GIF and little pink ponies, while asking for the moon),
and which has a web interface with a basic reference-style metadata thingy, such that I can capture (manually entered, if necessary!) intrinsic document metadata (URL where I found it, authors, date, publication/journal) and annotative metadata (my own tags, "folder" assignments, summary, other comments),
and supports full-text search of HTML and text-based PDFs.
Put another way, I want a caching bookmark manager that supports a reference manager interface and functions as a proxy so it seamlessly has my authentication bits.
I have a caching bookmark manager, but it doesn't do PDF (or anything other than HTML) and doesn't manage references, and there's apparently reference manager software that does PDFs (but not HTML) and doesn't function as a proxy unless you go through a hosted solution on someone else's server.
And there are all these "reference manager" web applications which live on someone else's server, which is unacceptable. There are allegedly reference managers for the desktop that at least do PDF, but none of them work on an OS as old as the one I'm on (also, your references are stuck on your computer).
These are the workflows I want it to support: (1) I roam around the internet, reading things, and clicking "save to my library", which captures what page or document I'm on and prompts me with a bunch of form fields to add additional info. (2) I can go to the main URL of my library, authenticate, and then access a web app that is basically a reference manager that supports projects and tagging, allowing me to call up a list of all the docs/urls/refs I classified to a certain thing. (3) extra credit: I can go to my library, authenticate, then do a full text search of my documents. Honestly, basic grep would be fine.
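To make workflows (2) and (3) concrete, here is a rough Python sketch of what I mean. Everything in it (`LibraryItem`, `items_tagged`, `grep`, the example URLs) is a made-up name for illustration, not part of any existing tool, and it assumes the text of each saved document has already been extracted somehow.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryItem:
    # Intrinsic metadata: facts about the document itself.
    url: str
    authors: list[str] = field(default_factory=list)
    date: str = ""
    publication: str = ""
    # Annotative metadata: my own notes about the document.
    tags: set[str] = field(default_factory=set)
    folder: str = ""
    summary: str = ""
    # Plain text of the saved copy (assumed to be extracted elsewhere).
    text: str = ""


def items_tagged(items, tag):
    """Workflow (2): list everything I classified under a given tag."""
    return [item for item in items if tag in item.tags]


def grep(items, needle):
    """Workflow (3): naive case-insensitive full-text search."""
    needle = needle.lower()
    return [item for item in items if needle in item.text.lower()]


# Hypothetical example data.
library = [
    LibraryItem(url="http://example.com/a", tags={"proxies"},
                text="Notes on TLS interception"),
    LibraryItem(url="http://example.com/b", tags={"references"},
                text="Citation styles compared"),
]
print([item.url for item in items_tagged(library, "proxies")])
print([item.url for item in grep(library, "tls")])
```

A linear scan like this is the "basic grep" level of search; a real index would only matter once the library gets large.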
Does anybody know if the thing I want (need) exists?
ETA: User review of the day:
ander1122 Posted 09/27/2012
Duly noted. Since I've already succumbed to a tawdry researcher lifestyle, supporting my academic journal article habit by selling my intellect on internet streetcorners, I suppose I have nothing left to lose.
★★★★★
I seldom had to create any academic literature in my career as a designer of miniature golf courses. Then I tried this app, and it was so fantastic that I changed my entire career course just so I could use it. That's how good it is. I suppose it'd be irresponsible of me not to warn you that, if you currently aren't called upon to prepare academic literature, and you're not prepared to change careers, you'd best avoid this app, despite its wonderfulness. Therefore, I shall: Please reread the preceding part of this paragraph, beginning with "...if you currently aren't called upon..." Anyway, this app is great. This concludes my review. Feel free to add your own citations.
(no subject)
Date: 2018-03-13 09:19 pm (UTC)
How essential is the proxy, beyond "I want a copy of what was actually served, especially when paywalls/auth are involved"? Is it also important that it be usable from an arbitrary proxy-capable browser that can't have e.g. the Zotero connector?
(Zotero's collaboration service also has a web UI, but I'm not sure that's relevant here.)
(no subject)
Date: 2018-03-13 09:43 pm (UTC)
(1) browser compatibility - I'm on a Mac and behind some version because it's a very old Mac, and
(2) browser bloat – part of why I want it on some other machine (i.e. a server) is to reduce the load on my device. Slinging PDFs taxes my machine already and trying to open up some additional specialty software – whether a desktop app or an extension in my browser – makes the problem worse, not better.
(no subject)
Date: 2018-03-13 10:58 pm (UTC)
OK, so this clears up why a proxy is of interest -- you want to offload compute onto your server. I feel like that's going to be hard because the proxy is going to have a hard time knowing what network requests are relevant, unless of course it runs a headless browser. It would also need a TLS MITM mechanism...
No idea if Zotero would be too much bloat on your machine. I'm on a fairly old Thinkpad laptop, and running Debian stable, which isn't the freshest thing in the world itself. Zotero doesn't cause noticeable CPU load when I'm not dumping stuff into it. It uses about 170 MB of RAM, if I'm understanding shared vs. resident memory correctly.
(no subject)
Date: 2018-03-13 11:14 pm (UTC)
? Are you seeing something I'm missing? I was thinking something like a regular proxy server, which caches everything temporarily, so if I go to foo.bar/bem/quux, my proxy server has a copy of foo.bar/bem/quux and all its related assets, and then if I click my bookmarklet while viewing foo.bar/bem/quux in my browser, the bookmarklet notifies my server, "save foo.bar/bem/quux and its assets into my library, and add this metadata". That sounds simple enough I'm tempted to write it myself.
Were you thinking about the problem of related assets?
"It would also need a TLS MITM mechanism"
Yes, but I was under the impression that was a solved problem because proxies exist. Is it not? Or is it very hard?
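Roughly what I have in mind, as a Python sketch rather than a working proxy. `ProxyCache`, `record`, and `save` are names I'm inventing here, and the rule "an asset belongs to a page if its Referer header is that page" is my own guess at how to group related assets; it would miss anything loaded indirectly (for example, a font pulled in by a stylesheet).

```python
import time


class ProxyCache:
    """Temporary store of responses the proxy has served, keyed by URL."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.entries = {}  # url -> (timestamp, referer, body)
        self.library = {}  # page url -> {"metadata": ..., "files": {...}}

    def record(self, url, body, referer=None, now=None):
        """Called by the proxy for every response it passes through."""
        stamp = now if now is not None else time.time()
        self.entries[url] = (stamp, referer, body)

    def save(self, page_url, metadata, now=None):
        """Called when the bookmarklet says 'save this page'."""
        now = now if now is not None else time.time()
        files = {}
        for url, (stamp, referer, body) in self.entries.items():
            fresh = now - stamp <= self.ttl
            # Assumption: related assets carry the page as their Referer.
            if fresh and (url == page_url or referer == page_url):
                files[url] = body
        if page_url not in files:
            raise KeyError(f"{page_url} is not in the proxy cache")
        self.library[page_url] = {"metadata": metadata, "files": files}
        return sorted(files)


# Hypothetical traffic: one page, one related asset, one unrelated page.
cache = ProxyCache()
cache.record("http://foo.bar/bem/quux", b"<html>...</html>")
cache.record("http://foo.bar/style.css", b"body {}",
             referer="http://foo.bar/bem/quux")
cache.record("http://other.example/", b"unrelated")
saved = cache.save("http://foo.bar/bem/quux", {"tags": ["example"]})
print(saved)
```

The bookmarklet's only job in this picture is to send the page URL and the metadata form to whatever endpoint calls `save`.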
(no subject)
Date: 2018-03-13 11:34 pm (UTC)
It gets harder if there's content loaded from ajax requests and whatnot, which is distressingly common. Could even be from third-party hosts. At that point, the bookmarklet would not have sufficient information to provide the server. Even if the server grabbed all things requested after the initial page load, assuming they're related, this would not detect objects served from the browser cache. (e.g. the second time you viewed the same page.) Cross-domain ajax requests already in the cache would be lost. I don't know how common that would be, but it would certainly suck to find out the hard way.
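A toy Python model of the browser-cache problem, to show what I mean. The `Browser` class and `view_page` function are invented for this illustration, and the model is deliberately crude: a real browser might send a conditional request and get back a bodiless 304, which leaves the proxy no better off.

```python
class Browser:
    """Toy browser: only cache misses ever reach the proxy."""

    def __init__(self, proxy_log):
        self.cache = set()
        self.proxy_log = proxy_log

    def fetch(self, url):
        if url in self.cache:
            return  # served locally; the proxy never sees this request
        self.cache.add(url)
        self.proxy_log.append(url)


def view_page(browser, page, assets):
    """Load a page and its assets; return what the proxy saw this time."""
    start = len(browser.proxy_log)
    for url in [page, *assets]:
        browser.fetch(url)
    return browser.proxy_log[start:]


log = []
browser = Browser(log)
# Hypothetical third-party assets loaded by the page.
assets = ["http://cdn.example/app.js", "http://api.example/data.json"]
first = view_page(browser, "http://foo.bar/bem/quux", assets)
second = view_page(browser, "http://foo.bar/bem/quux", assets)
print(len(first), len(second))
```

On the second view the proxy sees nothing new, so a "save what you just served me" request at that point has nothing recent to work from.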
Zotero saves the *rendered* page, which is an entirely different approach. Maybe the bookmarklet could do that too, but then I'm not sure how different it is from the Zotero connector, besides the option to have the receiving software running on a different computer. (Fun thing for me to maybe try later: See if I can use port-forwarding so the standalone Zotero is running on a different computer than my browser. It appears to listen on three ports.)
On proxies, I wasn't trying to imply a huge technical challenge. The issue is that you'd need to generate a root certificate, and then install it in your browser so that your proxy could sign on behalf of every website. I'm not sure if you'd find this as distasteful as I do, but it would be required for this type of proxy (unless you are just viewing raw HTTP sites I guess.)
(no subject)
Date: 2018-03-14 04:32 pm (UTC)
A bookmarklet or browser extension would be the way to go to make sure that you're capturing what you're seeing.
(no subject)
Date: 2018-03-13 09:55 pm (UTC)
Wait, does Zotero even support running your own server? Those GitHub repositories seem all about building your own clients/connectors.
Using their server is a hard fail of my requirements.
ETA: AHAHAHAHA OMG No.
ETA2: Oh, wait: this interesting article pointed me at https://github.com/zotero/dataserver. So apparently maybe you can run your own Zotero server, but you have to edit your clients and compile them yourself, because the client software doesn't have any way to point at a non-Zotero.org Zotero server.
Or one can apparently do what that blogger does, and carefully copy SQLite files around the internet.
(no subject)
Date: 2018-03-13 11:08 pm (UTC)
You shouldn't have to recompile the client; there's a config editor (standalone Zotero is built on XULRunner, so this is basically about:config) that has slots for entering URL, username, and password for sync, and it looks like I was using WebDAV to my own server once upon a time.