Hacker News | thacker's comments

We're in the process of back-converting existing documents to HTML5. Most of your recent public documents should be converted now or soon, with private docs to come later.


Try the fullscreen button -- is that what you're looking for?


Thanks yes.

Also, the "readcast" popup is pretty annoying IMHO: it pops up every time and obscures the document.


If you have a Scribd account, visit http://www.scribd.com/account/edit#sharing and turn off all the Readcast options. The popup will be suppressed.


I don't have an account, and I don't yet understand why I'd want one :) The value proposition, i.e. what problem Scribd solves, is still puzzling to me... But thanks for the info.


FYI: Whenever I see "readcast" I read it as "Re-Ad-Cast" not "Read-Cast" as I'm sure you mean it. With the color on the page it's a different story, though.


Can't reply to you directly, axod, so I'll put it here. To better support fullscreen viewing and other setups, we still have some behaviors to figure out with those pop-out dialogs.


With regard to default view modes, that's working for iPaper and is coming soon for the HTML5 viewer.

Go ahead and upload documents in any format supported by Scribd; they'll be added to our conversion queue and appear as HTML5 documents when ready.


All documents uploaded to Scribd (either automatically or manually) are stored in S3 and are available on Scribd until the document is deleted from Scribd. If you find your content was uploaded to Scribd without your permission, you should contact our support team; they'll take care of it for you.


Does it appear in document lists? Or is knowing the URL the only way in?

The "ethical" concern I have is that it encourages behaviour like that shown on HN, with the scrape link added automatically. There is a distinct difference there compared to people uploading documents themselves (IMO anyway).

Of course if it's stored but not indexed that feels fine.

(thanks for clarifying)


HN is using the "slurp" API we make available at scribd.com/developers. Documents uploaded via the slurping system are marked as private and uploaded to a fixed "slurp" account for which no one (outside of Scribd) has the password.


Google rasterizes the PDF and streams it to you as an image. Scribd will be converting documents to HTML and CSS while maintaining a near perfect facsimile of the original document.


Google does a lot more than that. For example, copy and paste works: you can select some text and copy it out. That's non-trivial.


That is true; their conversion understands text regions and various other things. However, what makes Scribd's viewer more sophisticated is that it will use structured HTML to render the document content. This is more than putting on a layer that specifies regions in the document: it will be a normal HTML document, made of divs, text, images, etc.

Plus, it will maintain the fidelity of the document -- meaning that even PDFs with complicated layouts will be rendered properly in HTML. No trivial task.
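To make the "normal HTML document" idea concrete, here is a minimal Python sketch of the general approach: emitting real, selectable text elements positioned on a page instead of a single raster image. It assumes an earlier PDF extraction step has already produced text runs with page coordinates; `TextRun` and `render_page` are hypothetical names, and this is not Scribd's actual converter. Real fidelity (embedded fonts, vector graphics, images) is far harder than this.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """A run of text extracted from a PDF page (hypothetical structure)."""
    text: str
    x: float          # left offset in points from the page's left edge
    y: float          # top offset in points from the page's top edge
    font_size: float  # in points


def render_page(runs: list[TextRun], width: float, height: float) -> str:
    """Render one page as a positioned container holding one absolutely positioned div per run."""
    parts = [f'<div class="page" style="position:relative;width:{width:g}pt;height:{height:g}pt">']
    for run in runs:
        # Text stays text: escaped, selectable, and visible to search engines.
        parts.append(
            f'<div style="position:absolute;left:{run.x:g}pt;top:{run.y:g}pt;'
            f'font-size:{run.font_size:g}pt">{escape(run.text)}</div>'
        )
    parts.append("</div>")
    return "\n".join(parts)


page = render_page([TextRun("Hello & welcome", 72.0, 72.0, 12.0)], 612.0, 792.0)
print(page)
```

Absolute positioning inside a relatively positioned page container is the simplest way to keep the original layout; a converter aiming for cleaner markup would try to reconstruct paragraphs and columns instead.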


What will be the main advantage?

It's a great technical challenge, but will users notice the difference?


Users will be able to easily view PDFs on the web from any device. I primarily read HN on my phone. Any time I see a [Scribd] link, I forget about the story because I can't read it. I have also had several occasions where I needed to read a PDF on the go. I had to email it to a friend and then call them, dictating the pertinent bits over the phone.

Once this is live, Scribd will gain at least myself as a user, and I suspect many more.

(Nesting is too deep to reply to axod, so: the fact that Google is doing this should be reason enough. Scribd exists as a place to publish material. That material should be reachable by as many users as they can manage.)



Although the semantic web is still pie-in-the-sky, having your content structured instead of a big block o' text is always better.

Especially if your users are disabled.


This is a little indirect, but presumably Googlebot will find it easier to index Scribd pages, which might help you as a user if you're searching for something that's hosted on Scribd.


True, although .pdf files are already indexed. Do we also need HTML5 versions of those same PDFs indexed? I'm not sure we do.


The Google PDF has an OCRed text component to it. Google Books, though, is a real bear to use on the iPhone; on the iPad, depending on the book, it can be acceptable (compared to the pain of trying it on the iPhone!).

I don't know how Scribd is going to carry this off, what with people sometimes uploading outright scans of books. I mean, Scribd is not Scribd without the stuff people put there, just like YouTube.


Just in case anyone's wondering -- it's not just converting each page to an image. It's all HTML5 text, graphics, and images where appropriate.


Is there anything specific to HTML 5 there?


The new viewer doesn't use the full spectrum of HTML 5 features, to maintain compatibility with older browsers, but it would not be possible before HTML 5.


Can anybody name a single feature exclusive to HTML5 that it is actually using? TFA says:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s HTML5 documents"

That pretty much brings IE6 into the picture, so I'm really wondering exactly which "HTML5" features IE6 supports!


Using the HTML5 doctype lets you use HTML5 tags and custom data attributes and still have a valid document. New HTML5 form fields, custom attributes, and markup elements are usable in IE6 mainly because it doesn't bother to explode when it encounters them. Form fields just show up as text boxes, custom data attributes are only used in JS anyway, and new structural elements become usable and styleable in IE6 just by adding JS that calls document.createElement() for each new element name.
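The document.createElement() trick is commonly called the "HTML5 shiv". A minimal Python sketch that assembles such a page as a string: the HTML5 doctype, the shiv script hidden from other browsers by an IE conditional comment, a CSS rule making the new elements block-level (browsers without built-in HTML5 support render unknown elements inline), and a custom `data-*` attribute. The element list, `html5_page`, and the `data-doc-id` attribute are hypothetical choices for illustration.

```python
from html import escape

# HTML5 structural elements that IE6-8 won't style until document.createElement()
# has been called for them (this list is illustrative, not exhaustive).
NEW_ELEMENTS = ["header", "nav", "section", "article", "aside", "footer"]


def shiv_script(elements: list[str]) -> str:
    """Build the createElement shiv, wrapped in a conditional comment only IE < 9 executes."""
    calls = "".join(f'document.createElement("{name}");' for name in elements)
    return f"<!--[if lt IE 9]><script>{calls}</script><![endif]-->"


def html5_page(title: str, body: str, doc_id: str) -> str:
    """Assemble a minimal page: HTML5 doctype, IE shiv in <head>, and a data-* attribute."""
    block_css = ",".join(NEW_ELEMENTS) + "{display:block}"
    parts = [
        "<!DOCTYPE html>",  # short doctype; also puts old IE into standards mode
        "<html><head>",
        f"<title>{escape(title)}</title>",
        shiv_script(NEW_ELEMENTS),  # must run before the parser reaches <body>
        f"<style>{block_css}</style>",  # unknown elements would otherwise be inline
        "</head><body>",
        f'<article data-doc-id="{escape(doc_id)}">{body}</article>',
        "</body></html>",
    ]
    return "\n".join(parts)


print(html5_page("Doc & notes", "<section>Hi</section>", "7"))
```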

HTML5 isn't something that just appeared out of nowhere. It's been in the works by browser makers for quite a while, which is refreshing. Rather than being a spec made up in a purely academic environment (like XHTML 2), it's made up of technologies that have already been used by one or more browser makers (and often by developers on real sites).

Also, using the HTML5 doctype in IE6 causes IE6 to go into standards mode, which is just pure luck.

You can do a lot of good for users if you start using some of the HTML5 features right now, even if it's not apparent. If you use type="email" for form fields that ask for an email address, the iPod and iPad will bring up the email keyboard layout. That alone is kinda cool.
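The type="email" trick costs nothing on older browsers because a browser that doesn't recognize an input type falls back to a plain text box, as noted above for IE6. A small sketch of picking an HTML5 input type per field; `input_field` and the set of types it accepts are assumptions for this example.

```python
from html import escape

# A few HTML5 input types; browsers that don't know one treat it as type="text".
HTML5_INPUT_TYPES = {"email", "url", "tel", "number", "search"}


def input_field(name: str, kind: str = "text") -> str:
    """Render an <input>, using an HTML5 type when the field's kind has one."""
    input_type = kind if kind in HTML5_INPUT_TYPES else "text"
    return f'<input type="{input_type}" name="{escape(name)}">'


print(input_field("email", "email"))      # <input type="email" name="email">
print(input_field("nickname", "handle"))  # <input type="text" name="nickname">
```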


Why? What do you need from HTML5 to render a static document? The AUDIO tag? VIDEO? WebSockets?


custom fonts, for starters :)
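Custom fonts here means `@font-face` (strictly speaking a CSS feature), which lets a page download a font and render real text in it instead of an image of the text. A minimal sketch that emits the CSS from Python; the family name and font path are hypothetical, and older versions of Internet Explorer only accepted EOT font files, so a production stylesheet would list more than one source.

```python
def font_face(family: str, url: str, fmt: str = "woff") -> str:
    """Emit an @font-face rule that makes a downloadable font usable under `family`."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("{fmt}");\n'
        "}"
    )


def use_font(selector: str, family: str, fallback: str = "serif") -> str:
    """Apply the embedded font, with a generic fallback if it fails to load."""
    return f'{selector} {{ font-family: "{family}", {fallback}; }}'


print(font_face("DocFont1", "fonts/doc-font-1.woff"))
print(use_font(".page p", "DocFont1"))
```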


Anything else besides custom fonts? Unless there's a subset of HTML5 features that I'm completely unaware of, I don't see how HTML5 brings anything useful to a text-viewing app like Scribd that wasn't already possible before...


Text rotation, shadows, indenting, etc.: lots of good 'print'-looking stuff that was done with images or in Flash before. Now Scribd gets to do it in straight HTML, making it easier to index as well (I know Flash is indexable, but I'm pretty sure there is a preference for text).
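Effects like rotation, shadows, and indenting are CSS properties applied to ordinary HTML text. A minimal sketch that wraps extracted text in a styled element; `styled_div` and its parameters are hypothetical, and `transform` and `text-shadow` are CSS3 properties that older browsers either needed vendor prefixes for or ignored, in which case the text still shows up, just unstyled.

```python
from html import escape


def styled_div(text: str, rotate_deg: float = 0, shadow: bool = False, indent_em: float = 0) -> str:
    """Render text as real, selectable HTML with print-style effects done in CSS."""
    rules = []
    if rotate_deg:
        # Older browsers needed prefixed forms (e.g. -webkit-transform) or ignored this.
        rules.append(f"transform: rotate({rotate_deg:g}deg)")
    if shadow:
        rules.append("text-shadow: 1px 1px 2px #888")
    if indent_em:
        rules.append(f"text-indent: {indent_em:g}em")
    # A <div> rather than a <span>: transform and text-indent don't apply to plain inline boxes.
    joined = "; ".join(rules)
    style = f' style="{joined}"' if rules else ""
    return f"<div{style}>{escape(text)}</div>"


print(styled_div("Chapter 1", rotate_deg=-90, shadow=True))
```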


True, but not convinced enough people care about custom fonts over rendering an image. It ends up being pretty much the same experience for them. (Actually custom fonts may well load slower for users, so it's a worse experience in some ways).


But Google can't crawl images...


Google already crawls the original .pdf files. I'm not sure we need every conversion of a PDF file indexed as well.


But how else will people find their way to Scribd's site to click the ads?


You're seriously so pissy about scribd that you're currently objecting to their using standard html instead of images? Get a life.


I'm trying to figure out why they're bothering to solve something that's already been solved pretty well.

eg:

http://docs.google.com/gview?url=http://infolab.stanford.edu...

Users won't see any difference, or care.

I do think Scribd up to now has been pretty bad for the web, locking plain-text documents and images up in their walled garden. Maybe they can change that, but what value can they actually add? What problem are they solving?
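The Google viewer link in this comment follows a simple pattern: the viewer address with the PDF's own URL passed in a `url` query parameter. A minimal sketch for building such links; `gview_url` is a hypothetical helper, and it assumes the viewer accepts a percent-encoded `url` value (passing the URL unencoded, as in that link, can break if the PDF's own address contains `&`).

```python
from urllib.parse import quote

VIEWER = "http://docs.google.com/gview"


def gview_url(pdf_url: str) -> str:
    """Build a Google Docs viewer link for a publicly reachable PDF."""
    # Encode everything (safe="") so the PDF URL's own '?', '&' and '=' stay inside one parameter.
    return f"{VIEWER}?url={quote(pdf_url, safe='')}"


print(gview_url("http://example.com/a.pdf"))
# http://docs.google.com/gview?url=http%3A%2F%2Fexample.com%2Fa.pdf
```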


And of course, you know that users won't see any difference or care, despite never having seen, let alone used it. Just because you think the problem has been solved well enough, doesn't mean everyone does. After all, who needs a refrigerator when you have an ice box?

