thacker's comments

thacker · on May 7, 2010

We're in the process of back-converting existing documents to HTML5. Most of your recent public documents should be converted now or soon, with private docs to come later.

thacker · on May 6, 2010

Try the fullscreen button -- is that what you're looking for?

axod · on May 6, 2010

Thanks yes.

Also the "readcast" popup is pretty annoying IMHO, popping up each time, obscuring the document.

thacker · on May 7, 2010

If you have a Scribd account, visit http://www.scribd.com/account/edit#sharing and turn off all the Readcast options. The popup will be suppressed.

axod · on May 7, 2010

I don't have an account. I don't understand why I want an account yet :) The value proposition of what Scribd solves is still puzzling to me... But thanks for the info.

yellowbkpk · on May 7, 2010

FYI: Whenever I see "readcast" I read it as "Re-Ad-Cast" not "Read-Cast" as I'm sure you mean it. With the color on the page it's a different story, though.

qhoxie · on May 7, 2010

Can't reply to you directly axod, so I'll put it here. To better support fullscreen viewing and other setups, we still have some behaviors to figure out with those pop-out dialogs.

thacker · on May 6, 2010

With regards to default viewmodes, that's working for iPaper and is coming soon for the HTML5 viewer.

Go ahead and upload documents in any formats supported by Scribd ... they'll be added to our conversion queue and appear as HTML 5 documents when ready.

thacker · on May 6, 2010

All documents uploaded to Scribd (either automatically or manually) are stored in S3 and are available on Scribd until the document is deleted from Scribd. If you find your content was uploaded to Scribd without your permission, you should contact our support team; they'll take care of it for you.

ErrantX · on May 6, 2010

Does it appear in document lists? Or is knowing the URL the only way in?

The "ethical" concern I have is that it encourages behaviour like that shown on HN - with the scrape link added automatically. There is a distinct difference there compared to people uploading documents themselves (IMO anyway).

Of course if it's stored but not indexed that feels fine.

(thanks for clarifying)

thacker · on May 6, 2010

HN is using the "slurp" API we make available at scribd.com/developers. Documents uploaded via the slurping system are marked as private and uploaded to a fixed "slurp" account for which no one (outside of Scribd) has the password.

thacker · on May 5, 2010

Google rasterizes the PDF and streams it to you as an image. Scribd will be converting documents to HTML and CSS while maintaining a near perfect facsimile of the original document.

axod · on May 5, 2010

Google does a lot more than that. For example copy and paste works if you select some text and copy it out. That's non trivial.

jamesjyu · on May 5, 2010

That is true, their conversion understands text regions and various other things. However, what makes Scribd's viewer more sophisticated is that it will actually use structured HTML to render the document content. This is more than just putting on a layer that specifies regions in the document, it will actually just be a normal HTML document, made of divs, text, images, etc.

Plus, it will maintain the fidelity of the document -- meaning that even PDFs with complicated layouts will be rendered properly in HTML. No trivial task.
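To make the "normal HTML document, made of divs, text, images" idea concrete, here is a minimal sketch. It is not Scribd's converter; the `TextRun` shape and the `renderPage` function are hypothetical names made up for illustration. It only shows how extracted text runs could be emitted as real, selectable text in absolutely positioned divs rather than as an image with a region layer on top.

```typescript
// Hypothetical description of one run of text extracted from a page.
interface TextRun {
  text: string;
  x: number;        // left offset in px
  y: number;        // top offset in px
  fontSize: number; // font size in px
}

// Escape characters that have special meaning in HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one page as a relatively positioned container holding one
// absolutely positioned div per text run.
function renderPage(runs: TextRun[], width: number, height: number): string {
  const body = runs
    .map(
      (r) =>
        `<div style="position:absolute;left:${r.x}px;top:${r.y}px;` +
        `font-size:${r.fontSize}px">${escapeHtml(r.text)}</div>`,
    )
    .join("");
  return (
    `<div class="page" style="position:relative;width:${width}px;` +
    `height:${height}px">${body}</div>`
  );
}
```

Because the output is ordinary markup, copy and paste, in-page search, and crawling work without any extra layer. A real converter would also have to handle fonts, images, and vector graphics, which this sketch ignores.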

axod · on May 5, 2010

What will be the main advantage?

It's a great technical challenge, but will users notice the difference?

hazzen · on May 5, 2010

Users will be able to easily view PDFs on the web from any device. I primarily read HN on my phone. Any time I see a [Scribd] link, I forget about the story because I can't read it. I have also had several occasions where I needed to read a PDF on the go. I had to email it to a friend and then call them, dictating the pertinent bits over the phone.

Once this is live, Scribd will gain at least myself as a user, and I suspect many more.

(Nesting is too deep to reply to axod, so: the fact that Google is doing this should be reason enough. Scribd exists as a place to publish material. That material should be reachable by as many users as they can manage.)

axod · on May 5, 2010

http://docs.google.com/gview?url=http://infolab.stanford.edu...

Works fine for me :/

warfangle · on May 6, 2010

Although the semantic web is still pie-in-sky-land, having your content structured instead of a big block o' text is always better.

Especially if your users are disabled.

abstractbill · on May 5, 2010

This is a little indirect, but presumably the googlebot will find it easier to index scribd pages, which might help you as a user if you're searching for something that's hosted on scribd.

axod · on May 5, 2010

True, although .pdf files are already indexed. Do we also need to have html5 versions of those same pdfs indexed? I'm not sure we do.

mikecane · on May 5, 2010

The Google PDF has an OCRed text component to it. Google Books is a real bear to use on the iPhone. On the iPad, depending on the book, it can be acceptable (when compared to the pain of trying it on the iPhone!).

I don't know how Scribd is going to carry this off, what with people sometimes uploading outright scans of books. I mean, Scribd is not Scribd without the stuff put there by people -- like with Youtube.

thacker · on May 5, 2010

Just in case anyone's wondering -- it's not just converting each page to an image. It's all HTML5 text, graphics, and images where appropriate.

wmf · on May 5, 2010

Is there anything specific to HTML 5 there?

thacker · on May 5, 2010

The new viewer doesn't use the full spectrum of HTML 5 features, to maintain compatibility with older browsers, but it would not be possible before HTML 5.

zmmmmm · on May 6, 2010

Can anybody name a single feature exclusive to HTML5 that it is actually using? TFA says:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s HTML5 documents"

That pretty much counts IE6 into the picture, so I'm really wondering exactly what "HTML5" features IE6 supports!

bphogan · on May 7, 2010

Using the HTML5 doctype lets you use HTML5 tags and custom data attributes and have a valid document. New HTML5 form fields, custom attributes, and markup elements are usable in IE6 mainly because it just doesn't really bother to explode when it encounters them. Form fields just show up as text boxes, custom data attributes are only used in JS anyway, and new structural elements are usable and styleable in IE6 just by adding JS that does a document.createElement().

HTML5 isn't something that just came around. It's been in the works by browser makers for quite a while, which is refreshing. Rather than it being a spec made up in a purely academic environment (XHTML 2), it's something that's made up of technologies that have already been used by one or more browser makers (and often, developers on real sites.)

Also, using the HTML5 doctype in IE6 causes IE6 to go into standards mode, which is just pure luck.

You can do a lot of good for users if you start using some of the HTML5 features right now, even if it's not apparent. If you use the type="email" for your forms when you ask for an email, the iPod and iPad will bring up the Email keyboard layout. That alone is kinda cool.
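For anyone who hasn't seen the document.createElement() trick mentioned above, here is a rough sketch. The `registerHtml5Elements` name, the `ElementFactory` interface, and the tag list are my own choices for illustration. The interface exists only so the function can be exercised outside a browser.

```typescript
// Minimal shape of the DOM we rely on, so the sketch can run outside a browser.
interface ElementFactory {
  createElement(tagName: string): unknown;
}

// A few HTML5 structural elements that old IE does not recognize by default.
// (Illustrative list, not exhaustive.)
const HTML5_ELEMENTS: readonly string[] = [
  "article", "aside", "figure", "footer", "header", "nav", "section",
];

// Calls createElement() once per tag name. In old IE this has the side effect
// of making the parser treat the tag as a known element, so CSS rules apply.
// Returns the list of tag names that were registered.
function registerHtml5Elements(
  doc: ElementFactory,
  tags: readonly string[] = HTML5_ELEMENTS,
): string[] {
  const registered: string[] = [];
  for (const tag of tags) {
    doc.createElement(tag); // the created element is discarded on purpose
    registered.push(tag);
  }
  return registered;
}
```

In a page you would call `registerHtml5Elements(document)` from a script in the head, before the body is parsed, so the elements are already known when the markup that uses them arrives.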

axod · on May 5, 2010

Why? What do you need from HTML5 to render a static document?? The AUDIO tag? VIDEO? websockets?

matthiaskramm · on May 5, 2010

custom fonts, for starters :)

statictype · on May 5, 2010

Anything else besides custom fonts? Unless there's a subset of html5 features that I'm completely unaware of, I don't see how html5 brings anything useful to a text viewing app like scribd that wasn't already possible before...

pedalpete · on May 5, 2010

text rotation, shadow or indenting, etc. Lots of good 'print' looking stuff that was done with images or in flash before. Now Scribd gets to do it in straight html making it easier to index as well (I know flash is indexable, but I'm pretty sure there is a preference to text).
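A small sketch of what that could look like in practice, assuming the viewer emits plain CSS for each styled run. The `TextStyle` shape, the function names, and the font URL in the usage note are hypothetical. The CSS properties themselves (`@font-face`, `transform`, `text-shadow`, `text-indent`) are standard, though browsers of this era often need vendor-prefixed variants of `transform`.

```typescript
// Hypothetical per-run styling a converter might want to express.
interface TextStyle {
  fontFamily: string;
  rotateDeg?: number; // clockwise rotation in degrees
  shadow?: string;    // a CSS text-shadow value, e.g. "1px 1px 2px #999"
  indentPx?: number;  // first-line indent in px
}

// Declare a downloadable font so the page can use the document's own typeface.
function fontFaceRule(family: string, url: string): string {
  return `@font-face { font-family: "${family}"; src: url("${url}"); }`;
}

// Build one CSS rule, emitting only the declarations that were requested.
function styleRule(selector: string, s: TextStyle): string {
  const decls: string[] = [`font-family: "${s.fontFamily}"`];
  if (s.rotateDeg !== undefined) decls.push(`transform: rotate(${s.rotateDeg}deg)`);
  if (s.shadow !== undefined) decls.push(`text-shadow: ${s.shadow}`);
  if (s.indentPx !== undefined) decls.push(`text-indent: ${s.indentPx}px`);
  return `${selector} { ${decls.join("; ")}; }`;
}
```

For example, `fontFaceRule("DocFont", "/fonts/docfont.woff")` followed by `styleRule(".sidebar-label", { fontFamily: "DocFont", rotateDeg: 90 })` would give a rotated label set in the embedded font, all as real text.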

axod · on May 5, 2010

True, but not convinced enough people care about custom fonts over rendering an image. It ends up being pretty much the same experience for them. (Actually custom fonts may well load slower for users, so it's a worse experience in some ways).

stanleydrew · on May 5, 2010

But Google can't crawl images...

axod · on May 5, 2010

Google already crawls the original .pdf files. I'm not sure we need every conversion of a pdf file indexed as well.

jrockway · on May 6, 2010

But how else will people find their way to Scribd's site to click the ads?

earl · on May 6, 2010

You're seriously so pissy about scribd that you're currently objecting to their using standard html instead of images? Get a life.

axod · on May 6, 2010

I'm trying to figure out why they're bothering to solve something that's already been solved pretty well.

eg:

http://docs.google.com/gview?url=http://infolab.stanford.edu...

Users won't see any difference, or care.

I do think scribd up to now has been pretty bad for the web, locking plain text documents and images up in their walled garden. Maybe they can change that, but what value can they actually add? What problem are they solving?

tyler · on May 6, 2010

And of course, you know that users won't see any difference or care, despite never having seen, let alone used it. Just because you think the problem has been solved well enough, doesn't mean everyone does. After all, who needs a refrigerator when you have an ice box?
