Technically, PCRE regexes are powerful enough to match almost anything: with backreferences and recursive subpatterns they go well beyond the regular languages.
In reality, a complex PCRE regex will almost always be more difficult to maintain than a parser-combinator or hand-rolled parser in a more traditional language.
People saying "Regexes can't match HTML, use an html library" are wrong to say regexes are incapable of it, but they're right to say to use a library meant for the job.
The same is true for almost any regular expression that takes advantage of PCRE features, especially backreferences.
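To make that concrete, here's a minimal Python sketch (Python's `re` supports the relevant PCRE-style backreference feature) showing a single backreference recognizing a^n b a^n, a classic non-regular language that no true regular expression can match:

```python
import re

# A backreference lets the pattern demand the same number of a's on both
# sides of the b, something a plain regular expression cannot express.
pattern = re.compile(r"^(a*)b\1$")

assert pattern.match("aabaa")       # n = 2: matches
assert pattern.match("b")           # n = 0: matches
assert not pattern.match("aabaaa")  # unbalanced: no match
```

The power comes at a cost: backreferences force backtracking, which is exactly the maintainability and performance trap the thread is describing.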
In addition, a regex will only match HTML correctly if you write a very complex one. With a naive regex for an HTML tag's contents, you'll find that you can still match that text inside a <script> tag even though it isn't HTML. So now you need to figure out when you're inside a script tag and exclude that, or whether you're inside an HTML attribute string, and before you know it you have a 2000-character regex that no one else will be able to read, all because you didn't want to use an HTML parsing library, where getting a tag's value correctly would be a single XPath expression or CSS selector away.
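Here's a small Python sketch of exactly that failure mode, using the stdlib `html.parser` in place of a full HTML library (the document and class names are made up for illustration):

```python
import re
from html.parser import HTMLParser

html_doc = '<p>real text</p><script>var s = "<p>not real text</p>";</script>'

# The naive pattern happily "finds" the <p> inside the script's string literal.
naive = re.findall(r"<p>(.*?)</p>", html_doc)
# naive == ['real text', 'not real text']

# A real parser tracks context: it treats <script> contents as raw data,
# not markup, so only the genuine paragraph is reported.
class ParagraphCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs.append(data)

collector = ParagraphCollector()
collector.feed(html_doc)
# collector.paragraphs == ['real text']
```

With a proper library the same extraction is one selector, e.g. `soup.select("p")` in BeautifulSoup, and the script-tag edge case is handled for you.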
There's no arguing the fact that regexes are a poor fit for HTML, but maybe this is the wrong time to use that ridiculous email regex as an example, since TFA features a highly readable, fully compliant email matching regex as its main example.
It also doesn't point out that matching email addresses in general is a nightmare because the standard is one of those "we'll just allow everything everybody is doing right now" type standards that have a million different little quibbles.
No matter what language or programming style you use it's going to be ugly because it's an ugly problem.
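A quick Python sketch of the problem: the pattern below is a made-up "reasonable-looking" email regex, yet it rejects addresses that are perfectly valid under RFC 822/5322:

```python
import re

# A typical "looks sensible" email pattern (hypothetical, for illustration).
naive = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

# Both of these are valid per the RFC grammar, but the naive pattern
# rejects them:
oddities = [
    '"john smith"@example.com',  # quoted local part may contain spaces
    'user@[192.168.1.1]',        # domain literal instead of a hostname
]

assert naive.match("alice@example.com")          # the easy case works
assert all(not naive.match(a) for a in oddities) # the quibbles do not
```

This is why the only regex that is actually compliant is the enormous machine-generated one, and why many people settle for "send a confirmation email and see if it arrives" instead.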
What the article features as its main example is a PCRE pattern that matches email addresses, and compared to its original EBNF incarnation it is highly unreadable, due to all the syntax noise required to graft backtracking support onto traditional regex syntax (not to mention that its performance is almost certainly far from optimal).
> It provides the same functionality as RFC::RFC822::Address, but uses Perl regular expressions rather than the Parse::RecDescent parser. This means that the module is much faster to load as it does not need to compile the grammar on startup.
Of course, if perl were a statically compiled language, the cost of compiling the grammar could be done at compile time.
Perl5 is not very meaningful for such performance comparisons, because on the one hand its regex implementation is heavily optimized, while on the other hand Perl5's performance on "normal" procedural code is terrible (e.g. Perl5 is about an order of magnitude slower than CPython on Gabriel's Takeuchi-function benchmark).
This result generalises to most interpreted languages, though. PHP, Python, JavaScript, etc. all have highly optimised regex engines, and regexes can consequently be a good optimisation technique when using those languages.
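A small Python sketch of that optimisation pattern (the example data is made up, and actual timings will vary by workload): the same extraction, done once character by character in interpreted bytecode and once by handing the whole scan to the C-implemented regex engine.

```python
import re

text = "order 17 shipped, order 42 pending, order 7 delayed"

# Hand-rolled scan: every character is touched in interpreted bytecode.
def digits_by_loop(s):
    nums, cur = [], ""
    for ch in s:
        if ch.isdigit():
            cur += ch
        elif cur:
            nums.append(int(cur))
            cur = ""
    if cur:
        nums.append(int(cur))
    return nums

# Same extraction pushed into the regex engine, which runs as native code.
def digits_by_regex(s):
    return [int(m) for m in re.findall(r"\d+", s)]

assert digits_by_loop(text) == digits_by_regex(text) == [17, 42, 7]
```

On large inputs the regex version typically wins by a wide margin in CPython, precisely because the per-character work moves out of the interpreter.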
> People saying "Regexes can't match HTML, use an html library" are wrong to say regexes are incapable of it, but they're right to say to use a library meant for the job.
Plot twist: the html library is built upon regexes (at least in part).
Every parser is partially built upon regexes. You have to go all the way to Haskell, Prolog, or similar languages before you get better options than regexes for building them.
But they are not built solely of regexes. They always add control structures that complement regexes in the places where regexes are weakest.
Even real regular regexes can be used when nesting is limited, which is true for most real-world HTML, XML, and JSON. Still, you're better off using libraries.
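For instance, here's a Python sketch under the assumption of genuinely flat markup, `<meta>` tags, which contain no nested tags. A true regular pattern suffices, though note this toy pattern also assumes a fixed attribute order and double quotes, which is exactly the kind of brittleness the library saves you from:

```python
import re

# Flat, non-nested markup: no recursion needed, so a plain regular
# pattern can extract every name/content pair in one pass.
head = (
    '<meta name="author" content="Jane Doe">'
    '<meta name="keywords" content="regex,parsing">'
)
meta = dict(re.findall(r'<meta name="([^"]*)" content="([^"]*)">', head))
# meta == {'author': 'Jane Doe', 'keywords': 'regex,parsing'}
```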
The words of the statement matter in specificity. You can parse HTML with a powerful regular expression, but it's not a good tool for the job. That said, I find it a wonderful tool to extract specific portions of an HTML document.
If you actually just care about retrieving a few specific bits of data within a page, I've found parsing libraries (including ones that allow for CSS selectors) to be just as brittle to changes as regular expression extraction, and not all that much easier to use, given a good grasp of both technologies.
That said, if you need to alter an HTML document in some non-trivial way, parsing is probably the way to go.
We had two versions of a particular app once. One used BeautifulSoup to parse the page and pull out the relevant elements. The other used some crusty old regex patterns. At the end of the day the regex version required about half the maintenance the tag-soup version did. IMHO the difference was that the regex version took some of the content into consideration, while the tag-only version was more sensitive to otherwise invisible changes under the hood.
Not to mention the sheer difference in performance between the two. I've found regexes to be orders of magnitude faster than parsers, for extracting data, that is.