Language Model As A Hybrid War Weapon?

What is a language model? We used to play a game with a bunch of players around a table. You start a story on a piece of paper, just the first word. Then you pass it to the next person while you receive a story from the other neighbor. You write the next word and fold the paper so that only the last word is visible. Then you continue, always trying to write a good next word based on the last one. After a whole round of passing the stories, the papers are unrolled and the stories are read. Hilarity ensues. Sometimes.

While the stories are mostly incoherent, they are often grammatically correct, as they follow a basic sentence structure. Recently, bright minds at OpenAI have surpassed humans at this task of producing a gibberish stream of words that is nevertheless mostly correct English. You can try the GPT-2 demo and give it any beginning you want; it will continue with what mostly looks like sentences (my beginning in bold):

When you approach an unmarked pedestrian crossing, you must yield to pedestrians on the other side of the crosswalk. If there is no pedestrian traffic on the adjacent roadway, and you cannot see them, you must also yield to other road users such as a vehicle.

There may be several unmarked pedestrian crossings in a residential area, especially around schools.

In addition to crossing a street to get to your destination, crossing a sidewalk with the intent of turning right or left is a violation of a local by-law.

You must always have a clear view of both the road and any approaching vehicles, and you should only cross the road when the vehicle in front of you can’t safely make the turn.

How to avoid an accident

In many situations it can be difficult to see oncoming traffic and, if you are not familiar with the road and area you are crossing, it may seem like no-one is around.

Here are a few tips to avoid an

And what is a language model? It’s a device into which you feed the beginning of a text, and it tells you how likely each word is to come next: which are probable, which improbable, and which (nearly) impossible. Once you have such a device, it’s easy to take the text, ask for the most probable next word, append it, and repeat the process until it looks like you have enough. And that’s exactly how this machinery works, which also explains why the result doesn’t make sense in the end. When you always ask just for the next word, most of the time the answer really depends only on the last few words. To make the text coherent, you’d have to start with a point you want to make, not completely improvise at every word.
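The loop described above can be sketched with a deliberately tiny stand-in for the real model: a bigram table that looks only at the single previous word. This is purely illustrative (GPT-2 uses a neural network conditioned on much longer context, and the corpus and function names here are invented), but the generation loop itself is the same: ask for the most probable next word, append it, repeat.

```python
from collections import Counter, defaultdict

# A toy "training corpus". A real model would learn from billions of words;
# this is only here to make the generation loop concrete.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count how often each word follows each word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Return the most probable next word, or None if the word was never seen."""
    counts = following.get(word)
    if not counts:
        return None
    # max() returns the first maximal entry, so ties resolve deterministically.
    return max(counts, key=counts.get)

def generate(start, length=8):
    """Greedy decoding: always append the single most probable next word."""
    words = [start]
    for _ in range(length):
        nxt = most_likely_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the"))
```

Note what happens when you run it: greedy decoding from such a short memory quickly falls into a repeating loop ("the cat sat on the cat sat on..."). That is a small-scale version of the incoherence described above, and it is why real systems sample from the probability distribution instead of always taking the single top word.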

Now people are worried this technology could be used to manufacture fake news at scale, flooding our information channels with so much noise that we couldn’t get to actual information any more. I doubt it would make much difference. Writing the text is the hardest part of a disinformation campaign to automate, but it’s also easy for humans to do. Fake news operations already have a vast automated network to like and share posts, boosting their reach and credibility, and that’s where the cat-and-mouse game with content platforms happens. There is no shortage of fake news text; the bottleneck for misinformation campaigns lies in targeting and delivery. That’s why, so far, there has been no misuse of this technology, as far as the authors can tell.