What is SpeechKit?

SpeechKit is a service that provides news publishers access to the latest Natural Language Processing (NLP) and Text-To-Speech (TTS) technology. The service democratises these technologies by providing them as a platform and product integrations that are ready for newsrooms. Fitting in with existing workflows allows for the best text-to-speech processing with minimal hassle. 

Publishers can use SpeechKit to instantly create audio versions of their news articles. Content is cleaned and optimised by NewsNet, our NLP secret sauce, which ensures we provide the clearest voices possible. Punctuation, tone and intonation are all perfected to deliver the listener the best audio experience, wherever they are. 

SpeechKit allows publishers to liberate news from the page by opening up new opportunities for their readers to engage and consume their stories. An option to listen to articles breaks conventions and provides flexibility to a reader to wander the web and satisfy their interests. In a world of multitasking efficiency, listening has become a skill acquired for modern life. 

We develop, test and optimise to keep our service at the cutting edge, providing new opportunities for digital audio news publishing. Whether it be a global news brand or a hyperlocal independent, SpeechKit delivers a precise yet versatile service that provides value to any publisher with a story to tell.

Why do we do it?

We believe there is a disconnect between information and society. News is everywhere and nowhere. Traditional channels have become targets of prejudice and conspiracy. Trust has been broken and news has suffered by association. We provide a tool that develops an opportunity for connection between news and society. 

By extracting news from the platforms, from distraction and from the daily grind, audio empowers readers — providing them with the freedom to explore. New technologies facilitate accessible news personalisation providing a reliable source of rich information tailored to the reader.

News on-the-move is a modern solution to an age-old problem. It makes keeping up-to-date in these days of continuous development a pleasure rather than a task. It creates new opportunities for relationships with audiences and satisfies a new demand for accessibility. 

How do we do it?

SpeechKit provides newsrooms with the infrastructure to take the best advantage of state-of-the-art speech technology. Following half a decade of R&D, we designed an audio service that would work for publishers. By listening and focusing on solving audio’s shortfalls we started from the ground up building a complete service for creation, distribution and monetisation. 

NewsNet, our intelligent audio optimisation engine, brings together a suite of technologies fine-tuned to deliver a seamless audio experience. Natural Language Processing humanises the mechanical aspects of speech generated by artificial systems. We make it our business to calibrate these systems for news reading and by recreating the subtle aspects of human speech, SpeechKit provides new opportunities for connection with the reader.

Behind the scenes lies a framework that ensures these tools are delivered with maximum efficiency and reliability. By building versatile ingestion endpoints that fit modern publishing practices we make speech available to every article, story or update. We optimise audio opportunity by distributing to popular audio channels whilst providing new opportunities for engagement within a publisher’s existing digital presence, tracking every interaction for feedback and consideration.

Finally, SpeechKit tackles audio’s commercial dilemma. An efficient system minimises costs of production whilst opportunities for sponsorship are carefully integrated into the listening experience. Audio should no longer be seen as a nice-to-have but rather as a profitable and efficient method to increase return on valuable content efforts, helping voice truly become a modern medium.

New Product Test

Of him that brought me up, not to be fondly addicted to either of the two great factions of the coursers in the circus, called Prasini, and Veneti: nor in the amphitheatre partially to favour any of the gladiators, or fencers, as either the Parmularii, or the Secutores. Moreover, to endure labour; nor to need many things; when I have anything to do, to do it myself rather than by others; not to meddle with many businesses; and not easily to admit of any slander.

Google launches an improved speech-to-text service for developers

Only a few weeks after launching a major overhaul of its Cloud Text-to-Speech API, Google today also announced an update to that service’s Speech-to-Text voice recognition service. The new and improved Cloud Speech-to-Text API promises significantly improved voice recognition performance. The new API promises a reduction in word errors of around 54 percent across all of Google’s tests, but in some areas the results are actually far better than that.

Part of this improvement is a major new feature in the Speech-to-Text API that now allows developers to select between different machine learning models based on their use case. The new API currently offers four of these models. There is one for short queries and voice commands, for example, as well as one for understanding audio from phone calls and another one for handling audio from videos. The fourth model is the new default, which Google recommends for all other scenarios.

In addition to these new speech recognition models, Google is also updating the service with a new punctuation model. As the Google team admits, its transcriptions have long suffered from rather unorthodox punctuation. Punctuating transcribed speech is notoriously hard though (just ask anybody who has ever tried to transcribe a speech by the current U.S. president…). Google promises that its new model results in far more readable transcriptions that feature fewer run-on sentences and more commas, periods and question marks.

With this update, Google now also lets developers tag their transcribed audio or video with some basic metadata. There is no immediate benefit to the developer here, but Google says that it will use the aggregate information from all of its users to decide on which new features to prioritize next.

Google is making a small change to how it charges for this service. Like before, audio transcripts cost $0.006 per 15 seconds. The video model will cost twice as much, at $0.012 per 15 seconds, although until May 31 using this new model will also cost $0.006 per 15 seconds.
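To make the pricing concrete, here is a minimal sketch of the arithmetic. The two rates are the ones quoted above; the function name, the model labels and the choice to round each recording up to a whole 15-second block are assumptions made for illustration, not details taken from Google's documentation.

```python
import math

# USD per 15 seconds of audio, as quoted in the article.
# The keys "default" and "video" are hypothetical labels for this sketch.
RATE_PER_15_SECONDS = {"default": 0.006, "video": 0.012}


def transcription_cost(seconds: float, model: str = "default") -> float:
    """Estimate the cost in USD of transcribing `seconds` of audio.

    Assumption: usage is billed in whole 15-second blocks, rounded up.
    """
    blocks = math.ceil(seconds / 15)
    return round(blocks * RATE_PER_15_SECONDS[model], 3)


ten_minutes = 10 * 60
print(transcription_cost(ten_minutes))           # 0.24
print(transcription_cost(ten_minutes, "video"))  # 0.48
```

Under these assumptions, a ten-minute recording is 40 blocks, so it costs $0.24 on the standard rate and $0.48 on the video rate.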

SpeechKit Communication

This document is intended to help communicate the SpeechKit mission, brand and product, and to help direct the website and other marketing material.

What is SpeechKit?

SpeechKit is a service that provides news publishers access to the latest Natural Language Processing (NLP) and Text-To-Speech (TTS) technology. The service democratises these technologies by providing them as a platform and product integrations that are ready for newsrooms. Fitting in with existing workflows allows for the best text-to-speech processing with minimal hassle. 

Publishers can use SpeechKit to instantly create audio versions of their news articles. Content is cleaned and optimised by NewsNet, our NLP secret sauce, which ensures we provide the clearest voices possible. Punctuation, tone and intonation are perfected to deliver the listener the best audio experience, wherever they are. 

SpeechKit allows publishers to liberate news from the page by opening up new opportunities for their readers to engage and consume their stories. An option to listen to articles breaks conventions and provides flexibility to a reader to wander the web and satisfy their interests. In a world of multitasking efficiency, listening has become a skill acquired for modern life. 

We develop, test and optimise to keep our service at the cutting edge, providing new opportunities for digital audio news publishing. Whether it be a global news brand or a hyperlocal independent, SpeechKit delivers a precise yet versatile service that provides value to any publisher with a story to tell.

Why do we do it?

We believe there is a disconnect between information and society. News is everywhere and nowhere. Traditional channels have become targets of prejudice and conspiracy. Trust has been broken and news has suffered by association. We provide a tool that develops an opportunity for connection between news and society. 

By extracting news from the platforms, from distraction and from the daily grind, audio empowers readers — providing them with the freedom to explore. New technologies facilitate accessible news personalisation providing a reliable source of rich information tailored to the reader.

News on-the-move is a modern solution to an age-old problem. It makes keeping up-to-date in these days of continuous development a pleasure rather than a task. It creates new opportunities for relationships with audiences and satisfies a new demand for accessibility. 

How do we do it?

SpeechKit provides newsrooms with the infrastructure to take the best advantage of state-of-the-art speech technology. Following half a decade of R&D, we designed an audio service that would work for publishers. By listening and focusing on solving audio’s shortfalls we started from the ground up building a complete service for creation, distribution and monetisation. 

NewsNet, our intelligent audio optimisation engine, brings together a suite of technologies fine-tuned to deliver a seamless audio experience. Natural Language Processing humanises the mechanical aspects of speech generated by artificial systems. We make it our business to calibrate these systems for news reading and by recreating the subtle aspects of human speech, SpeechKit provides new opportunities for connection with the reader.

Behind the scenes lies a framework that ensures these tools are delivered with maximum efficiency and reliability. By building versatile ingestion endpoints that fit modern publishing practices we make speech available to every article, story or update. We optimise audio opportunity by distributing to popular audio channels whilst providing new opportunities for engagement within a publisher’s existing digital presence, tracking every interaction for feedback and consideration.

Finally, SpeechKit tackles audio’s commercial dilemma. An efficient system minimises costs of production whilst opportunities for sponsorship are carefully integrated, helping voice truly become a modern medium: profitable and self-sufficient.

Medium’s metric that matters: Total Time Reading

One million page views!
50,000 signups!
Five million posts!
165 million active users!

Web companies like metrics — especially when big numbers can be used to woo the tech media into writing about us.

Away from the publicity glare of the Valley tech blogs, every web company should have some not-so-bullshit metrics that guide the business and provide an indication of its health. Ideally, there is one number to rule them all. Josh Elman calls this The Only Metric That Matters.

At Medium, our number is Total Time Reading, or TTR.


The Only Metric That Matters

Let’s first take a step back. Why have a number at all? And if you accept that numbers are a good way by which to measure the success of a business, why have only one?

Away from internet-based companies, most businesses measure their success in dollars. But the media industry has always been a little different. Typically, advertisers pay based on the size of an audience. Various techniques have been used to measure audience size: Radio used diaries, in which listeners would write down what they listened to, and when. Print media added up the total number of copies that were distributed or sold, and then made a guess at how many people saw each copy.

When the web took hold (and e-commerce was just a glint in its eye), only events, such as page views and, later, clicks, could be measured. With the widespread use of cookies (and Google Analytics), we progressed to talking about users. For non-revenue-generating start-ups, users were the only currency: registered users, sign-ups, and finally active users.

“Big data” has brought with it the luxury of being able to measure any (and every) interaction that a user has with an application. We can record what a user does, with what device, when, and for how long. The data is cheap to store and relatively easy to process.

We’ve crossed a point at which the availability of data has exceeded what’s required for quality metrics. Most data scientists that I meet tell me that they’re gathering way more data than they can ever hope to use. And yet, in many cases, they still don’t have useful metrics.

Businesses (those with revenue models) are still optimizing for money. Today’s wealth of data helps to better understand what is driving their revenue. Data analysts can join the dots between the earliest user interactions (like marketing campaigns, referral sources, etc.) and end-of-funnel activities (such as spending money or clicking an ad). The data can also provide insight into product diversification or potential new revenue streams.

Companies that don’t have revenue still need to optimize for user behavior that is valuable. In Medium’s case, that valuable behavior is engaging our users on our platform.

Engagement

Engagement has been the buzzword of growth marketers for a couple of years. When a user engages with your platform, you have their attention. And attention is the precious commodity of the super-connected era.

I think of competing for users’ attention as a zero-sum game. Thanks to hardware innovation, there is barely a moment left in the waking day that hasn’t been claimed by (in no particular order) books, social networks, TV, and games. It’s amazing that we have time for our jobs and families.

There’s no shortage of hand-wringing around what exactly “engagement” means and how it might be measured, if it can be at all. Of course, it depends on the platform, and how you expect your users to spend their time on it.

For content websites (e.g., the New York Times), you want people to read. And then come back, to read more.

A matchmaking service (e.g., OkCupid) attempts to match partners. The number of successful matches should give you a pretty good sense of the health of the business.

What about a site that combines both of these ideas? I sometimes characterize Medium as content matchmaking: we want people to write, and others to read, great posts. It’s two-sided: one can’t exist without the other. What is the core activity that connects the two sides? It’s reading. Readers don’t just view a page, or click an ad. They read.

At Medium, we optimize for the time that people spend reading.


Measuring reading time

TechCrunch’s Gregory Ferenstein wrote:

“In fairness to news editors, we do know how much time readers spend on an article: We know that less than 60 percent will read more than half of an article, and a significant slice won’t read anything at all.”

I think this is optimistic. It is true that Chartbeat’s analytics will tell you how deeply users engage with content. By their data, on average fewer than 60 percent of users read more than half an article. We see it differently: for us, there are no average users, and there are no average posts.

We measure every user interaction with every post. Most of this is done by periodically recording scroll positions. We pipe this data into our data warehouse, where offline processing aggregates the time spent reading (or our best guess of it): we infer when a reader started reading, when they paused, and when they stopped altogether. The methodology allows us to correct for periods of inactivity (such as having a post open in a different tab, walking the dog, or checking your phone).
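The post does not spell out the processing, so what follows is only a minimal sketch of how reading time might be inferred from periodic scroll samples, not Medium's actual pipeline. The sample format (seconds since page load, scroll offset in pixels), the function name and the 30-second idle threshold are all assumptions made for illustration. The idea is to credit the time between samples, but only up to a fixed window after the last observed scroll movement, so that a post left open in a background tab stops accumulating time.

```python
from typing import List, Tuple

# Hypothetical sample format: (seconds since page load, scroll offset in pixels).
Sample = Tuple[float, int]


def estimate_reading_seconds(samples: List[Sample], idle_after: float = 30.0) -> float:
    """Estimate reading time from time-ordered scroll samples.

    Time between consecutive samples is credited only while it falls within
    `idle_after` seconds of the last scroll movement (an assumed threshold).
    """
    if not samples:
        return 0.0

    total = 0.0
    prev_time, prev_pos = samples[0]
    last_move_time = prev_time  # treat the first sample as the start of reading

    for time, pos in samples[1:]:
        # The reader is assumed active until this moment, then idle.
        active_until = last_move_time + idle_after
        total += max(0.0, min(time, active_until) - prev_time)
        if pos != prev_pos:
            last_move_time = time  # scrolling resumed: reopen the active window
        prev_time, prev_pos = time, pos

    return total


samples = [
    (0, 0), (10, 200), (20, 400),  # steady reading
    (30, 400), (300, 400),         # no movement: tab left open
    (310, 600), (320, 800),        # reader returns and scrolls again
]
print(estimate_reading_seconds(samples))  # 60.0
```

In this example the 320-second visit is credited with 60 seconds of reading: the first 20 seconds of scrolling, 30 seconds of grace after the last movement, and the final 10 seconds once scrolling resumes. A real system would need to tune the threshold and decide how to credit the interval in which the reader returns.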

The aggregate Total Time Reading (TTR) is a metric that helps us understand how the Medium platform is doing as a whole. We can slice that number in lots of ways (logged-in vs. logged-out, new posts vs. old, etc.).

We’re thinking about other ways in which this data can be used to learn about Medium users — and their interactions with specific posts. For example:

  • How can we motivate users to increase the total time spent reading the posts that they’ve written?
  • We measure the length of posts in Expected Reading Time. So, which is better: a user spending three minutes reading half of a six-minute post, or a user spending two minutes reading a two-minute post?
  • If a user spends four minutes reading a six-minute post, did she skim it? Is she just a super-fast reader? Or is our time estimate wrong?
  • How long does it take the eye to register an image?
  • What’s the optimal length of a post if we want to maximize TTR?

And so many more.

Maintaining perspective in a startup

The high startup failure rate and the increasing popularity of startup roles mean that young people entering the workforce are perhaps more likely to experience redundancy than previous generations.

This, so logic would have it, will be a traumatic experience that comes totally out of the blue. But should that really be the case? After all, the only startups that really go on to become the next Facebook or Google are Google and Facebook.

Writing on the wall

A few months back, I was made redundant from a startup (not the one in my profile tagline) that I had been working with on-and-off for the past two years. The company was pivoting its strategy towards (what will hopefully be) greener pastures and my entire team was laid off as a consequence. Seeing two years of work amount ultimately to nothing more than audience-building for the new product launch was a predictably disappointing experience.

On balance, the writing had been on the wall for a while — not least because the entire company was aware that we were shifting our business model and strategy. Funding can only stretch so far and we were planning to rebuild from the ground up; inevitably, heads had to roll.

Unlike my colleagues, I was in the unique position of only working part-time while I finished my studies in advance of joining a law firm. Accordingly, beyond the immediate disappointment, I was not plunged into the same insecurity as my peers. Above all, I did not have to go through the rigmarole of applying for new positions while worrying about how I was going to make next month’s rent.

The paradox

Herein seems to lie the unspoken paradox of working in a startup: you typically work long hours and accept low pay in order to scale a business that you probably do not have any meaningful equity in.

There are, of course, tremendous upsides too. In my two years at the company, I reckon I learned more than I would have during two years of a business or management degree — not least because the various vicissitudes of the business moved faster than a university syllabus ever could.

As clichéd as it sounds, when I joined the business there were just three of us in an office basement. I took a personal hiatus to intern at an investment bank for three months before rejoining the company to find that we had moved into a new office, tripled our headcount, and were on the verge of closing our first major funding round. I assisted preparations for our first investor show, did my best to source new talent as we grew, and generally chipped in whenever I was needed.

Regardless, we all worked very hard (though, it must be said, none more than our CEO/founder) despite many of us knowing that we could be making more elsewhere. My own team even helped to formulate our pivot, which, in the end, amounted to signing our own proverbial death warrants, before seemingly proceeding to forget what we had just done.

I think that most people who work in very young startups are eager to pitch in above and beyond their pay-grade, not because they are desperate for a promotion but merely because there is a very clear sense that the company they work for ultimately consists of the people around them. (Of course, the veracity of this belief ultimately boils down to whether one considers a company to be its employees, its shareholders, or a combination of both.)

In this environment, it’s easy to occasionally lose perspective as your own personal goals become intertwined with your employer’s.

Maintaining perspective

Ultimately, a job is a job — unless you own tangible equity in the startup you work for, this is a fact worth remembering. Business is risky, and none are more so than startups, particularly when even the faintest sliver of profit is beyond the horizon. Maintaining perspective is therefore paramount, since your job can be snuffed out by a change of strategy or a dearth of cash.

Working in a startup does often involve squaring the paradox of accepting both long hours and (often) lower pay with what the numbers do not show: one hell of an experience, an insane learning curve, and the chance to actually build something. In my brief experience, at least, those stereotypes undeniably held true.

As such, I have just three words for my colleagues who remain: best of luck.

New technology: bad for radio?

Automation Killed The Radio Star, says the latest blog from Dick Taylor, a US radio writer.

Two things about this.

The first is the use of a lazy Buggles headline. Radio is still very much alive, with 9 out of 10 people in most large countries listening every week. Nothing has killed anything.

I collect lazy Buggles headlines. The song was, of course, the first song to be played by MTV, back in the days when it played music instead of vapid reality television shows. Amusingly, radio outlasted MTV.

Every time we repeat a “killed the radio star” headline, we reinforce the thought that radio is, in some way, in trouble. It isn’t. For parts of the US population, radio is more popular than television!

The other part of Dick’s blog post that I disagree with is the finger-pointing at technology — in this case, automation.

It takes people to use, or misuse, any form of technology. Technology, by itself, isn’t capable of being good or bad.

The postal service is not a bad thing, just because occasionally people send bad things through it, after all.

Automation is capable of getting the best out of your programming. It’s capable of a warm friendly voice overnight, instead of a tone or piped-in programming from the other side of the world.

Automation is capable of polish and tweaks that were impossible in the age of cart machines and turntables.

Poor automation is poor radio, granted — but we’d be foolish to claim that all automation is poor.

New technology, used well, has the potential of delighting our audience, and out of that, bringing ratings and revenue. Used badly, it can have the opposite effect.

But, as is hopefully relatively clear, I’m a fan of what new technology can bring to radio. Including automation.

If anything killed the radio star, it’s the humans who used automation badly. Perhaps radio needs fewer of those types of humans.

Bloomberg Media is using text-to-audio to keep app users engaged

Bloomberg Media in May introduced a text-to-audio function in its app and online with the hunch that commuters would prefer to multitask while getting their news.

According to Julia Beizer, global chief product officer, adoption started off slow, particularly on mobile web, and shortly after launch, people were listening to two and a half stories on average per app session. Now, this has increased to six stories, and audio has become the second-most popular media type on the app (behind live TV).

“Audio is particularly interesting for our audience because of that multitasking utility, that is a real news use case,” said Beizer. “The delivery of journalism is changing to meet this moment, audio for a multitasking audience is a huge tool in our toolkit.”

Publishers like the Financial Times, which has a similar audience segment of global business decision makers, have been converting text to audio articles since last year and are seeing that people are coming back regularly to listen.

Audio fits into the product team’s wider goals of driving utility for the Bloomberg audience, particularly a younger audience. According to Beizer, the Bloomberg audience age is varied, skewing younger than expected in areas. For the Markets area of the site, for instance, 48 percent of the audience is under 35 years old.

Studies show that podcast listeners tend to be younger: Research from U.K. radio trade body Radio Joint Audience Research in March found that two-thirds of new podcast listeners are aged between 16 and 35. And new users are growing: 21 percent of podcast listeners have started listening in the last six months.

Bloomberg has taken advantage of the renaissance in podcasting. The company said that audience downloads for its roughly 25 podcasts have increased 35 percent year over year, but was unwilling to give exact numbers.

Bloomberg broadcasts a number of different podcast formats. One of the most recently launched, TicToc, an extension of its news network on Twitter, details daily news. This summer Bloomberg ran its first mini-series podcast with The Pay Check, a six-episode series looking at the gender pay gap through sociological, financial and personal lenses. Since its launch in May, the podcast has had 200,000 downloads. The success of this, said Beizer, is encouraging Bloomberg to create more mini-series this year, including one on the new economy covering the challenges facing the world economy, and one on navigating the productivity industry.

Bloomberg was early in developing skills for Amazon’s audio-focused Echo devices and has two to three people who work on getting content like its Market Minute onto smart speakers such as the Apple HomePod, Amazon Echo and Google Home.

But the scale isn’t there on smart speakers for Bloomberg to create platform-specific content. Bloomberg’s Twitter show, TicToc, which is distributed on Amazon’s Echo Show, a device that features a screen, is performing well, according to Beizer, because both social-created content and Echo Show content are experienced with the sound off.

“The rise of smart speakers is particularly remarkable in an era when every one of us would rather get a text message than talk on the phone,” she said.


Corporate Japan despairs at UK’s lack of clarity over Brexit

Head of powerful business lobby warns a no-deal exit would be ‘disastrous’.

Japanese companies are increasingly frustrated by the double talk from the British government over Brexit and are hamstrung on how to respond, according to the head of Japan’s most powerful business lobby.

“We just can’t do anything. Everyone is seriously concerned,” said Hiroaki Nakanishi, chairman of Keidanren, in an interview with the Financial Times. “Various scenarios get discussed, from no Brexit to plunging into Brexit without any kind of deal at all. We’re now in a situation where we have to consider what to do in all of them,” he said.

His comments highlight the sense of despair among Britain’s biggest foreign employers after waiting more than two years for clarity about what Brexit will mean.

Keidanren represents more than a thousand of Japan’s biggest companies including large investors in the UK such as Toyota, Honda and Nissan. Mr Nakanishi, who took over as chairman in May, also chairs Hitachi and is one of the country’s best-known industrialists.

Mr Nakanishi said a no-deal Brexit would be disastrous and urged Britain to stay in the customs union.