(Bloomberg) -- SoundHound Inc., known for its music recognition app, raised $75 million to compete with the likes of Amazon.com Inc. and Google to build artificial intelligence that helps machines understand human voices.

The 12-year-old startup is betting that as more everyday devices get connected to the internet, using speech to control and direct them will become the dominant form of interaction. The company aims to encourage device makers to use voice AI tools offered by SoundHound rather than try to build their own.

Santa Clara, California-based SoundHound is one of only a few companies to have built from scratch a core AI technology that can identify and interpret audio. Most of the others with their own speech-recognition engines are big names, including Apple Inc., Baidu Inc., and Microsoft Corp. And many of them tightly control how the software can be used and the data that’s generated, said SoundHound Chief Executive Officer Keyvan Mohajer.

“We don’t have an agenda to hijack your product,” Mohajer said. “If you use Amazon, you lose your brand, your users. You have to ask your user to log into their Amazon account, they have to call on Alexa, and all the data belongs to them.” Meanwhile, when customers build voice-enabled devices or apps using SoundHound’s technology, the startup doesn’t own the users or the data, he said.

The fresh capital will help SoundHound add customers to its speech AI platform, Houndify, at a faster rate and expand its operations to Asia and Europe. Investors in this round included Samsung Electronics Co.’s Catalyst Fund and graphics chipmaker Nvidia Corp. -- both of which build hardware integral to artificial intelligence and the Internet of Things, and both of which already work with SoundHound. Nomura Group, Kleiner Perkins Caufield & Byers, and the SharesPost 100 Fund were also among new investors. The company declined to discuss its valuation; research firm Pitchbook Inc. estimates it at around $800 million.

Already, SoundHound has integrated its software with Samsung hardware, making it easier for developers to add voice-enabled technology to devices that use the electronics giant’s chips. The Korean company has staked a claim in the future connected-device industry, announcing a plan last year to invest $1.2 billion in U.S. IoT projects over four years. SoundHound also has worked with Nvidia to combine speech recognition with the chipmaker’s auto infotainment systems. Mohajer expects to work on more projects with his new investors, though he declined to provide details.

The end goal for all of these companies is to sell technology and devices that let people ask for anything and have the system understand and respond accordingly. Part of that challenge is what SoundHound and its rivals in speech are working on: AI that knows the difference between pizza, the food, and Pisa, the town famous for its leaning tower.

This requires the software to have contextual knowledge of the differences. Many companies with data in domains from travel to weather to food have given SoundHound’s AI access to their data, so that the speech recognition can simultaneously tap into all of these sources for knowledge about what a person is saying -- what SoundHound calls its Collective AI. Access to information from companies like Uber and Yelp Inc. means voice-enabled products built with SoundHound’s software are better at understanding what the user is saying, Mohajer said.
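The idea of tapping several data domains at once to disambiguate a phrase like "pizza" versus "Pisa" can be sketched roughly as follows. This is a toy illustration of the concept, not SoundHound's actual implementation; the domain names and scoring are invented for the example.

```python
# Toy sketch of multi-domain interpretation: each domain knowledge source
# scores the same query in parallel, and the most confident reading wins.
# Domains and confidence values here are purely illustrative.

def food_domain(words):
    # A food-data source is confident when it recognizes a food term.
    return (0.9, "order_food") if "pizza" in words else (0.1, "order_food")

def travel_domain(words):
    # A travel-data source is confident when it recognizes a place.
    return (0.9, "travel_info") if "pisa" in words else (0.1, "travel_info")

def weather_domain(words):
    return (0.9, "weather") if "rain" in words else (0.1, "weather")

DOMAINS = [food_domain, travel_domain, weather_domain]

def interpret(words):
    # Tap every domain source simultaneously and keep the best-scored intent.
    scored = [domain(words) for domain in DOMAINS]
    confidence, intent = max(scored)
    return intent

print(interpret(["order", "a", "pizza"]))             # -> order_food
print(interpret(["leaning", "tower", "of", "pisa"]))  # -> travel_info
```

The point of the sketch is the parallelism: no single domain has to understand everything, but querying all of them at once lets the system pick the interpretation that best fits the words it heard.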

SoundHound has also taken a different approach to building its speech AI: the technology identifies words in real time while simultaneously deciphering their context, which Mohajer said produces faster results. Most other speech and language interpretation technologies take a sequential approach, first transcribing the words from the audio and then deciphering their meaning.

Based on the description of the technology, it’s likely that SoundHound’s approach uses incremental recognition, where the software doesn’t wait until the user stops talking to try to interpret the words, said Alexander Rudnicky, a research professor at Carnegie Mellon University, though he added that he can’t be sure of exactly how SoundHound’s AI works.

“If you want to give people something as fast as possible, then the right idea is to do this incremental approach,” he said. But optimizing for speed in this way may result in a speech and language interpretation system that struggles with certain use cases, he said.
