Develop an API for easier backend mass downloading

HARK0D Posted on Jun 12, 2020 #1

Member

Posts: 8

Joined: May 5, 2020

In my previous post (https://www.wikiloops.com/forum/viewthread.php?thread_id=2458#post_18098) I already hinted at the potential benefit Wikiloops can bring to the table for AI in music. The data is there (151K tracks, to be exact). Not only that, the individual tracks are well labeled, of high quality, and often consist of a single-instrument recording.

The database that is Wikiloops is ideal to tackle tasks such as music source separation or timbre transfer (i.e. transforming the sound of an instrument to make it sound like a different instrument). It is basically a yet to be discovered gold mine for music processing in my opinion. This is mainly down to the fact that the availability of good, annotated data remains one of the greatest challenges in Machine Learning. Wikiloops could be a great answer to this challenge in the field of AI in music.

The only thing that is really missing is a streamlined means of accessing this data en masse. Anyone interested in gathering tracks for Machine Learning purposes currently has to do so manually. This becomes rather tedious and labor-intensive when, for instance, aiming to gather 1000 or more songs for training.

What I am proposing is the following: develop an API (Application Programming Interface) with which someone can enter a search query (e.g. all tracks of only acoustic guitar recordings) and receive the entire collection of tracks that correspond to this query. This would help to greatly cut down the work and time required to build a dataset from Wikiloops tracks.
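To make the idea concrete, here is a minimal sketch of the kind of query such an API might answer. Everything in it is an assumption for illustration only: the `Track` record, the `CATALOG` list and the `search_tracks` function are hypothetical names, and nothing here reflects how Wikiloops actually stores or serves its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical track record: an id, a title and its instrument labels."""
    track_id: int
    title: str
    instruments: frozenset


# Hypothetical stand-in for the labeled track database.
CATALOG = [
    Track(1, "Slow blues", frozenset({"acoustic guitar"})),
    Track(2, "Funk jam", frozenset({"bass", "drums"})),
    Track(3, "Fingerstyle idea", frozenset({"acoustic guitar"})),
    Track(4, "Duo", frozenset({"acoustic guitar", "bass"})),
]


def search_tracks(catalog, only_instrument):
    """Return the tracks that consist solely of the given instrument."""
    wanted = frozenset({only_instrument})
    return [track for track in catalog if track.instruments == wanted]


matches = search_tracks(CATALOG, "acoustic guitar")
print([track.track_id for track in matches])  # prints [1, 3]
```

A real interface would return download links or audio files rather than in-memory records, but the core of the request is this one step: a label-based filter that returns the whole matching collection at once.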
Who would benefit from this? Probably none of the existing users on Wikiloops, but a new target audience instead: AI/data scientists interested in working with music. It was through smart Googling along with sheer coincidence that I stumbled upon this website (I was looking for acoustic guitar recordings I could download). Raising awareness and offering the aforementioned functionality could bring a new flow of paying users to this website.

Dick Posted on Jun 14, 2020 #2

SUPPORTER

Posts: 2956

Joined: Dec 30, 2010

Hey HARK0D,
interesting post - let me try to answer your ideas.

Since a lot of people reading this thread might not all be familiar with some technical terms, I will try to translate your request and my response in a way that will enable the average wikiloops musician to follow what we are talking about here.

We are operating wikiloops collaboratively, so any added features or services of wikiloops must be agreeable to the majority of wikiloops users. Giving some understandable info is crucial in that process.

HARK0D wrote:
It (wikiloops) is basically a yet to be discovered gold mine for music processing in my opinion. This is mainly down to the fact that the availability of good, annotated data remains one of the greatest challenges in Machine Learning. Wikiloops could be a great answer to this challenge in the field of AI in music.

I'd agree with that to some extent. Data is the new gold, some say, and figures of speech like "data mining" are commonplace, too.
I'm afraid these wordings might spark some rather negative emotions in those who happen to own the place you want to stake claims on and exploit (to stick to the Klondike-speak), and who provide the data you'd like to dig through, so let's be careful to communicate what you are after.
You are pointing your finger towards the fact that the value of wikiloops music is made up of two components: the music recordings on one side, and the data labels attached to these recordings on the other. I believe even the non-AI-researching people on wikiloops can relate to that.
Having 1000 unlabelled jamtracks on your hard drive is much less convenient than accessing the same 1000 tracks via wikiloops, so that is the added value which is indeed rather rare to come by. Whether it is as valuable as gold... depends on the price offered, right?

HARK0D wrote:
The only thing that is really missing is a streamlined means of accessing this data en masse. Anyone interested in gathering tracks for Machine Learning purposes is required to perform this task manually. This can become rather tedious and labor intensive when for instance aiming to gather over 1000 songs (or greater) for training.

Providing such an API (an automated computer-to-computer data exchange interface, that is) would be technically easy to set up.

It would, however, work very well for non-scientific content grabbing, too, and we've seen quite a bit of that in the past.
To say it boldly:
Opening up an "en masse downloading" tool which would also export the categorisation of wikiloops would be quite the invitation to every fifteen-year-old wannabe hacker to open up a quick vendor website using wikiloops content, while driving up the wikiloops server cost quite notably.
Where would the benefits be on the wikiloops community's end, and who would defend us against such unauthorized uses?

Last, as a coder, I totally understand the inconvenience of having to find & download each individual track, which will take three clicks per track when starting from the “guitar only” search results page.
As a musician who has contributed to wikiloops and taken time to compose, record AND categorize my track during upload, three clicks to get where I want doesn't seem inconvenient at all, it seems rather easy compared to the time and effort I've put into offering one single track to you.
Again, I do see where you are coming from, but I'm kindly asking you to give some thought to where the wikiloops natives are coming from, too :)

HARK0D wrote:
Who would benefit from this? Probably none of the existing users on Wikiloops, but a new target audience instead.

I'm afraid that's why it might be hard to get a majority of wikiloops users to think of your idea as a good idea.

There simply needs to be some kind of incentive to allow anyone to exploit one's gold mine. If the deal is "I'll take your gold, and that will make me very happy", then most people will have the odd impression that something is very, very wrong here.

Now, I do like science, I do like data research, and I am not at all afraid of Artificial Intelligence (AI).
Wikiloops has collaborated with a German university in the past, and several master's and bachelor's theses were written around wikiloops development, coding and proper presentation.
If a reasonably reputable and clearly non-profit-oriented school or university asked for access to wikiloops content for the research you are aiming to do, then that would feel a lot more like something the wikiloops community might be willing to support.
If anyone outside the education field would like to access our data, then that would indeed require some sort of financial compensation.
If, for example, the wikiloops users and music contributors were offered completely free use of wikiloops because some "nice" entity paid for the servers and personnel in order to mine the data (a very common setup in today's "free" social media & YouTube), then that might be a deal some would accept.

I'm not nay-saying at all, I do get your point and your interest, and I do appreciate your voicing that.
Feel free to get in touch with me by message if you want to discuss this in more detail or suggest some sort of collaboration.
I will consider such options, but it has to be to the benefit of wikiloops and its members.

HARK0D Posted on Jun 14, 2020 #3

Member

Posts: 8

Joined: May 5, 2020

Hey, thanks for your message and I appreciate your comments "from the other side"! I completely understand where you are coming from and I agree with what you are saying. Such a solution should at the very least not have a negative impact on the Wikiloops community.

While I was writing this I also recognized the risk of abuse you have mentioned. You could perhaps look at what SigSep has done with the MUSDB18 dataset for inspiration. They have the MUSDB18 dataset online on Zenodo (https://zenodo.org/record/1117372#.XuaFSEX7SUk) with restricted access. This means that anyone who wishes to use (and therefore download) the MUSDB18 dataset is required to ask for permission and provide a (good) justification. They have also expressed strong limits on the type of use, namely: "The musdb is provided for educational purposes only and the material contained in them should not be used for any commercial purpose without the express permission of the copyright holders:"

Wikiloops could take a similar approach to SigSep and grant permission to use the data under restricted, non-commercial licenses. Such licenses could be monetized in a similar fashion to the "collaborator" and "supporting" memberships, perhaps as an even higher-tiered membership or a sort of pay-per-use plan.

However... while protecting yourself legally with such a licence sounds reasonable in theory, I honestly have no idea how much protection such a licence would offer in practice. What can you really do when someone abuses the licence? That is something I really cannot give a good answer to. On the other hand, the barrier set by requiring permission to be requested would (in my opinion) at least deter most people who are not able to come up with a good reason.
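As a rough illustration of the permission barrier described above, here is a minimal sketch of a request-and-approve gate. The `AccessRegistry` class and its methods are hypothetical names invented for this example; they do not describe how Zenodo, SigSep or Wikiloops handle access, and the sketch says nothing about legal enforcement.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRegistry:
    """Hypothetical registry: users must justify a request before downloading."""
    pending: dict = field(default_factory=dict)   # user -> justification text
    approved: set = field(default_factory=set)    # users cleared by an admin

    def request_access(self, user, justification):
        # A request without any stated reason is rejected outright.
        if not justification.strip():
            raise ValueError("a justification is required")
        self.pending[user] = justification

    def approve(self, user):
        # Only users with a pending request can be approved.
        if user not in self.pending:
            raise KeyError(f"no pending request from {user!r}")
        del self.pending[user]
        self.approved.add(user)

    def can_download(self, user):
        return user in self.approved


registry = AccessRegistry()
registry.request_access("researcher", "Non-commercial source separation study")
print(registry.can_download("researcher"))  # prints False
registry.approve("researcher")
print(registry.can_download("researcher"))  # prints True
```

The approval step is where a human would read the justification, which is the part that filters out requests without a good reason.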
