Versioner sammenlignet

Nøgle

  • Linjen blev tilføjet.
  • Denne linje blev fjernet.
  • Formatering blev ændret.

I stumpled upon the INSERT INTO LOGSTASH SELECT DATA FROM DATABASE article and decided to play around.

Installing PostgreSQL and loading the database and the data into ELK, see Loading Musicbrainz in Elasticsearch

Refer to the https://musicbrainz.org/doc/MusicBrainz_Database for moreAfterwards, the same data should splunkes (smil)the Original setup

 

First part - Install and setup "Splunk DB Connect"

The load gave this in Kibana:

And verifying the row count in PostgreSQL with:

Kodeblok
SELECT count(*) AS Dummy FROM (
SELECT
    release_group.gid AS album_id,
    release_group.type AS album_primary_type_id,
    release_group_primary_type.name AS album_primary_type_name,
    release.name AS release_name,
    artist.name AS artist_name,
    artist.gid AS artist_gid,
    artist_credit.id AS artist_credit_id,
    artist.type AS artist_type_id,
    artist_type.name AS artist_type_name,
    artist.begin_date_year artist_begin_date_year,
    area.name AS artist_country_name,
    release_country.date_year AS release_year,
    release_country.date_month AS release_month,
    release_country.date_day AS release_day
FROM
    musicbrainz.artist
INNER JOIN musicbrainz.artist_credit_name
    ON artist_credit_name.artist = artist.id
INNER JOIN musicbrainz.artist_credit
    ON artist_credit.id = artist_credit_name.artist_credit
INNER JOIN musicbrainz.release_group
    ON release_group.artist_credit = artist_credit.id
INNER JOIN musicbrainz.release
    ON release.release_group = release_group.id
INNER JOIN musicbrainz.release_country
    ON release.id = release_country.release
INNER JOIN musicbrainz.artist_type
    ON artist.type = artist_type.id
INNER JOIN musicbrainz.area
    ON artist.area = area.id
INNER JOIN musicbrainz.release_group_primary_type
    ON release_group_primary_type.id = release_group.type
WHERE
    ((release_country.date_year IS NOT NULL) AND
    (release_country.date_month IS NOT NULL) AND
    (release_country.date_day IS NOT NULL))
    ) As Dummy2

Gave:

Kodeblok
 dummy
--------
 622527
(1 row)

Success - same row count (smil)

I do notice that some rows seems to be the same:

Image Removed

Or not? The "album_id" is the same, but in one row the "release_year" differs from the two others....

Running the SQL

Afterwards, make sure JAVA is configed:

Image Added

add the Postgres JDBC driver and check the Config:

Image Added

Now, the core part is done.

 

Then setup Identities and Connection - this is pretty basic

Advarsel

Remember to make sure the Databases You are connecting to are listening on 0.0.0.0 and NOT 127.0.0.1 - The 127.0.0.1 is typically default for Postgres and MySQL

 

Image Added

Reffering to the previous ELK setup, we use the same query:

Kodeblok
SELECT DISTINCT
Kodeblok
SELECT Distinct * FROM (
SELECT
    release_group.gid AS album_id,
    release_group.type AS album_primary_type_id,
    release_group_primary_type.name AS album_primary_type_name,
    release.name AS release_name,
    artist.name AS artist_name,
    artist.gid AS artist_gid,
    artist_credit.id AS artist_credit_id,
    artist.type AS artist_type_id,
    artist_type.name AS artist_type_name,
    artist.begin_date_year artist_begin_date_year,
    area.name AS artist_country_name,
    release_country.date_year AS release_year,
    release_country.date_month AS release_month,
    release_country.date_day AS release_day
FROM
    musicbrainz.artist
INNER JOIN musicbrainz.artist_credit_name
    ON artist_credit_name.artist = artist.id
INNER JOIN musicbrainz.artist_credit
    ON artist_credit.id = artist_credit_name.artist_credit
INNER JOIN musicbrainz.release_group
    ON release_group.artist_credit = artist_credit.id
INNER JOIN musicbrainz.release
    ON release.release_group = release_group.id
INNER JOIN musicbrainz.release_country
    ON release.id = release_country.release
INNER JOIN musicbrainz.artist_type
    ON artist.type = artist_type.id
INNER JOIN musicbrainz.area
    ON artist.area = area.id
INNER JOIN musicbrainz.release_group_primary_type
    ON release_group_primary_type.id = release_group.type
WHERE
    ((release_country.date_year IS NOT NULL) AND
    (release_country.date_month IS NOT NULL) AND
    (release_country.date_day IS NOT NULL))
    ) As Dummy2

Gave

Kodeblok
560155 rows

So, there is a possible redundancy in the SQL provided from https://www.elastic.co/blog/logstash-jdbc-input-plugin

Reloading the data (after deleting the Index) gives:

Image Removed

Samples

Here is a sample of all albums from "Denmark" with "Peter" in the Artist name:

Image Removed

A few Visualizations..

Image Removed

Whats Next..

Well, this is unfinished business...there are so much more data to combine......

And another project could be parsing IMDB data ... http://www.imdb.com/interfaces

...

To get a preview:

Image Added

 

Where Logstash only gave us the possibility to "bulk" upload the Query result to Elasticsearch - "Splunk DB Connect" gives us 3 options:

Tip

This is where Splunk in my Opinion "runs over" the ELK stack - There are more GUI and both inputs (as Logstash) and on-the-fly lookup (smil)

DB Inputs

DB Inputs are "equal" to the Logstash approach, load the Query result into Splunk as "log lines":

Image Added  Image Added  Image Added  Image Added

Gives the loglines:

Image Added

DB Outputs

DB Lookups