Slack Graph Exporter

 

Deployment using AWS CloudFormation

After you've subscribed to the AWS Marketplace product, click this link to create a new CloudFormation stack.

Currently only the us-east-1 region is supported.

Stack inputs

NeptuneDBInstanceClass
Neptune DB instance class (e.g. db.t3.medium)
RDFBaseURI
Base URI for the RDF data (must end with /)
SlackAuthToken
Slack user token (starts with xoxp-)
The token must have the following permission scopes:
  • identify
  • bot
  • channels:history
  • groups:history
  • channels:read
  • groups:read
  • team:read
  • users:read
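Both format constraints can be checked locally before launching the stack. A minimal Bash sketch; `check_stack_inputs` is a hypothetical helper that only checks the trailing `/` of RDFBaseURI and the `xoxp-` prefix of SlackAuthToken. It cannot verify the token's permission scopes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the CloudFormation stack inputs locally.
# Checks only the documented formats, NOT the token's permission scopes.
check_stack_inputs() {
  local base_uri="$1" token="$2" status=0

  # RDFBaseURI must end with a slash.
  if [[ "$base_uri" != */ ]]; then
    echo "RDFBaseURI must end with '/': $base_uri" >&2
    status=1
  fi

  # SlackAuthToken must be a Slack user token (xoxp- prefix).
  if [[ "$token" != xoxp-* ]]; then
    echo "SlackAuthToken does not look like a user token (xoxp-...)" >&2
    status=1
  fi

  return "$status"
}
```

For example, `check_stack_inputs "https://example.org/slack/" "$SLACK_TOKEN" && echo "inputs look fine"`.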

Stack outputs

NeptuneSPARQLEndpointURL
URL of the Neptune SPARQL endpoint
RDFOutputBucket
S3 bucket where RDF data will be written
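Once the stack is deployed, the outputs can also be read with the AWS CLI instead of the CloudFormation console. A sketch, assuming the AWS CLI is configured for the account that owns the stack; `stack_output` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one output value of a deployed stack.
# Assumes the AWS CLI is configured and the stack lives in us-east-1.
stack_output() {
  local stack_name="$1" output_key="$2"
  aws cloudformation describe-stacks \
    --region us-east-1 \
    --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='${output_key}'].OutputValue" \
    --output text
}
```

Usage with the example stack name:

    ENDPOINT=$(stack_output SlackGraphExporter NeptuneSPARQLEndpointURL)
    BUCKET=$(stack_output SlackGraphExporter RDFOutputBucket)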

Data export

After the CloudFormation stack is deployed, follow these steps:

  1. Go to ECS → Clusters → SlackGraphExporter-FargateClusterXXXXXX → Tasks
  2. Click the button to run a new task
  3. Under Deployment configuration, select SlackGraphExporterSlackTaskRunnerYYYYYY as the Family
  4. Under Networking, select:
    1. The VPC named SlackGraphExporter/OctopusVPC
    2. Private subnet(s)
    3. The default VPC security group
  5. Click the button to create the task

Resource names depend on your stack name (we have used SlackGraphExporter as an example); XXXXXX and YYYYYY stand for generated suffixes.
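The console steps above roughly correspond to a single `aws ecs run-task` call. This is a sketch, not the project's documented procedure: it assumes the Fargate launch type (as the cluster name suggests), and the helper name, subnet IDs and security group ID are placeholders to look up in your account.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the exporter task from the CLI.
# Assumes Fargate and private subnets (hence no public IP).
run_export_task() {
  local cluster="$1" task_family="$2" subnets="$3" security_group="$4"
  aws ecs run-task \
    --region us-east-1 \
    --cluster "$cluster" \
    --task-definition "$task_family" \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[${subnets}],securityGroups=[${security_group}],assignPublicIp=DISABLED}" \
    --query 'tasks[0].taskArn' \
    --output text
}
```

For example (all IDs are placeholders):

    run_export_task SlackGraphExporter-FargateClusterXXXXXX \
      SlackGraphExporterSlackTaskRunnerYYYYYY \
      subnet-aaaa,subnet-bbbb sg-cccc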

The export is finished when all of the SlackExporter tasks are completed.
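Task completion can also be polled from the command line. A rough sketch, assuming that every task still running in the cluster belongs to the export; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of tasks in the cluster whose desired status is RUNNING.
running_task_count() {
  aws ecs list-tasks --region us-east-1 --cluster "$1" \
    --desired-status RUNNING --query 'length(taskArns)' --output text
}

# Hypothetical helper: block until no tasks are left running.
# Second argument is the polling interval in seconds (default 60).
wait_for_export() {
  local cluster="$1" interval="${2:-60}" count
  while true; do
    count=$(running_task_count "$cluster") || return 1
    [ "$count" -eq 0 ] && return 0
    echo "$count task(s) still running..." >&2
    sleep "$interval"
  done
}
```

For example, `wait_for_export SlackGraphExporter-FargateClusterXXXXXX && echo "export finished"`.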

Access Neptune from EC2

Configure resources

These steps are based on the AWS documentation page Connecting to a Neptune DB Cluster from an Amazon EC2 instance in the same VPC.

  1. Make sure the EC2 instance's network settings are correct:
    • VPC is SlackGraphExporter/OctopusVPC
    • Subnet is a public subnet such as SlackGraphExporter/OctopusVPC/PublicSubnet1
  2. Change NeptuneSecurityGroup's inbound rule on port 8182 to use the EC2 instance's security group as its source
  3. Create an IAM role for a Neptune client on EC2 that
    • has the AWS-managed NeptuneFullAccess policy or, better yet, a custom policy such as
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "neptune-db:*"
            ],
            "Resource": [
              "*"
            ]
          }
        ]
      }
    • has a trust relationship that allows the IAM user who deployed the CloudFormation stack to assume the role
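Steps 2 and 3 could also be scripted with the AWS CLI. In this sketch the security group IDs, the user ARN, the role name and the inline policy name `NeptuneDbAccess` are all placeholders. Note that `authorize-security-group-ingress` adds a new inbound rule rather than editing the existing one.

```shell
#!/usr/bin/env bash
# Step 2 (hypothetical helper): allow the EC2 instance's security group
# to reach Neptune on port 8182.
allow_ec2_to_neptune() {
  local neptune_sg="$1" ec2_sg="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$neptune_sg" --protocol tcp --port 8182 --source-group "$ec2_sg"
}

# Step 3 (hypothetical helper): create a role that the given IAM user can assume,
# with an inline policy granting neptune-db:* on all resources.
create_neptune_client_role() {
  local role_name="$1" user_arn="$2" trust_policy db_policy
  trust_policy=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Principal": { "AWS": "${user_arn}" }, "Action": "sts:AssumeRole" }
  ]
}
EOF
)
  db_policy='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["neptune-db:*"],"Resource":["*"]}]}'

  aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document "$trust_policy" &&
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name NeptuneDbAccess --policy-document "$db_policy"
}
```

For example, `create_neptune_client_role NeptuneClient arn:aws:iam::ACCOUNT_ID:user/USER_NAME`, where ACCOUNT_ID and USER_NAME are placeholders.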

Query SPARQL endpoint

The following instructions were tested on an Ubuntu 22.04 EC2 instance.

  1. Install or update to the latest version of the AWS CLI
    sudo apt-get update && \
    sudo apt-get install -y unzip && \
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    sudo ./aws/install
  2. Configure AWS credentials
    aws configure
  3. Install awscurl
    sudo apt-get update && \
    sudo apt-get install -y python3-pip jq && \
    pip3 install awscurl && \
    export PATH="$PATH:$HOME/.local/bin"
  4. Execute the ec2_neptune_query.sh script, which
    • reads the query string from stdin
    • uses the 1st argument as the Neptune SPARQL endpoint URL
    • uses the 2nd argument as the ARN of the Neptune client role
    • uses the 3rd argument as the Neptune region
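The internals of ec2_neptune_query.sh are not described here. As a rough idea of what such a script can do, the following hypothetical `neptune_query` function assumes the role with `aws sts assume-role`, URL-encodes the query with jq (installed in step 3) and sends a SigV4-signed request with awscurl. It relies on awscurl picking up the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables; the real script may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's ec2_neptune_query.sh.
# Usage: echo "ASK {}" | neptune_query ENDPOINT ROLE_ARN REGION
neptune_query() {
  local endpoint="$1" role_arn="$2" region="$3" query creds
  query=$(cat)   # the SPARQL query is read from stdin

  # Assume the Neptune client role; print the three credential parts tab-separated.
  creds=$(aws sts assume-role --role-arn "$role_arn" \
    --role-session-name neptune-query \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1

  local key secret token
  read -r key secret token <<< "$creds"

  # POST the form-encoded query, signed for the neptune-db service.
  # jq's @uri filter URL-encodes the whole query string.
  AWS_ACCESS_KEY_ID="$key" AWS_SECRET_ACCESS_KEY="$secret" AWS_SESSION_TOKEN="$token" \
    awscurl --service neptune-db --region "$region" -X POST \
      -H 'Content-Type: application/x-www-form-urlencoded' \
      -d "query=$(printf '%s' "$query" | jq -sRr @uri)" \
      "$endpoint"
}
```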

For example:

echo "ASK {}" | ./ec2_neptune_query.sh https://neptuneinstance-9yayusfky7oj.cnol6sn9sq5j.us-east-1.neptune.amazonaws.com:8182/sparql arn:aws:iam::580601482069:role/NeptuneClient us-east-1
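Beyond `ASK {}`, queries can use the vocabulary of the exported data. For example, this hypothetical helper prints a query that counts posts per Slack user via `sioc:Post` and `sioc:has_creator`, which appear in the RDF output sample. The `GRAPH ?g` pattern is used because, in that sample, each message is stored in its own named graph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a SPARQL query counting posts per Slack user.
posts_per_user_query() {
  cat <<'EOF'
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?user (COUNT(?post) AS ?posts)
WHERE {
  GRAPH ?g {
    ?post a sioc:Post ;
          sioc:has_creator ?user .
  }
}
GROUP BY ?user
ORDER BY DESC(?posts)
LIMIT 10
EOF
}
```

The query can then be piped into the script, with your own endpoint and role ARN:

    posts_per_user_query | ./ec2_neptune_query.sh "$ENDPOINT" "$ROLE_ARN" us-east-1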

RDF output sample

<https://localhost/conversations/1614287609.001200> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#content> "<https://www.inc.com/christine-lagorio/jack-dangermond-esri-what-i-know-podcast.html>" <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#has_creator> <https://localhost/users/UBM1WAVD1> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#id> "1614287609.001200" <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://purl.org/dc/terms/created> "2021-02-25T21:13:29"^^<http://www.w3.org/2001/XMLSchema#dateTime> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#reply_of> <https://localhost/conversations/1614287327.000500> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#content> "<https://www.esri.com/about/newsroom/blog/how-researchers-built-johns-hopkins-dashboard/>" <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://purl.org/dc/terms/title> "COVID-19: Inside Look at the Johns Hopkins Dashboard, Keeping Tabs on the Virus" <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#links_to> <https://www.esri.com/about/newsroom/blog/how-researchers-built-johns-hopkins-dashboard/> <https://localhost/conversations/1614287609.001200> .
<https://localhost/conversations/1614287609.001200> <http://rdfs.org/sioc/ns#content> "Ensheng Dong, the architect of the Johns Hopkins COVID-19 dashboard, applied his knowledge of GIS to map and track the spread of the disease." <https://localhost/conversations/1614287609.001200> .
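The sample is in N-Quads format: subject, predicate, object and graph, terminated by a dot. For a quick overview of an export without Neptune, the files in the RDFOutputBucket bucket can be copied locally and inspected with standard tools. A sketch, assuming the exported files are uncompressed N-Quads like the sample; `count_predicates` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count quads per predicate in an N-Quads file.
# The subject (IRI or blank node) and predicate (IRI) never contain
# whitespace, so the predicate is always the second field.
count_predicates() {
  awk 'NF > 0 && $1 !~ /^#/ { counts[$2]++ }
       END { for (p in counts) print counts[p], p }' "$1" | sort -rn
}
```

For example, with FILE.nq as a placeholder for one of the exported files:

    aws s3 cp "s3://$BUCKET/" ./rdf-export --recursive
    count_predicates ./rdf-export/FILE.nq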