Meeting about Pipeline Pilot

------------------------------------------------------------------------------------------
TIME: FRIDAY 24th Sept, 9:30
LOCATION: Symyx Technologies Europe SA (the sign may not have been changed to Accelrys yet)
Steinentorberg 12
4010 Basel
Switzerland
Phone: +41 61-486-8888
Directions
------------------------------------------------------------------------------------------
MINUTES

Pipeline Pilot introduction and demo, Basel, 24th Sept, organised by Jean Mercier of Accelrys; 9:30 start
Minutes by ela.hunt@systemsx.ch
Attending: Jonas, Lucas, Hubert, Ela, Tomek, Manuel, Hans-Rudolf, Vincent

Intro - Jean
20 of 24 large biopharma companies use PP
Accelrys: started 2001, revenue $160M, 13,500 customers, 580 employees, San Diego HQ
software and services
goals: improve innovation, optimise processes and improve productivity
areas: nano design, material sciences initiatives, biological registration SIG, scientific research services
better scientific workflow, all types of science data, tracking of information for reuse

technical part by Eddy Vande Water
PP, based on data pipelining

  • Text Analytics collection (NCBI, Google, etc.) to add external data
  • workflow-building interface; end users use the web interface

SEQUENCE ANALYSIS

  • 100 functions; readers for most common formats, incl. ClustalW, FASTA, PDB; EMBOSS modules are integrated, also Artemis
  • can run on the web interface, or workflow interface
  • chemical info lookup, from the chemical collection
  • similarity search via BLAST (also used as ETL); can create new databases, for instance for BLAST
  • based on BioPerl model
  • input file format (columns) is shown in a wizard, this can be used to select the input that is needed for the pipeline
  • can accumulate BLAST hits into one, then create a pdf file with hits, see map and alignments
  • versions of BLAST and BioPerl may not be up to date, so you need to include new code yourself; remote BLAST is also possible, and the string-searching software can be replaced with newer versions
  • BioPerl is up to date

Q - user authentication, access rights, integration? - more details later

JavaScript SDK: a web application for Pipeline Pilot Web Port; protocols can be run from the web.
Deploying a workflow to the web browser as a web service is quick; you can select which parameters to expose to the user.

Gene Expression Collection

  • Bioconductor (R): complex workflows with a single line of code
  • an expert creates the workflow; users can use the embedded R through that workflow
  • data readers: Affymetrix, Illumina, GEO, Excel; extract subsets of an experiment, annotate Affy probes, filter on statistical tests or other conditions, Java heatmap view (TreeView), query Entrez on the web and use text analytics to add annotations from external DBs

Integration Collection
JDBC and ODBC drivers for DB access
DB access: connector, username, password (optional; encrypted, entered online, or OS authentication); customised admin level for the application (a replica of what a DB offers, but saved in files, so no DB is needed to run the system)
workflows cannot be edited through the web
PP credentials (user name) determine which workflows a user can run
server: Linux (Red Hat and SUSE), Windows
can be integrated with clusters and grids: install PP on cluster nodes to parallelise
clusters: PBS, LSF, Sun Grid Engine

Q - large data sets?
1. PP can be called externally on a cluster, PP needs to decide where to keep the files
2. also can deploy PP on the grid engine
3. parallel processing options are available from PP

Q - scalability, users?
standard setup for server - single processor, 3-4 experts, 10-12 users
also used by Novartis for enterprise computing; takes advantage of many cores on a multicore machine (my question: is this true or just wishful thinking?)

Q - parallel, queues?
you can set the number of jobs per queue

Q - history?
yes - you can come back the next day and check job progress in the web interface or the application; the JavaScript SDK uses web services to look up status, and the web app refers to a job ID. Tracking is file based: a job description is kept in a file on the server, and the file can be stored in a DB to keep track of what happened (some companies do that).
for a more complex front-end interface, some JavaScript programming is needed
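To illustrate the file-based job tracking described in the answer above, here is a minimal sketch that writes a job description to a file and then copies it into a database table so the history can be queried later. It assumes a hypothetical JSON job format, job ID, protocol name and `job_history` table; Pipeline Pilot's actual job files, and any company's tracking DB, will differ. SQLite stands in for whatever DB a site would use.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical job_history table for archived job descriptions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_history ("
        "job_id TEXT PRIMARY KEY, protocol TEXT, status TEXT, raw TEXT)"
    )
    return conn


def write_job_file(job_dir: Path, job_id: str, protocol: str, status: str) -> Path:
    """Write a job description as a JSON file (hypothetical format)."""
    path = job_dir / f"{job_id}.json"
    path.write_text(json.dumps({"job_id": job_id, "protocol": protocol, "status": status}))
    return path


def archive_job_file(conn: sqlite3.Connection, path: Path) -> None:
    """Copy a job description file into the DB, keeping the raw text too."""
    raw = path.read_text()
    job = json.loads(raw)
    conn.execute(
        "INSERT OR REPLACE INTO job_history (job_id, protocol, status, raw) "
        "VALUES (?, ?, ?, ?)",
        (job["job_id"], job["protocol"], job["status"], raw),
    )
    conn.commit()


def job_status(conn: sqlite3.Connection, job_id: str) -> Optional[str]:
    """Look up the archived status of a job, or None if it was never archived."""
    row = conn.execute(
        "SELECT status FROM job_history WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_history_db()
    with tempfile.TemporaryDirectory() as tmp:
        job_file = write_job_file(Path(tmp), "job-42", "ExampleProtocol", "finished")
        archive_job_file(conn, job_file)
    print(job_status(conn, "job-42"))  # finished
```

Keeping the raw file text alongside the parsed columns means the archived record still matches the original job description even if the file on the server is later cleaned up.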

Q - Matlab and python?
MATLAB is available; Python only on a Windows server; can be integrated via ssh, depending on where it is to run; problems with Windows
command-line parameters are passed on the fly to a script
API: Java, Perl, Python, web services (can batch SOAP requests); can use predefined WSDL files via a URL

MS collection
readers, analysis tools, large files, pipelining for efficiency

PROTEOMICS - very active, fast developing, forum is active
Hubert - how to implement new components, what code to write to integrate a new component
using PP script language plus Java or some other language

Q - speed of change, can ACCELRYS react fast to new needs?
PP posts new developments fast, but they may not reach release quality immediately; a release takes longer for robustness reasons, and support needs to be agreed

Q - how much effort in admin?
a big release once a year and a new release every 3 months, but you may not want to install every one
new install: admin view, server setup view, credentials, environment variables, security (can use a file with a security schema, the local server security schema, or a domain one); can define roles; protocols can also be run at the command line; client access definitions; groups within PP; job settings for parallelisation (how many jobs run in parallel); the number of subprotocol jobs can be limited; subprotocols do not spawn new processes but inherit the parent process number (if I got it right)

Screening and Deep Sequencing
screening - plate analytics and imaging, stats, reporting
deep seq - BWA, Illumina readers
images - can link to other collections: data modelling and stats, learning, predictive models, learning collection for images
about to start with Roche and Uni Amsterdam on in vivo data for image review and analysis
segmentation, nuclei identification (e.g. measuring neuron width for MS or similar), average for each image, thresholding, calculating the number of nuclei and vesicles in each image

Text Analytics collection
external queries to DBs, web services, internal servers; extract key concepts, find correlations, link to bio data

Reporting
reports on the web, visualisation, data mining, visual programming of HTML or PDF reports; output can also be Word or Excel documents

Q - versioning for workflows
yes, like in VMS, versions can be made public as needed

External invocation
from command line, as web service

Internal structure
no DB; files save global and local parameters, but PP does not manage input/output data, which are kept in the file system

Actions
EH and others to disseminate info to other interested parties
Jean - organise a special session with focus on HCS

------------------------------------------------------------------------------------------
meeting organised by Jean Mercier of Accelrys and Ela Hunt of SyBIT

How to get there:
It is relatively easy to find. When you exit the station, go straight ahead, keeping the central tram stops on your right and the blue-fronted hotel on the corner on your left. Follow the road for 100 m to a little underpass; take it and follow the passage, which goes under a road (~50 m). There is quite a lot of roadwork there at present, so you need to navigate a bit. You then see a long dark grey building on your left: this is the first entrance (a good 30 m before the swimming pool/fitness center and a little deli).
We're on the 6th floor. There is a Symyx sign outside, so you should be able to spot the entrance easily.
If there is any problem, give me a call on my mobile. I look forward to meeting you there.
Tel (home office): +44(0)1489795375
Tel (voiceIP): +44 (0)1223228597
