University of South Alabama - Computer Science
For Sale: Grazing Pasture
Under Water
In the spirit of the legend of Ernest Hemingway's six-word short story, this project places terse and poignant narratives of land use and loss, restoration and adaptation, attachment to community, resilience, and personal identity along the Chenier Plain of coastal Louisiana. These six-word narratives have been culled from transcripts of interviews collected over the past year and a half from the Chenier community of Pecan Island. The stories are stored as points on a web-based map, with the stories appearing and disappearing as the user moves the mouse over the points. Moreover, the system allows users to anonymously add their own narratives to the map. Ultimately, over time, the system, with the participation of residents and former residents, will provide a rich, vibrant view of the impact of marsh degradation upon South Louisiana.

A demonstration of the system, designed and developed by the UL Lafayette Coastal Community Resilience Studio and CVDI, was shown on Friday, March 14, as part of Artech Fusion. A short presentation was also given by Kari Smith, JoAnne DeRouen, and Ryan Benton.
Shahid Virani
Kati Smith
Detecting the Onset of Events Using Twitter Data
The goal of the project was to detect the onset of emerging events using social media, specifically Twitter streams. As part of the effort, the team, composed of UL Lafayette and Drexel personnel, developed a new method to detect the onset of events, explored topic evolution, and developed several tweet visualization methods. A tool, which implemented the Event Detection on Onset (EDO) method, allowed users to input a set of search terms and output, in text and graphical formats, the discovered events in real time.
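The internals of the EDO method are not spelled out above, so the following is only a rough illustration of what onset detection on a keyword stream can look like: flag a time window whose tweet volume jumps well above a trailing baseline. The function name, the z-score rule, and the thresholds are illustrative assumptions, not the published algorithm.

```python
from collections import deque
from statistics import mean, stdev

def detect_onset(counts_per_minute, baseline_window=30, z_threshold=3.0):
    """Flag minutes where the tweet count for a tracked term jumps well
    above its trailing baseline. Illustrative only; not the EDO algorithm."""
    history = deque(maxlen=baseline_window)
    onsets = []
    for minute, count in enumerate(counts_per_minute):
        if len(history) >= 10:  # need enough history for a stable baseline
            mu = mean(history)
            sigma = stdev(history) or 1.0  # avoid dividing by zero
            if (count - mu) / sigma > z_threshold:
                onsets.append(minute)
        history.append(count)
    return onsets

# Example: quiet chatter, then a burst starting around minute 12.
print(detect_onset([3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 5, 4, 40, 55]))
```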
Rare Association Rule Mining - RP-Growth
We have begun exploring rare association rule mining, which deals with creating rules from "rare" associations; in this case, rare associations are items that occur together less often than a user-defined threshold. However, rare associations (and rules) can be important in some cases. For instance, a disease may occur rarely, but a set of symptoms may be a good indicator of the disease. This project is at the initial stages. At present, we have implemented an algorithm, entitled RP-Growth, described in: Sidney Tsang, Yun Sing Koh, Gillian Dobbie, "RP-Tree: Rare Pattern Tree Mining," International Conference on Data Warehousing and Knowledge Discovery, pp. 277-288 (2011).

The code can be found at: https://github.com/rgbenton/RPTree

The code has been incorporated into SPMF, an open-source data mining library: http://www.philippe-fournier-viger.com/spmf/
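To make the definition concrete, here is a brute-force sketch of rare itemsets: itemsets whose support falls below a user maximum but above a noise floor. This is not RP-Growth itself, which reaches the same sets far more efficiently by pruning with an RP-Tree; the thresholds and toy data are assumptions for the demo.

```python
from itertools import combinations

def rare_itemsets(transactions, max_sup, min_raresup):
    """Brute-force illustration: itemsets occurring together in fewer than
    max_sup transactions but at least min_raresup times. RP-Growth finds
    these far more efficiently via an RP-Tree."""
    items = sorted({i for t in transactions for i in t})
    found = {}
    for size in (1, 2, 3):                       # small sizes, demo only
        for candidate in combinations(items, size):
            sup = sum(1 for t in transactions if set(candidate) <= t)
            if min_raresup <= sup < max_sup:
                found[candidate] = sup
    return found

tx = [{"fever", "rash"}, {"fever", "cough"}, {"cough"},
      {"fever", "rash"}, {"cough", "fatigue"}, {"cough"}]
print(rare_itemsets(tx, max_sup=3, min_raresup=2))  # e.g. ('fever','rash'): 2
```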
Software professional with over 14 years of post-educational experience in project management, research, and development. Research experience with a primary focus on machine learning/data mining and a secondary focus on image processing/machine vision. Experience in software development, project management, and lab management. Managed a small staff and multiple funded research projects and contracts. Experience in designing and implementing proof-of-concept systems for a variety of applications.
Ryan Benton
Global Science & Technology, Inc.
University of South Alabama
University of Louisiana at Lafayette
Global Science & Technology, Inc.
Star Software Systems Corporation
Mobile, Alabama
I teach courses in Software Engineering, Data Mining, and Big Data. In addition, I conduct research within the areas of data mining, machine learning, and big data. As part of that effort, I am a co-lead of the Data Mining research group within the Data Science lab and am a member of the Digital Forensics Information Intelligence (DFII) research group within the Center for Forensics, Information Technology & Security.
Assistant Professor
University of South Alabama
Established and led the research program at STAR, with a focus on adaptive retrieval technologies and automated equipment prognostics. Pursued and acquired Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) funding for research activities. Coordinated efforts between the company and external university collaborators.
Star Software Systems Corporation
Global Science & Technology, Inc.
Greenbelt, Maryland
Developed code to display and manipulate satellite imagery.
Intern
Greenbelt, Maryland
Developed code to display and manipulate satellite imagery.
Intern
Global Science & Technology, Inc.
Conducted research, as a member of several interdisciplinary projects, in applied machine learning and machine vision to solve domain-specific problems. Led half of the projects as either project manager or technical lead. Contributed, as a key member, to technology transfer efforts for industrial organizations. Architected and managed a research laboratory that supports a permanent research group. Established relationships with external businesses and internal organizational units for collaborative efforts. Acquired funding for research and technology transfer activities. Taught graduate-level independent project courses in machine learning and parallelization.
University of Louisiana at Lafayette
PhD, Computer Science
MS, Computer Science
BS, Computer Information Systems
C++
Machine Learning
XML
Computer Science
Text Mining
Algorithms
Programming
Python
Pattern Recognition
Image Processing
Software Engineering
R
Information Retrieval
Java
Natural Language Processing
Software Development
Data Mining
Artificial Intelligence
Scientific Computing
High Performance Computing
in International Conference on Brain and Health Informatics, Maebashi, Japan, pp. 126-137, October 29-31, 2013.

Voxel-based analysis of neuroimagery provides a promising source of information for early diagnosis of Alzheimer's disease. However, neuroimaging procedures usually generate high-dimensional data. This complicates statistical analysis and modeling, resulting in high computational complexity and typically more complicated models. This study uses the features extracted from Positron Emission Tomography imagery by 3D Stereotactic Surface Projection. Using a taxonomy of features that complies with the Talairach-Tournoux atlas, we investigate composite kernel functions for predictive modeling of Alzheimer's disease. The composite kernels, compared with standard kernel functions (i.e., a simple Gaussian-shaped function), better capture the characteristic patterns of the disease. As a result, we can automatically determine the anatomical regions of relevance for diagnosis. This improves the interpretability of models in terms of known neural correlates of the disease. Furthermore, the composite kernels significantly improve the discrimination of MCI from Normal, which is encouraging for early diagnosis.
Composite Kernels for Automatic Relevance Determination in Computerized Diagnosis of Alzheimer's Disease
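As a reading aid, here is a minimal sketch of the composite-kernel idea described above: a weighted sum of Gaussian kernels, each restricted to one anatomical feature group, so that the fitted weights act as relevance scores for regions. The feature grouping, weights, and gamma below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gaussian_kernel(x, z, gamma=0.1):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def composite_kernel(x, z, groups, weights, gamma=0.1):
    """Weighted sum of Gaussian kernels, one per anatomical feature group.
    A sum of valid kernels is itself a valid kernel, and the fitted weights
    act as relevance scores for the corresponding regions."""
    return sum(w * gaussian_kernel(x[g], z[g], gamma)
               for g, w in zip(groups, weights))

# Toy example: 6 features split into two "regions" (indices are assumptions).
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
weights = [0.8, 0.2]          # in practice these would be learned
x = np.random.randn(6)
z = np.random.randn(6)
print(composite_kernel(x, z, groups, weights))
```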
David G. Clark
in the International Conference on Brain and Health Informatics, Maebashi, Japan, pp. 266-276, October 29-31, 2013.

Clinical trials for interventions that seek to delay the onset of Alzheimer's disease (AD) are hampered by inadequate methods for selecting study subjects who are at risk, and who may therefore benefit from the interventions being studied. Automated monitoring tools may facilitate clinical research and thereby reduce the impact of AD on individuals, caregivers, society at large, and government healthcare infrastructure. We studied the 18F-deoxyglucose positron emission tomography (FDG-PET) scans of research subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI), using a Machine Learning technique. Three hundred ninety-four FDG-PET scans were obtained from the ADNI database. An automated procedure was used to extract measurements from 31 regions of each PET surface projection. These data points were used to evaluate the sensitivity and specificity of support vector machine (SVM) classifiers and to compare both Linear and Radial-Basis SVM techniques against a classic thresholding method used in earlier work.
Diagnosis and Grading of Alzheimer's Disease via Automatic Classification of FDG-PET Scans
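A hedged sketch of the kind of comparison the abstract describes, linear and RBF SVMs against a single-measurement threshold baseline. The data here is synthetic and merely stands in for the 31 regional measurements per scan; nothing below reproduces the paper's actual setup or results.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for 31 regional FDG-PET measurements per scan.
X = rng.normal(size=(394, 31))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=394) > 0).astype(int)

for name, clf in [("linear SVM", SVC(kernel="linear")),
                  ("RBF SVM", SVC(kernel="rbf"))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())

# Classic thresholding baseline: cut on a single region's value.
threshold = np.median(X[:, 0])
print("threshold", ((X[:, 0] > threshold) == y).mean())
```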
Social media generates information about news and events in real-time. Given the vast amount of data available and the rate of information propagation, reliably identifying events is a challenge. Most state-of-the-art techniques are post hoc techniques that detect an event after it happened. Our goal is to detect the onset of an event as it is happening, using the user-generated information from Twitter streams. To achieve this goal, we use a discriminative model to identify change in the pattern of conversations over time. We use a topic evolution model to find credible events and eliminate the random noise that is prevalent in many of the event detection models. The simplicity of the proposed model allows events to be detected quickly and efficiently, permitting discovery of events within minutes from the start of conversation about those events on Twitter. Our model is evaluated on a large-scale Twitter corpus to detect events in real-time. The proposed model is tested on other datasets to detect change over longer periods of time. The results indicate we can detect real events, within 3 to 8 minutes of their first appearance, with a lower degree of noise compared to other methods.
Detection of Event Onset Using Twitter
in Hawaii International Conference on System Sciences, Kauai, Hawaii, pp. 2794-2803, January 5-8, 2015.

This paper presents a flu monitoring system that utilizes prescription-based data. It provides evidence-based information that may be "useful" to many users, e.g., medical professionals, public health administrators, patients, prescription drug manufacturers, and elementary/middle/high schools.
Real-Time Flu Monitoring System and Decision Informatics
in the book "Granular Computing and Decision-Making: Interactive and Iterative Approaches", pages 33-46. The editors of the book are Witold Pedrycz and Shyi-Ming Chen; the ISBN numbers are 978-3-319-16828-9 (Print) and 978-3-319-16829-6 (Online).

The book is Volume 10 of the Studies in Big Data series.
A Comprehensive Granular Model for Decision Making with Complex Data
Murat Seckin Ayhan
IEEE
Conference paper appearing in "IEEE International Conference on Bioinformatics and Biomedicine" at Hong Kong, which took place from December 18, 2010 to December 21, 2010. Pages: 516-519.

Alzheimer's disease (AD) is one major cause of dementia. Previous studies have indicated that the use of features derived from Positron Emission Tomography (PET) scans leads to more accurate and earlier diagnosis of AD, compared to the traditional approach used for determining dementia ratings, which uses a combination of clinical assessments such as memory tests. In this study, we compare Naïve Bayes (NB), a probabilistic learner, with variations of Support Vector Machines (SVMs), a geometric learner, for the automatic diagnosis of Alzheimer's disease. 3D Stereotactic Surface Projection (3D-SSP) is utilized to extract features from PET scans. At the most detailed level, the dimensionality of the feature space is very high, resulting in 15964 features. Since classifier performance can degrade in the presence of a high number of features, we evaluate the benefits of a correlation-based feature selection method to find a small number of highly relevant features.
Exploitation of 3D Stereotactic Surface Projection for Automated Classification of Alzheimer's Disease According to Dementia Levels
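The correlation-based selection step can be sketched as follows: rank features by absolute Pearson correlation with the class label and keep the top k. This is a simplification; full correlation-based feature selection also discounts features that are redundant with ones already selected, and the data below is synthetic.

```python
import numpy as np

def top_k_by_correlation(X, y, k=50):
    """Rank features by |Pearson correlation| with the label and keep the
    top k. Simplified: it ignores redundancy among selected features."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(corr)[::-1][:k]

# Toy stand-in for a very high-dimensional 3D-SSP feature space.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1000))
y = (X[:, 7] - X[:, 42] + rng.normal(scale=0.5, size=200) > 0).astype(float)
print(top_k_by_correlation(X, y, k=5))  # indices 7 and 42 should rank high
```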
Positron Emission Tomography scans are a promising source of information for early diagnosis of Alzheimer's disease. However, such neuroimaging procedures usually generate high-dimensional data. This complicates statistical analysis and modeling, resulting in high computational complexity and typically more complicated models. However, the utilization of domain knowledge can reduce the complexity and promote simpler models. In this study, we investigate Gaussian processes, which may incorporate domain knowledge, for predictive modeling of Alzheimer's disease. This study uses features extracted from PET imagery by 3D Stereotactic Surface Projection. Since the number of features can be high even after applying prior knowledge, we examine the benefits of a correlation-based feature selection method. Feature selection is desirable as it enables the detection of metabolic abnormalities that only span certain portions of the anatomical regions. Our proposed utilization of Gaussian processes is superior to the alternative (Automatic Relevance Determination), resulting in more accurate diagnosis with less computational effort.
Utilization of domain-knowledge for simplicity and comprehensibility in predictive modeling of Alzheimer's disease
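For reference, the Automatic Relevance Determination baseline mentioned above can be sketched with an anisotropic RBF kernel, one length scale per feature, so that irrelevant features drift toward large fitted length scales. This is a generic illustration on synthetic data, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only two features matter

# One length scale per feature = Automatic Relevance Determination:
# irrelevant features should end up with large fitted length scales.
kernel = RBF(length_scale=np.ones(X.shape[1]))
gpc = GaussianProcessClassifier(kernel=kernel, random_state=0).fit(X, y)
print(gpc.kernel_.length_scale)
```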
Jennifer Lavergne
in 20th International Symposium on Methodologies for Intelligent Systems, Macau, pp. 61-70, December 4-7, 2012.

The goal of association mining is to find potentially interesting rules in large repositories of data. Unfortunately, using a minimum support threshold, a standard practice for keeping the complexity of association mining manageable, can allow some of these rules to remain hidden. This occurs because not all rules which have high confidence have a high support count. Various methods have been proposed to find these low-support rules, but the resulting increase in complexity can be prohibitively expensive. In this paper, we propose a novel targeted association mining approach to rare rule mining using the itemset tree data structure (aka TRARM-RelSup). This algorithm combines the efficiency of targeted association mining querying with the capabilities of rare rule mining; this results in the discovery of a more focused set of standard and rare rules for the user, while keeping the complexity manageable.
TRARM-RelSup: Targeted Rare Association Rule Mining Using Itemset Trees and the Relative Support Measure
Alzheimer's disease (AD) is one major cause of dementia. Previous studies have indicated that the use of features derived from Positron Emission Tomography (PET) scans leads to more accurate and earlier diagnosis of AD, compared to the traditional approaches that use a combination of clinical assessments. In this study, we compare Naive Bayes (NB) with variations of Support Vector Machines (SVMs) for the automatic diagnosis of AD. 3D Stereotactic Surface Projection (3D-SSP) is utilized to extract features from PET scans. At the most detailed level, the dimensionality of the feature space is very high. Hence we evaluate the benefits of a correlation-based feature selection method to find a small number of highly relevant features; we also provide an analysis of selected features, which is generally supportive of the literature. However, we have also encountered patterns that may be new and relevant to prediction of the progression of AD.
Exploitation of 3D Stereotactic Surface Projection for predictive modelling of Alzheimer's Disease
Gui-Liang Feng
Communications in Information Science and Management Engineering (CISME)
12 page journal paper appearing in "Communications in Information Science and Management Engineering", Vol. 2, No. 12, pp. 71-85, December 2012.

In this paper, we consider the content distribution problem over wireless mesh networks, which are characterized by the broadcast nature of the medium and significant data redundancy. One potential solution is network coding [6, 7], which has recently been receiving interest [1-5] within the wireless community. This paper describes an efficient algebraic wireless network coding scheme, which utilizes special matrices to ensure linear independence of code vectors. Unlike random coefficient network coding [1], our scheme is able to provide fast coding for the small number of packets usually required by real-time applications. In addition, the proposed scheme achieves a great improvement in computational speed by avoiding the use of Gaussian elimination when generating linearly independent code vectors.
Fast Wireless Network Coding for Real-time Data
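The paper's special matrices are not reproduced above, so the sketch below only illustrates the underlying guarantee with a structure known to have it: Vandermonde coefficient rows over a prime field, where any n distinct rows form an invertible matrix, so linear independence of code vectors never has to be tested at coding time. The field size and packet counts are arbitrary choices for the demo, and the rank check exists only to verify the claim, not as part of a fast coder.

```python
# Illustrative only: Vandermonde coefficient rows over GF(p). Any n distinct
# rows form an invertible matrix, so linear independence needs no rank test.
p = 257  # small prime field for the demo

def vandermonde_rows(m, n):
    """Row i is (1, a_i, a_i^2, ..., a_i^(n-1)) mod p with distinct a_i."""
    return [[pow(a, j, p) for j in range(n)] for a in range(1, m + 1)]

def rank_mod_p(rows):
    """Row-reduction rank over GF(p); requires Python 3.8+ for pow(x, -1, p)."""
    M = [r[:] for r in rows]
    rank = 0
    for col in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][col] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], -1, p)
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                M[r] = [(a - M[r][col] * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

n, m = 4, 8                      # 4 source packets, 8 coded packets
rows = vandermonde_rows(m, n)
print(rank_mod_p(rows[2:6]))     # any 4 rows have rank 4 -> decodable
```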
in International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, pp. 15-25, Halifax, Nova Scotia, Canada, October 11-14, 2013.

Analyzing and classifying sequence data based on structural similarities and differences is a mathematical problem of escalating relevance. Indeed, a primary challenge in designing machine learning algorithms to analyze sequence data is the extraction and representation of significant features. This paper introduces a generalized sequence feature extraction model, referred to as the Generalized Multi-Layered Vector Spaces (GMLVS) model. Unlike most models that represent sequence data based on subsequence frequency, the GMLVS model represents a given sequence as a collection of features, where each individual feature captures the spatial relationship between two subsequences and can be mapped into a feature vector. The utility of this approach is demonstrated via two special cases of the GMLVS model, namely, Lossless Decomposition (LD) and the Multi-Layered Vector Spaces (MLVS). Experimental evaluation shows the GMLVS-inspired models generate feature vectors that, combined with basic machine learning techniques, are able to achieve high classification performance.
Representations for Large-scale Sequence Data Mining: A Tale of Two Vector Space Models
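A loose illustration of the core idea above, a feature built from the spatial relationship between occurrences of two subsequences. The gap-histogram feature below is an assumption for demonstration; the actual GMLVS construction is layered and more general.

```python
from itertools import product

def occurrences(seq, sub):
    return [i for i in range(len(seq) - len(sub) + 1)
            if seq[i:i + len(sub)] == sub]

def pair_gap_feature(seq, sub_a, sub_b, max_gap=32):
    """Histogram of gaps between occurrences of sub_a and sub_b: one crude
    'spatial relationship' feature vector in the spirit of GMLVS."""
    hist = [0] * max_gap
    for i, j in product(occurrences(seq, sub_a), occurrences(seq, sub_b)):
        gap = j - i
        if 0 < gap < max_gap:
            hist[gap] += 1
    return hist

dna = "ACGTACGGTACGTTACG"
print(pair_gap_feature(dna, "ACG", "TAC", max_gap=8))
```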
Harika Karnati
Adverse drug events (ADEs) are among the leading causes of death in the United States. Although many ADEs are detected during pharmaceutical drug development and the FDA approval process, not all possible reactions can be identified during this period. Currently, post-consumer drug surveillance relies on voluntary reporting systems, such as the FDA's Adverse Event Reporting System (AERS). With an increase in the availability of medical resources and health-related data online, interest in medical data mining has grown rapidly. This information, coupled with people's online conversations about their health, provides a substantial resource for the identification of ADEs. In this work, we propose a method to identify adverse drug effects from tweets by modeling it as a link classification problem in graphs. Drug and symptom mentions are extracted from the tweet history of each user and a drug-symptom graph is built, where nodes represent either drugs or symptoms and edges are labeled positive or negative for desired or adverse drug effects, respectively. A link classification model is then used to identify negative edges, i.e., adverse drug effects. We test our model on 864 users using 10-fold cross-validation with the SIDER dataset as ground truth. Our model was able to achieve an F-score of 0.77, compared to the best baseline model with an F-score of 0.58.
Detecting Adverse Drug Effects Using Link Classification on Twitter Data
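A toy rendition of the pipeline described above: count per-user drug-symptom co-mentions, featurize each edge, and train a binary edge classifier. The mention data, the single count feature, and the label table (standing in for the SIDER ground truth) are all illustrative assumptions, and this toy model is not expected to be accurate.

```python
from collections import Counter
from sklearn.linear_model import LogisticRegression

# Per-user (drug, symptom) co-mentions extracted from tweet histories (toy).
user_mentions = [
    [("drugA", "nausea"), ("drugA", "relief")],
    [("drugA", "nausea"), ("drugB", "sleep")],
    [("drugA", "nausea")],
    [("drugB", "sleep"), ("drugB", "dizziness")],
]

edge_counts = Counter(pair for user in user_mentions for pair in set(user))
edges = sorted(edge_counts)
X = [[edge_counts[e]] for e in edges]       # edge feature: number of users
# Ground-truth labels (1 = adverse effect), a stand-in for SIDER lookups.
labels = {"nausea": 1, "dizziness": 1, "relief": 0, "sleep": 0}
y = [labels[sym] for _, sym in edges]

clf = LogisticRegression().fit(X, y)
for e, pred in zip(edges, clf.predict(X)):
    print(e, "adverse" if pred else "desired")
```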
Gui-Liang Feng
IEEE
6 page conference paper appearing in \"IEEE International Conference on Electro/Information Technology\" at Mankato
Minnesota. The conference took place from May 15
2011 to May 17
2011.\n\nIn this paper
we consider the content distribution problem over wireless mesh networks
which are characterized by the broadcast nature of the medium and significant data redundancy. One potential solution is network coding
which has been recently been receiving interest within the wireless community. This paper describes an efficient algebraic wireless network coding scheme
which utilizes special matrices to ensure linear independence of code vector. Unlike random coefficient network coding
our scheme is able to provide fast coding without involving the use of costly Gaussian elimination.\n
A Class of Wireless Network Coding Schemes
Ashwin Kannalath
Samy Alihamad
in International Conference on Machine Learning and Applications, Miami, Florida, pp. 348-353, December 4-7, 2013.

The task of learning action rules aims to provide recommendations to analysts seeking to achieve a specific change. An action rule is constructed as a series of changes, or actions, which can be made to the flexible characteristics of a given object and that ultimately triggers the desired change. Existing action rule discovery methods utilize a generate-and-test approach in which candidate action rules are generated and those that satisfy the user-defined thresholds are returned. A shortcoming of this operational model is that there is no guarantee that all objects are covered by the generated action rules. In this paper, we define a new methodology referred to as Targeted Action Rule Discovery (TARD). This methodology represents an object-driven approach in which an action rule is explicitly discovered per target object. A TARD method is proposed that effectively discovers action rules through the iterative construction of multiple decision trees. Experiments show the proposed method is able to provide higher quality rules than the well-known Association Action Rule (AAR) method.
Targeted Action Rule Discovery
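To make the object being discovered concrete: an action rule pairs changes on flexible attributes with an intended decision change, and covers the objects matching its "from" sides. The representation below is a generic illustration, not the TARD method itself, and the attribute names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ActionRule:
    """Changes to flexible attributes intended to flip the decision value."""
    actions: dict          # attribute -> (from_value, to_value)
    decision: tuple        # (from_class, to_class)

    def applies_to(self, obj):
        # The rule targets objects currently matching every 'from' side.
        return all(obj.get(a) == frm for a, (frm, _) in self.actions.items())

rule = ActionRule(actions={"plan": ("basic", "premium"),
                           "contact": ("none", "monthly")},
                  decision=("churn", "stay"))

customer = {"plan": "basic", "contact": "none", "age": 41}
print(rule.applies_to(customer))   # True: this customer is covered
```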
Can Akkoç
Multi-layered Vector Spaces for Classifying and Analyzing Biological Sequences
Ming Tan
Lewis Pannell
Ethan Butler
Leanna Fain
Laura Fain
Analyzing and classifying sequences based on similarities and differences is a mathematical problem of escalating relevance and importance in many scientific disciplines. One of the primary challenges in applying machine learning algorithms to sequential data, such as biological sequences, is the extraction and representation of significant features from the data. To address this problem, we have recently developed a representation, entitled Multi-Layered Vector Spaces (MLVS), which is a simple mathematical model that maps sequences into a set of MLVS. We demonstrate the usefulness of the model by applying it to the problem of identifying signal peptides. MLVS feature vectors are generated from a collection of protein sequences and the resulting vectors are used to create support vector machine classifiers. Experiments show that the MLVS-based classifiers are able to outperform or perform on par with several existing methods that are specifically designed for the purpose of identifying signal peptides.
Exploiting Multi-Layered Vector Spaces for Signal Peptide Detection
George Grispos
Maureen S. Van Devender
Understanding De-identification of Healthcare Big Data
Jennifer Lavergne
IEEE
in IEEE/WIC/ACM International Conference on Web Intelligence, Atlanta, GA, pp. 298-306, October 29-31, 2013.

Recently, with companies and government agencies saving large repositories of time stream/temporal data, there is a large push for adapting association rule mining algorithms for dynamic, targeted querying. In addition, issues with data processing latency, and with results depreciating in value over time, create a need for swifter and more efficient processing. The aim of targeted association mining is to find potentially interesting implications in large repositories of data. Using targeted association mining techniques, specific implications that contain items of user interest can be found faster and before the implications have depreciated in value beyond usefulness.

In this paper, the DynTARM algorithm is proposed for the discovery of targeted and rare association rules. DynTARM has the flexibility to discover strong and rare association rules from data streams within the user's sphere of interest. By introducing a measure, called the Volatility Index, to assess the fluctuation in the confidence of rules, rules conforming to different temporal patterns are discovered.
DynTARM: An In-Memory Data Structure for Targeted Strong and Rare Association Rule Mining Over Time-Varying Domains
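The exact Volatility Index formula is not given above, so purely as a reading aid, the sketch below scores a rule by the dispersion of its confidence across time windows, the kind of fluctuation the measure assesses. Treat this as an assumed stand-in, not DynTARM's published definition.

```python
from statistics import pstdev, mean

def volatility_score(confidences):
    """Dispersion of a rule's confidence across time windows. An assumed
    stand-in for DynTARM's Volatility Index, not the published formula."""
    return pstdev(confidences)

stable_rule = [0.82, 0.80, 0.83, 0.81, 0.82]
bursty_rule = [0.10, 0.85, 0.05, 0.90, 0.15]
for name, conf in [("stable", stable_rule), ("bursty", bursty_rule)]:
    print(name, round(mean(conf), 2), round(volatility_score(conf), 2))
```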
in 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 398-404, December 11, 2011.

Action rule mining aims to provide recommendations to analysts seeking to achieve a specific change. An action rule is constructed as a series of changes, or actions, which can be made to some of the flexible characteristics of the information system and that ultimately triggers a change in the targeted attribute. The existing action rule discovery methods consider the input decision system as their search domain and are limited to expensive and ambiguous strategies. In this paper, we define the notion of an action table as the ideal search domain for actions, and then propose a strategy based on the FP-Tree structure to achieve high performance in rule extraction.
FAARM: Frequent Association Action Rules Mining Using FP-Tree
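For readers unfamiliar with the structure FAARM builds on, here is a compact sketch of standard FP-Tree construction: keep items meeting the support threshold, order each transaction by global frequency, and insert along shared prefixes. The action-table specifics of FAARM are not modeled here.

```python
from collections import Counter

class FPNode:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(transactions, min_sup=2):
    """Standard FP-Tree construction: keep frequent items, sort each
    transaction by global frequency, and insert along shared prefixes."""
    freq = Counter(i for t in transactions for i in t)
    root = FPNode(None)
    for t in transactions:
        kept = sorted((i for i in t if freq[i] >= min_sup),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root

def dump(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

dump(build_fp_tree([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]))
```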
Alaaeldin Hafez
Jennifer Lavergne
Springer
in 20th International Symposium on Methodologies for Intelligent Systems, Macau, pp. 51-60, December 4-7, 2012.

The itemset tree data structure is used in targeted association mining to find rules within a user's sphere of interest. In this paper, we propose two enhancements to the original unordered itemset trees. The first enhancement consists of sorting all nodes in lexical order based upon the itemsets they contain. In the second enhancement, called the Min-Max Itemset Tree, each node is augmented with minimum and maximum values that represent the range of itemsets contained in the children below. For demonstration purposes, we provide a comprehensive evaluation of the effects of the enhancements on the itemset tree querying process by performing experiments on sparse, dense, and categorical datasets.
Min-Max Itemset Trees for Dense and Categorical Datasets
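A small sketch of the second enhancement: each node carries the minimum and maximum item values found in its subtree, so a query whose target falls outside those bounds can skip the whole branch. The node layout below is simplified relative to a real itemset tree, whose nodes are organized by shared prefixes.

```python
class MMNode:
    """Simplified itemset-tree node carrying min/max bounds over all items
    stored in its subtree, so queries can skip whole branches."""
    def __init__(self, itemset):
        self.itemset = frozenset(itemset)
        self.children = []
        self.lo, self.hi = min(itemset), max(itemset)

    def add(self, child):
        self.children.append(child)
        self.lo = min(self.lo, child.lo)   # widen bounds on insert
        self.hi = max(self.hi, child.hi)

def query(node, target, hits):
    if min(target) < node.lo or max(target) > node.hi:
        return                              # bounds prune: skip subtree
    if target <= node.itemset:
        hits.append(node.itemset)
    for child in node.children:
        query(child, target, hits)

root = MMNode({1, 2, 3, 4})
root.add(MMNode({1, 2, 5}))
root.add(MMNode({2, 3, 4}))
hits = []
query(root, frozenset({2, 3}), hits)
print(hits)   # itemsets containing {2, 3}
```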
Develop analytical tools and solutions for data cleansing, message annotation, classification and clustering, topic evolution, and recommendation for social media users.
Extend the emerging event detection model to incorporate multiple social media sources and additional metadata.
Extend the emerging event detection to identify and detect emerging sub-events.